
uniq: Remove and Count Duplicate Lines

The `uniq` command filters out or reports adjacent identical lines from a text file or standard input. It is particularly useful with the `-c` option, which prefixes each output line with a count of its occurrences, making it handy for frequency analysis.

Overview

`uniq` is often used in conjunction with the `sort` command via a pipe (|) to efficiently process duplicate lines in sorted data. The `-c` option allows for easy identification of the frequency of duplicate lines.
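A minimal sketch of why the `sort` step matters, using made-up sample data:

```shell
# Duplicates that are not adjacent survive a plain `uniq` pass
printf 'a\nb\na\n' | uniq
# a
# b
# a

# Sorting first makes the duplicates adjacent, so `uniq` can collapse them
printf 'a\nb\na\n' | sort | uniq
# a
# b
```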

Key Features

  • Processes consecutive duplicate lines
  • Counts duplicate lines (-c)
  • Case-insensitive comparison option (-i)
  • Ignores specific fields or characters for comparison
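As a quick illustration of the `-c` feature on sample input (the fruit names are made up here):

```shell
# -c prefixes each distinct line with its occurrence count
printf 'apple\napple\nbanana\n' | uniq -c
# 2 apple and 1 banana (counts are right-aligned; exact spacing varies by implementation)
```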


Usage Examples

Calculate Word Frequency in a File

sort words.txt | uniq -c

Counts the occurrences of each line in `words.txt` (one word per line is assumed). Since `uniq` only processes consecutive duplicates, `sort` is run first to make all duplicates adjacent.
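If a file holds several words per line, a common idiom is to split it into one word per line with `tr` before counting (`document.txt` is a hypothetical input file):

```shell
# Split on spaces/tabs so each word lands on its own line, then count and rank
tr -s ' \t' '\n' < document.txt | sort | uniq -c | sort -nr
```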

Find Most Frequent Lines in a Log File

sort log.txt | uniq -c | sort -nr

Counts duplicate lines in a log file and then sorts the results numerically in descending order to show the most frequent lines first.
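To keep only the top entries, append `head` to the pipeline (the limit of 10 is arbitrary):

```shell
# Show only the ten most frequent lines
sort log.txt | uniq -c | sort -nr | head -n 10
```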

Count Duplicate Lines Ignoring Case

echo -e "Apple\napple\nBanana\napple" | sort | uniq -ci

Counts duplicate lines from standard input, treating 'Apple' and 'apple' as the same.

Count Duplicates Ignoring Specific Fields

sort -k2 data.txt | uniq -f 1 -c

Counts duplicate lines while ignoring the first field, comparing from the second field onward. For example, if `data.txt` contains `ID1 apple` and `ID2 apple`, the two lines compare equal and are reported with a count of 2.
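The same idea works at the character level: `-s N` skips the first N characters before comparing (and GNU `uniq` additionally offers `-w N` to limit the comparison width). The timestamped log lines below are made-up sample data:

```shell
# Skip the leading 'YYYY-MM-DD ' timestamp (11 characters) before comparing
printf '2024-05-01 ERROR disk full\n2024-05-02 ERROR disk full\n' | uniq -s 11 -c
```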

Tips & Notes

The `uniq` command fundamentally operates on 'consecutive' duplicate lines. To remove or count duplicates across an entire file, you must first sort the lines using the `sort` command.

Usage Tips

  • Use with sort: `uniq` only processes consecutive duplicates. To handle duplicates throughout the entire file, sort it first with `sort`. Example: `sort file.txt | uniq -c`
  • Find Most Frequent Items: Pipe the output of `uniq -c` to `sort -nr` to sort the most frequent items in descending order. Example: `sort file.txt | uniq -c | sort -nr`
  • Performance Considerations: For very large files, consider the memory usage of `sort` and `uniq`. If necessary, you can specify a temporary directory using `sort`'s `-T` option.
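The `-T` tip above can be sketched like this (`/var/tmp`, `big.txt`, and `counts.txt` are placeholders):

```shell
# Keep sort's temporary spill files on a volume with plenty of free space
sort -T /var/tmp big.txt | uniq -c | sort -nr > counts.txt
```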
