
uniq: Remove and Count Duplicate Lines

The `uniq` command filters out or reports adjacent identical lines from a text file or standard input. It is particularly useful with the `-c` option, which prefixes each output line with a count of its occurrences, making it handy for frequency analysis.

Overview

`uniq` is often used in conjunction with the `sort` command via a pipe (|) to efficiently process duplicate lines in sorted data. The `-c` option allows for easy identification of the frequency of duplicate lines.
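A minimal sketch of why the `sort` step matters, using made-up sample data:

```shell
# Duplicates that are not adjacent survive a plain `uniq` pass
printf 'a\nb\na\n' | uniq
# a
# b
# a

# Sorting first makes the duplicates adjacent, so `uniq` can collapse them
printf 'a\nb\na\n' | sort | uniq
# a
# b
```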

Key Features

  • Processes consecutive duplicate lines
  • Counts duplicate lines (-c)
  • Case-insensitive comparison option (-i)
  • Ignores specific fields or characters for comparison
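As a quick illustration of the `-c` feature on sample input (the fruit names are made up here):

```shell
# -c prefixes each distinct line with its occurrence count
printf 'apple\napple\nbanana\n' | uniq -c
# 2 apple and 1 banana (counts are right-aligned; exact spacing varies by implementation)
```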


Usage Examples

Calculate Word Frequency in a File

sort words.txt | uniq -c

Counts the occurrences of each line in `words.txt` (one word per line is assumed). Since `uniq` only processes consecutive duplicates, `sort` is run first to make all duplicates adjacent.
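If a file holds several words per line, a common idiom is to split it into one word per line with `tr` before counting (`document.txt` is a hypothetical input file):

```shell
# Split on spaces/tabs so each word lands on its own line, then count and rank
tr -s ' \t' '\n' < document.txt | sort | uniq -c | sort -nr
```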

Find Most Frequent Lines in a Log File

sort log.txt | uniq -c | sort -nr

Counts duplicate lines in a log file and then sorts the results numerically in descending order to show the most frequent lines first.
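To keep only the top entries, append `head` to the pipeline (the limit of 10 is arbitrary):

```shell
# Show only the ten most frequent lines
sort log.txt | uniq -c | sort -nr | head -n 10
```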

Count Duplicate Lines Ignoring Case

echo -e "Apple\napple\nBanana\napple" | sort | uniq -ci

Counts duplicate lines from standard input, treating 'Apple' and 'apple' as the same.

Count Duplicates Ignoring Specific Fields

sort -k2 data.txt | uniq -f 1 -c

Counts duplicate lines while ignoring the first field, comparing from the second field onward. For example, if `data.txt` contains `ID1 apple` and `ID2 apple`, the two lines compare equal and are reported with a count of 2.
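The same idea works at the character level: `-s N` skips the first N characters before comparing (and GNU `uniq` additionally offers `-w N` to limit the comparison width). The timestamped log lines below are made-up sample data:

```shell
# Skip the leading 'YYYY-MM-DD ' timestamp (11 characters) before comparing
printf '2024-05-01 ERROR disk full\n2024-05-02 ERROR disk full\n' | uniq -s 11 -c
```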

Tips & Notes

The `uniq` command fundamentally operates on 'consecutive' duplicate lines. To remove or count duplicates across an entire file, you must first sort the lines using the `sort` command.

Usage Tips

  • Use with sort: `uniq` only processes consecutive duplicates. To handle duplicates throughout the entire file, sort it first with `sort`. Example: `sort file.txt | uniq -c`
  • Find Most Frequent Items: Pipe the output of `uniq -c` to `sort -nr` to sort the most frequent items in descending order. Example: `sort file.txt | uniq -c | sort -nr`
  • Performance Considerations: For very large files, consider the memory usage of `sort` and `uniq`. If necessary, you can specify a temporary directory using `sort`'s `-T` option.
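The `-T` tip above can be sketched like this (`/var/tmp`, `big.txt`, and `counts.txt` are placeholders):

```shell
# Keep sort's temporary spill files on a volume with plenty of free space
sort -T /var/tmp big.txt | uniq -c | sort -nr > counts.txt
```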
