
sort -u: Sort with Duplicate Removal

The sort -u command sorts the lines of a text file or standard input and outputs each distinct line only once. It is a convenient way to clean up lists that contain repeated entries.

Overview

sort -u sorts input in ascending order and keeps only one line from each group of lines that compare equal. Typical uses include log file analysis, building lists of unique items, and data cleansing.

Key Features

  • Sorts data and removes duplicate lines in a single pass
  • Reads standard input, so it can process another command's output through a pipe (|)
  • Supports various sorting criteria such as numeric values and specific fields
  • Handles inputs larger than memory by spilling to temporary files

Key Options

This section introduces common options used with the sort command, especially those that complement the -u option.

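Duplicate Removal and Basic Operations

  • -u: Outputs only one line from each run of lines that compare equal
  • -f: Folds lowercase to uppercase for comparison, so case is ignored

Sorting Criteria

  • -n: Compares by numeric value instead of as text
  • -t SEP: Uses SEP as the field separator
  • -k POS1[,POS2]: Sorts by the key that starts at field POS1 and ends at field POS2 (end of line if POS2 is omitted)

Output and Others

  • -o FILE: Writes the result to FILE instead of standard output
  • -S SIZE: Sets the main-memory buffer size (GNU sort)
  • -T DIR: Uses DIR for temporary files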


Usage Examples

Learn how to use the sort -u command through various practical examples.

Sort File Contents and Remove Duplicates

sort -u file.txt

Sorts the contents of file.txt, removes duplicate lines, and displays them to standard output.
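As a minimal sketch of the command above, the following creates a small sample file and runs sort -u on it. The file contents and the temporary directory are assumptions made for illustration; LC_ALL=C is set only so that the ordering does not depend on the current locale.

```shell
# Hypothetical sample data; any text file works the same way.
tmpdir=$(mktemp -d)
printf '%s\n' banana apple cherry apple banana > "$tmpdir/file.txt"

# LC_ALL=C pins the collation order so the result is locale-independent.
result=$(LC_ALL=C sort -u "$tmpdir/file.txt")
printf '%s\n' "$result"
# prints:
# apple
# banana
# cherry

rm -r "$tmpdir"
```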

Process Input via Pipe

cat file.txt | sort -u

Reads from standard input when no file is named, so sort -u can sort and de-duplicate the output of another command. Here `cat` stands in for any command that produces lines of text.
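A slightly more realistic pipeline might extract one column and list its distinct values. The sketch below assumes a made-up log format of "<ip> <path>"; the log_lines helper is hypothetical and only stands in for a real log source.

```shell
# Hypothetical log lines in the form "<ip> <path>".
log_lines() {
  printf '%s\n' \
    '10.0.0.2 /index.html' \
    '10.0.0.1 /about.html' \
    '10.0.0.2 /contact.html'
}

# cut keeps the first space-separated field; sort -u then reads it from stdin.
unique_ips=$(log_lines | cut -d' ' -f1 | LC_ALL=C sort -u)
printf '%s\n' "$unique_ips"
# prints:
# 10.0.0.1
# 10.0.0.2
```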

Sort Numerically and Remove Duplicates

sort -nu numbers.txt

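Compares the lines of numbers.txt by numeric value rather than as text and removes duplicates. Lines whose numeric values are equal, such as 2 and 02, count as duplicates even though their text differs.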
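The difference between text and numeric comparison can be sketched with a few sample values. The numbers helper and its data are assumptions for illustration; which spelling of an equal value is kept can depend on the sort implementation, so the second result is described rather than listed.

```shell
# Hypothetical input: 10 appears twice, and 2 appears as both "2" and "02".
numbers() { printf '%s\n' 10 9 2 10 02; }

# Text comparison: "02" and "2" are different strings, and "10" sorts before "2".
lexical=$(numbers | LC_ALL=C sort -u)
printf '%s\n' "$lexical"
# prints:
# 02
# 10
# 2
# 9

# Numeric comparison: 2 and 02 compare equal, so only one of them is kept.
numeric=$(numbers | LC_ALL=C sort -nu)
printf '%s\n' "$numeric"
# three lines in ascending numeric order, ending with 10
```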

Sort by Specific Field and Remove Duplicates

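sort -t',' -u -k2,2 data.csv

Sorts the comma-separated file data.csv by its second field and keeps one line for each distinct value of that field. The key must be written as -k2,2 to restrict it to the second field; -k2 alone defines a key that runs from the second field to the end of the line. With -u, lines are duplicates when their keys compare equal, even if the rest of the line differs.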
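To see how the key definition decides which lines count as duplicates, the sketch below runs the same input through -k2 and -k2,2. The rows helper and its three CSV rows are assumptions for illustration.

```shell
# Hypothetical CSV rows: id,fruit,color
rows() {
  printf '%s\n' \
    '1,apple,red' \
    '2,banana,yellow' \
    '3,apple,green'
}

# -k2 with no end position: the key runs from field 2 to the end of the line,
# so "apple,red" and "apple,green" are different keys and both rows are kept.
open_key=$(rows | LC_ALL=C sort -t',' -u -k2)
printf '%s\n' "$open_key"
# prints:
# 3,apple,green
# 1,apple,red
# 2,banana,yellow

# -k2,2 limits the key to field 2 alone, so the two "apple" rows compare equal
# and only one of them is kept.
field_key=$(rows | LC_ALL=C sort -t',' -u -k2,2)
printf '%s\n' "$field_key"
# two lines: one "apple" row, then 2,banana,yellow
```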

Save Results to a New File

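sort -u -o output.txt input.txt

Saves the sorted, de-duplicated result to output.txt instead of printing it. Placing -o before the input file keeps the command portable to sort implementations that require options to precede file names.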
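The -o option may also name the input file itself, because sort reads all of its input before it opens the output file. A shell redirection such as `sort -u input.txt > input.txt` would instead truncate the file before sort reads it. A minimal sketch, with a temporary directory and file contents chosen only for illustration:

```shell
tmpdir=$(mktemp -d)
printf '%s\n' pear apple pear > "$tmpdir/input.txt"

# Write the result to a separate file.
LC_ALL=C sort -u -o "$tmpdir/output.txt" "$tmpdir/input.txt"

# Overwrite the input file in place; this is safe with -o.
LC_ALL=C sort -u -o "$tmpdir/input.txt" "$tmpdir/input.txt"

saved=$(cat "$tmpdir/output.txt")
in_place=$(cat "$tmpdir/input.txt")
printf '%s\n' "$in_place"
# prints:
# apple
# pear

rm -r "$tmpdir"
```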

Tips & Notes

Useful tips and important notes for working with the sort -u command.

Difference Between sort -u and uniq

  • sort -u: Sorts the entire input and then removes duplicates, so it removes every duplicate regardless of where it appears in the file.
  • uniq: Removes only adjacent duplicate lines. To remove duplicates that are scattered through the input, the data must first be sorted with `sort`. `sort -u` handles both steps in one command.
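The difference is easiest to see on input whose duplicates are not adjacent. The lines helper below is a hypothetical stand-in for real data.

```shell
# Hypothetical input with non-adjacent duplicates.
lines() { printf '%s\n' b a b b a; }

# uniq alone collapses only the adjacent "b b" pair.
uniq_only=$(lines | uniq)
printf '%s\n' "$uniq_only"
# prints:
# b
# a
# b
# a

# Sorting first makes all duplicates adjacent, so both forms give the same result.
sort_then_uniq=$(lines | LC_ALL=C sort | uniq)
sort_u=$(lines | LC_ALL=C sort -u)
printf '%s\n' "$sort_u"
# prints:
# a
# b
```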

Case Sensitivity

  • Default Behavior: By default, sort -u is case-sensitive, treating 'Apple' and 'apple' as different lines.
  • Ignoring Case: To treat lines that differ only in case as duplicates, add the `-f` (fold-case) option. Only one line from each such group is kept. Example: `sort -uf file.txt`
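A short sketch of both behaviors, using a made-up list of fruit names. LC_ALL=C is an assumption added to make the case-sensitive ordering predictable; which spelling survives with -f can depend on the sort implementation, so that result is described rather than listed.

```shell
# Hypothetical input with three spellings of the same word.
fruits() { printf '%s\n' apple Apple banana APPLE; }

# Case-sensitive: all three spellings are distinct lines.
# In the C locale, uppercase letters sort before lowercase ones.
case_sensitive=$(fruits | LC_ALL=C sort -u)
printf '%s\n' "$case_sensitive"
# prints:
# APPLE
# Apple
# apple
# banana

# Case-folded: the three spellings compare equal, so one of them is kept.
folded=$(fruits | LC_ALL=C sort -uf)
printf '%s\n' "$folded"
# two lines: one spelling of "apple", then banana
```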

Performance for Large Files

  • Memory Usage: When processing large files, sort can consume significant system memory. With GNU sort, the `-S` option sets the size of the main-memory buffer. (e.g., `-S 50%` uses 50% of physical memory, `-S 2G` uses 2 GiB)
  • Temporary Files: If memory is insufficient, sort creates temporary files. You can specify the directory for temporary files using the `-T` option. (e.g., `-T /tmp`)
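The two options can be combined as in the sketch below. It assumes GNU sort, since `-S` is not part of POSIX; the 64M buffer size, the scratch directory, and the generated input are illustrative stand-ins for a genuinely large file.

```shell
scratch=$(mktemp -d)

# Stand-in for a large file: 1000 lines that contain only the values 0 to 9.
seq 1 1000 | awk '{ print $1 % 10 }' > "$scratch/big.txt"

# -S caps the in-memory buffer; -T names the directory for any spill files.
LC_ALL=C sort -u -S 64M -T "$scratch" -o "$scratch/unique.txt" "$scratch/big.txt"

unique_count=$(wc -l < "$scratch/unique.txt" | tr -d ' ')
printf '%s\n' "$unique_count"
# prints:
# 10

rm -r "$scratch"
```

Pointing -T at a file system with plenty of free space matters most when the input is much larger than the buffer.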
