Overview
`sort -u` sorts its input in ascending order and removes duplicate lines in the same pass, leaving only unique lines. It is useful for tasks such as log file analysis, building lists of unique items, and data cleansing.
Key Features
- Sorts data and automatically removes duplicate lines
- Can be combined with the output of other commands via pipes (`|`)
- Supports various sorting criteria such as numbers and specific fields
- Efficient for handling large files
Key Options
This section introduces common options of the `sort` command, especially those that complement `-u`; each of them appears in the examples further below.
Duplicate Removal and Basic Operations
- `-u`: sorts and removes duplicate lines in one step
Sorting Criteria
- `-n`: compares lines as numeric values instead of strings
- `-t SEP`: sets the field separator (for example, `-t','` for comma-separated data)
- `-k N`: sorts by the key starting at field N (for example, `-k2`)
- `-f`: folds lowercase into uppercase when comparing, i.e. ignores case
Output and Others
- `-o FILE`: writes the result to FILE instead of standard output
- `-S SIZE`: sets the size of the main memory buffer
- `-T DIR`: stores temporary files in DIR
Usage Examples
Learn how to use the sort -u command through various practical examples.
Sort File Contents and Remove Duplicates
sort -u file.txt
Sorts the contents of file.txt, removes duplicate lines, and displays them to standard output.
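To see the effect without creating a file first (the sample lines here are made up), you can pipe a few duplicated lines straight into `sort -u`:
printf 'banana\napple\nbanana\napple\n' | sort -u
This prints `apple` followed by `banana`: the input is sorted and each repeated line appears only once.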
Process Input via Pipe
cat file.txt | sort -u
Receives the output of another command through a pipe, sorts it, and removes duplicates. Here the contents of the file are simply passed in with `cat` as an example.
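A more realistic pipeline for the log-analysis use case mentioned in the overview (the file name `access.log` and the assumption that the client IP is the first whitespace-separated field are hypothetical):
awk '{print $1}' access.log | sort -u
This prints each client IP address exactly once, in sorted order.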
Sort Numerically and Remove Duplicates
sort -nu numbers.txt
Compares the lines of numbers.txt as numeric values rather than as strings when sorting, and removes duplicates.
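The difference from the default string comparison is easy to see with a couple of made-up values; without `-n`, `10` sorts before `2` because lines are compared character by character:
printf '10\n2\n10\n' | sort -u     # string sort: prints 10, then 2
printf '10\n2\n10\n' | sort -nu    # numeric sort: prints 2, then 10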
Sort by Specific Field and Remove Duplicates
sort -t',' -uk2 data.csv
Sorts the comma-separated file data.csv using the sort key that starts at the second field, and removes duplicates. Note that with `-u`, duplicates are judged by the sort key rather than the whole line, as the example below shows.
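With a few made-up rows, and using the `-k2,2` form to restrict the key to exactly the second field, the effect looks like this:
printf 'a,x,1\nb,x,2\nc,y,3\n' | sort -t',' -u -k2,2
Only two lines are printed: one of the two rows whose second field is `x`, plus the row `c,y,3`. The row sharing the key `x` is dropped even though its other fields differ.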
Save Results to a New File
sort -u input.txt -o output.txt
Saves the sorted and de-duplicated results to output.txt.
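Unlike shell redirection, GNU `sort` reads all of its input before opening the file given with `-o`, so the output file may be the same as the input file. That makes in-place de-duplication safe (the file name here is just an example):
sort -u notes.txt -o notes.txt
By contrast, `sort -u notes.txt > notes.txt` would let the shell truncate the file before `sort` reads it, destroying the data.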
Tips & Notes
This section collects useful tips and important notes for using the `sort -u` command.
Difference Between sort -u and uniq
- sort -u: Sorts the entire input and then removes duplicates. Therefore, it removes all duplicates regardless of their position in the file.
- uniq: Removes only adjacent duplicate lines, so to remove all duplicates the input must be sorted first (typically `sort file.txt | uniq`). `sort -u` conveniently handles both steps in one command.
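A quick illustration with made-up input, including one case where `uniq` is still the right tool: counting occurrences with `uniq -c`, which `sort -u` alone cannot do:
printf 'a\nb\na\n' | uniq            # prints a, b, a  (the two a lines are not adjacent)
printf 'a\nb\na\n' | sort -u         # prints a, b
printf 'a\nb\na\n' | sort | uniq -c  # prints each line with its count: 2 a, 1 b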
Case Sensitivity
- Default Behavior: By default, sort -u is case-sensitive, treating 'Apple' and 'apple' as different lines.
- Ignoring Case: To ignore case and remove duplicates, use the `-f` (fold-case) option. Example: `sort -uf file.txt`
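A small demonstration with made-up input (the exact output order can depend on the locale, but the duplicate handling is the same):
printf 'Apple\napple\nBanana\n' | sort -u     # 'Apple' and 'apple' are both kept
printf 'Apple\napple\nBanana\n' | sort -uf    # only one of them is kept, plus 'Banana'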
Performance for Large Files
- Memory Usage: When processing large files, sort can consume significant system memory. You can adjust performance by specifying the memory buffer size with the `-S` option. (e.g., `-S 50%` uses 50% of physical memory)
- Temporary Files: If memory is insufficient, sort creates temporary files. You can specify the directory for temporary files using the `-T` option. (e.g., `-T /tmp`)
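A combined invocation for a large file might look like the following sketch (file names and sizes are placeholders; `--parallel` is specific to GNU sort):
sort -u -S 2G -T /var/tmp --parallel=4 big.log -o unique.log
This gives `sort` a 2 GiB buffer, keeps its temporary files under /var/tmp, sorts with four threads, and writes the de-duplicated result to unique.log.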