csplit: Split files based on context

The csplit command is used to split a file into multiple smaller files based on specific patterns (regular expressions) or line numbers. It is useful for analyzing or managing large log files or source code by dividing them into specific sections.

Overview

csplit reads the input file and writes each consecutive section to its own output file, splitting at the patterns or line numbers given on the command line. By default the files are named xx00, xx01, and so on: a prefix (xx) followed by a two-digit numeric suffix. Both the prefix and the suffix format can be changed.

Key Features

  • Splitting based on regular expressions or line numbers
  • Ability to specify output filename prefix and suffix format
  • Facilitates extraction and management of specific sections in large files

Key Options

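Output File Control

  • `-f PREFIX`, `--prefix=PREFIX`: Use PREFIX instead of the default `xx` for output filenames.
  • `-n DIGITS`, `--digits=DIGITS`: Use DIGITS digits in the numeric suffix instead of the default 2.
  • `-b FORMAT`, `--suffix-format=FORMAT`: Use a printf-style FORMAT (for example `%03d.txt`) for the suffix instead of `%02d`.
  • `-z`, `--elide-empty-files`: Remove output files that would be empty.

Splitting Behavior Control

  • `-k`, `--keep-files`: Keep the output files already written when an error occurs.
  • `--suppress-matched`: Leave the lines that match a pattern out of the output.
  • `-s`, `--quiet`, `--silent`: Do not print the size of each output file.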

Usage Examples

Splitting a file by line numbers

seq 1 35 > test.txt
csplit test.txt 10 20 30

Splits the test.txt file at lines 10, 20, and 30. (e.g., xx00: lines 1-9, xx01: lines 10-19, xx02: lines 20-29, xx03: from line 30 to the end)
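
One way to confirm where the cuts fall is to count the lines in each piece. The sketch below assumes GNU csplit; the `-s` flag, which silences the byte counts csplit normally prints, is added here only to keep the output short.

```shell
seq 1 35 > test.txt
csplit -s test.txt 10 20 30   # -s: do not print the size of each piece

# Print "<line count> <file>" for every piece.
wc -l xx00 xx01 xx02 xx03

# The first line of xx01 is the line the split was requested at.
head -n 1 xx01
```

Each line number names the first line of the next piece, which is why xx00 ends at line 9 rather than line 10.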

Splitting a file by regular expression

printf 'Line 1\nLine 2\nERROR: First error\nLine 4\nLine 5\nERROR: Second error\nLine 7\n' > log.txt
csplit -f part_ log.txt '/^ERROR:/' '{*}'

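Splits log.txt at each line that starts with 'ERROR:' and names the pieces with the prefix 'part_'. The '{*}' argument repeats the preceding pattern as many times as it matches, so every ERROR line starts a new file (part_00: the lines before the first match, part_01: from the first ERROR line, part_02: from the second ERROR line to the end).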
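
To see what the repeated pattern produces, the pieces can be printed one after another. This is a minimal sketch assuming GNU csplit; the loop and the `-s` flag are additions for illustration.

```shell
printf 'Line 1\nLine 2\nERROR: First error\nLine 4\nLine 5\nERROR: Second error\nLine 7\n' > log.txt
csplit -s -f part_ log.txt '/^ERROR:/' '{*}'

# Show each piece under a header with its file name.
for f in part_*; do
  echo "== $f"
  cat "$f"
done
```

Each matching line opens a new piece, so the ERROR lines appear as the first line of part_01 and part_02.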

Splitting with specified prefix and digit count

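printf '[Section 1]\nContent A\n[Section 2]\nContent B\n[Section 3]\nContent C\n' > data.log
csplit -f my_file_ -n 3 data.log '/^\[Section [0-9][0-9]*\]/' '{*}'

Splits the data.log file at each '[Section N]' header and creates filenames like 'my_file_000', 'my_file_001', etc. csplit patterns are basic regular expressions, so the digits are matched with '[0-9][0-9]*'; the Perl-style '\d+' is not supported. Because the first line of the file already matches, 'my_file_000' is empty; adding `-z` (`--elide-empty-files`) removes empty files.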
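
When the very first line matches the pattern, the first piece comes out empty. A possible variant, assuming GNU csplit, adds `-z` to drop it; the `sec_` prefix is a name chosen here for illustration.

```shell
printf '[Section 1]\nContent A\n[Section 2]\nContent B\n[Section 3]\nContent C\n' > data.log

# -z removes zero-length output files; without it the first piece would be
# empty because the pattern already matches on line 1.
csplit -s -z -f sec_ -n 3 data.log '/^\[Section [0-9][0-9]*\]/' '{*}'

ls sec_*
head -n 1 sec_000
```

With the empty piece removed, numbering starts at the first section, so each file holds one header and its content.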

Splitting while excluding matching lines

printf 'Line 1\nERROR: First error\nLine 3\nERROR: Second error\n' > log.txt
csplit --suppress-matched -f no_error_ log.txt '/^ERROR:/' '{*}'

Splits the log.txt file by the pattern '^ERROR:', but excludes lines starting with 'ERROR:' from each split file.
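
A quick way to check that the matching lines were really dropped is to search the pieces for them. The sketch assumes a GNU csplit recent enough to provide `--suppress-matched`; `-z` is added so that a piece left empty after suppression is not kept.

```shell
printf 'Line 1\nERROR: First error\nLine 3\nERROR: Second error\n' > log.txt
csplit -s -z --suppress-matched -f no_error_ log.txt '/^ERROR:/' '{*}'

# grep -q exits non-zero when nothing matches, so the message is printed
# only if none of the pieces contains a line starting with ERROR:.
if ! grep -q '^ERROR:' no_error_*; then
  echo "no ERROR lines in the output pieces"
fi

cat no_error_00
```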

Tips & Precautions

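The csplit command is powerful, but its patterns are basic regular expressions, and each pattern is applied only once unless a repeat count follows it, so the result can differ from what you expect.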

Useful Tips

  • When using regular expressions, enclose them in quotes to prevent shell interpretation.
  • By default, the line that serves as the splitting criterion becomes the first line of the next file. You can exclude this line using the `--suppress-matched` option.
  • '{*}' repeats the preceding pattern as many times as possible. Without a repeat count, the pattern is applied only once and everything after that split point goes into the last output file; nothing is discarded. A number in braces, such as '{2}', repeats the pattern that many additional times.
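
The effect of the repeat count can be seen by running the same pattern with and without '{*}'. This is a small sketch assuming GNU csplit; the file name sample.txt and the prefixes `once_` and `all_` are made up for the comparison.

```shell
printf 'a\nERROR 1\nb\nERROR 2\nc\n' > sample.txt

# Pattern applied once: two pieces, the second one starting at the first
# match and running to the end of the file.
csplit -s -f once_ sample.txt '/^ERROR/'

# Pattern repeated until the input is exhausted: a new piece at every match.
csplit -s -f all_ sample.txt '/^ERROR/' '{*}'

wc -l once_* all_*
```

In both runs the line counts of the pieces add up to the five lines of the input.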

Precautions

  • Original file is not modified: csplit creates new split files without altering the original file.
  • Error handling: If a pattern does not match or a line number is beyond the end of the file, csplit exits with an error and, by default, deletes the output files it has already created. Use `-k` (`--keep-files`) to keep them.
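
The difference `-k` makes on a failed split can be sketched as follows, assuming GNU csplit. The file short.txt and the pattern '/missing/', which deliberately matches nothing, are invented for this illustration.

```shell
printf 'one\ntwo\nthree\n' > short.txt

# '/two/' matches, '/missing/' never does, so csplit reports an error.
# Without -k the pieces written so far are deleted again.
csplit -s -f drop_ short.txt '/two/' '/missing/' || echo "csplit failed, status $?"
[ -e drop_00 ] || echo "drop_00 was removed"

# With -k (--keep-files) the pieces written before the error stay on disk.
csplit -s -k -f keep_ short.txt '/two/' '/missing/' || echo "csplit failed, status $?"
ls keep_*
```

Keeping the partial output is mainly useful for finding out how far the split got before the failing pattern.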
