awk-gsub: Global String Substitution

The `gsub` function in `awk` replaces every match of a regular expression with a replacement string. It is useful for batch edits to text that appears multiple times in a file or input stream.

Overview

`gsub` stands for global substitution. It replaces every non-overlapping match of a regular expression in the target string and modifies that string in place. Unlike `sub`, which replaces only the first match, `gsub` replaces all of them.

gsub Function Syntax

gsub(regex, replacement [, target_string])

  • regex: The regular expression defining the pattern to find.
  • replacement: The string to replace the found pattern with.
  • target_string: The string on which to perform the substitution. If omitted, it applies to the entire current record ($0).
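The third argument can also be an ordinary variable rather than a field. A minimal sketch, assuming a hypothetical variable `s` holding a date string (neither the name nor the value comes from the original text):

```shell
# s is a hypothetical variable used only for illustration.
# gsub modifies s in place and returns the number of replacements.
converted=$(awk 'BEGIN { s = "2024-01-15"; n = gsub(/-/, "/", s); print s, n }')
echo "$converted"   # 2024/01/15 2
```

The target has to be something that can be assigned to (a variable, a field, or an array element), because `gsub` writes the result back into it.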

Return Value

The `gsub` function returns an integer representing the number of substitutions made. This return value can be useful in conditional statements.
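As a rough illustration of using the return value in a condition, the sketch below prints only the lines where at least one substitution happened, prefixed with the count. The sample input is made up for this example.

```shell
# Hypothetical two-line input; only the first line contains 'old'.
out=$(printf 'old old new\nnothing here\n' |
  awk '{ n = gsub(/old/, "new"); if (n > 0) print n ": " $0 }')
echo "$out"   # 2: new new new
```

The second line produces no output because `gsub` returns 0 when nothing matches.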

Usage Examples

The following examples show common string substitutions with `gsub`.

Replace all 'old' with 'new' in a file

echo 'This is an old file with old data.' > data.txt
awk '{gsub(/old/, "new"); print}' data.txt

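Replaces every occurrence of 'old' with 'new' in each line read from data.txt and prints the result. The file itself is not modified, and the pattern also matches 'old' inside longer words such as 'older'.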

Replace spaces with hyphens

echo 'Hello World from Awk' | awk '{gsub(/ /, "-"); print}'

Changes all spaces in the input string to hyphens (-).

Substitute only within a specific field

echo 'field1 field2 apple' | awk '{gsub(/a/, "X", $3); print}'

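Replaces 'a' with 'X' only within the third field ($3), so the output is 'field1 field2 Xpple'. Because a field was assigned, awk rebuilds the record using the output field separator (a single space by default).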

Case-insensitive substitution

echo 'Apple is an apple.' | awk 'BEGIN {IGNORECASE=1} {gsub(/apple/, "orange"); print}'

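Sets the `IGNORECASE` variable to 1 to perform a case-insensitive substitution of 'apple' with 'orange'. `IGNORECASE` is a GNU awk (gawk) extension; other awk implementations treat it as an ordinary variable, and the match stays case-sensitive.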
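Where `IGNORECASE` is not available, one portable option is to spell out both cases with a bracket expression. This is a sketch of that alternative for the same sample sentence, not the only way to do it:

```shell
# [Aa] matches either case of the first letter in any POSIX awk.
result=$(echo 'Apple is an apple.' | awk '{ gsub(/[Aa]pple/, "orange"); print }')
echo "$result"   # orange is an orange.
```

This covers only the first letter. Fully case-insensitive matching of longer words needs a bracket expression for each letter.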

Remove all digits

echo 'Product ID: 12345 ABC' | awk '{gsub(/[0-9]/, ""); print}'

Removes all digits from the string.

Tips & Precautions

Keep the following points in mind when using the `gsub` function.

Difference from sub function

`gsub` substitutes all matches, whereas the `sub` function substitutes only the first match. Choose the appropriate function based on your needs.

  • `gsub(regex, replacement, target)`: Substitutes all matches.
  • `sub(regex, replacement, target)`: Substitutes only the first match.
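The difference is easiest to see by running both functions on the same input. A small sketch with a made-up sample string:

```shell
# Same pattern and replacement; only the function differs.
first_only=$(echo 'a-b-c' | awk '{ sub(/-/, "+"); print }')
all_matches=$(echo 'a-b-c' | awk '{ gsub(/-/, "+"); print }')
echo "$first_only"    # a+b-c
echo "$all_matches"   # a+b+c
```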

Regular Expression Special Characters

To match special characters such as `.`, `*`, `+`, `?`, `[`, `]`, `(`, `)`, `|`, `^`, `$`, and `\` literally in a regular expression, escape them with `\`. For example, use `\.` to match a literal dot. Inside a `/.../` regex constant, a literal slash must also be escaped as `\/`.
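To illustrate why escaping matters, the sketch below runs the same substitution with an unescaped and an escaped dot on a hypothetical version string:

```shell
# Unescaped: . matches any character, so every character is replaced.
unescaped=$(echo 'v1.2.3' | awk '{ gsub(/./, "_"); print }')
# Escaped: \. matches only a literal dot.
escaped=$(echo 'v1.2.3' | awk '{ gsub(/\./, "_"); print }')
echo "$unescaped"   # ______
echo "$escaped"     # v1_2_3
```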

Caution when omitting target string

If `target_string` is omitted, the substitution operation defaults to the entire current record ($0). To apply it only to a specific field, you must specify the field as `gsub(regex, replacement, $N)`.

Backslash in replacement string

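Two characters have special meaning in the replacement string. An unescaped `&` stands for the entire matched text. To insert a literal ampersand, write `\\&` in the string literal. To insert a literal backslash, write `\\`. Handling of backslash sequences that sit directly before `&` differs between awk implementations, so test such replacements on the awk you actually use.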
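A short sketch of how `&` behaves in the replacement string, using a made-up input. In the first command `&` expands to the matched text; in the second, `\\&` inserts a literal ampersand instead:

```shell
# & is replaced by whatever the regex matched.
wrapped=$(echo 'cat hat' | awk '{ gsub(/[a-z]+/, "[&]"); print }')
# \\& in the awk string literal yields a literal ampersand.
literal=$(echo 'cat hat' | awk '{ gsub(/[a-z]+/, "\\&"); print }')
echo "$wrapped"   # [cat] [hat]
echo "$literal"   # & &
```

The awk program is wrapped in single quotes so the shell passes the backslashes to awk unchanged.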
