Overview
`gsub` stands for global substitution. It is a built-in `awk` function that replaces every non-overlapping match of a regular expression in a string with a replacement string, modifying that string in place. Unlike `sub`, which replaces only the first match, `gsub` replaces all of them.
gsub Function Syntax
gsub(regex, replacement [, target_string])
- regex: The regular expression defining the pattern to find.
- replacement: The string to replace the found pattern with.
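- target_string: The string on which to perform the substitution. Because `gsub` modifies it in place, it must be something assignable: a variable, a field such as `$3`, or an array element. If omitted, the substitution applies to the entire current record ($0).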
Return Value
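The `gsub` function returns an integer: the number of substitutions made, or 0 if nothing matched. The modified text is stored in the target, not returned, so the return value is mainly useful in conditional statements and for counting matches.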
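As a minimal sketch of using the return value in a condition, the following prints only the lines where at least one substitution happened, prefixed with the count. The sample input and the variable name `n` are illustrative assumptions.

```shell
# Store gsub's return value in n and print only the lines that changed.
result=$(printf 'old old new\nnothing here\n' |
  awk '{ n = gsub(/old/, "new"); if (n > 0) print n ": " $0 }')
echo "$result"   # prints: 2: new new new
```

The second input line contains no match, so `gsub` returns 0 and the line is skipped.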
Usage Examples
The following examples show common string substitutions with the `gsub` function.
Replace all 'old' with 'new' in a file
echo 'This is an old file with old data.' > data.txt
awk '{gsub(/old/, "new"); print}' data.txt
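Replaces every occurrence of 'old' with 'new' in each line read from data.txt and prints the result to standard output. The file itself is not modified. Because the pattern is a plain substring match, 'old' inside a longer word (such as 'bold') is replaced as well.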
Replace spaces with hyphens
echo 'Hello World from Awk' | awk '{gsub(/ /, "-"); print}'
Changes all spaces in the input string to hyphens (-).
Substitute only within a specific field
echo 'field1 field2 apple' | awk '{gsub(/a/, "X", $3); print}'
Replaces 'a' with 'X' only within the third field ($3); the other fields are left untouched. (e.g., 'field1 field2 apple' -> 'field1 field2 Xpple')
Case-insensitive substitution
echo 'Apple is an apple.' | awk 'BEGIN {IGNORECASE=1} {gsub(/apple/, "orange"); print}'
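Sets the `IGNORECASE` variable to 1 to perform a case-insensitive substitution of 'apple' with 'orange'. `IGNORECASE` is specific to GNU awk (gawk); in other implementations it is an ordinary variable with no effect on matching.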
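Where `IGNORECASE` is not available, a bracket expression per letter gives the same effect. A minimal sketch, assuming any POSIX awk:

```shell
# Match each letter in either case instead of relying on IGNORECASE.
result=$(echo 'Apple is an apple.' |
  awk '{ gsub(/[Aa][Pp][Pp][Ll][Ee]/, "orange"); print }')
echo "$result"   # prints: orange is an orange.
```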
Remove only numbers
echo 'Product ID: 12345 ABC' | awk '{gsub(/[0-9]/, ""); print}'
Removes all digits from the string.
Tips & Precautions
Useful tips and points to note when using the `gsub` function.
Difference from sub function
`gsub` substitutes all matches, whereas the `sub` function substitutes only the first match. Choose the appropriate function based on your needs.
- `gsub(regex, replacement, target)`: Substitutes all matches.
- `sub(regex, replacement, target)`: Substitutes only the first match.
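The difference is easiest to see by running both functions on the same input. A short sketch with an illustrative sample string:

```shell
# sub replaces only the first hyphen; gsub replaces every hyphen.
sub_out=$(echo 'a-b-c-d' | awk '{ sub(/-/, "+"); print }')
gsub_out=$(echo 'a-b-c-d' | awk '{ gsub(/-/, "+"); print }')
echo "$sub_out"    # prints: a+b-c-d
echo "$gsub_out"   # prints: a+b+c+d
```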
Regular Expression Special Characters
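To use special characters like `.`, `*`, `+`, `?`, `[`, `]`, `(`, `)`, `|`, `^`, `$`, `\` literally within a regular expression, they must be escaped with `\`. For example, to find a literal dot (.), use `\.`. Most of these characters also lose their special meaning inside a bracket expression, so `[.]` matches a literal dot as well. If the pattern is passed as a string instead of a `/.../` literal, the backslash must be doubled (`"\\."`), because the string literal consumes one level of escaping.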
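The effect of forgetting the escape can be sketched with an IP-style string (the sample value is illustrative): an unescaped dot matches every character, while an escaped dot matches only literal dots.

```shell
# Unescaped: . matches any character, so every character becomes a hyphen.
unescaped=$(echo '192.168.0.1' | awk '{ gsub(/./, "-"); print }')
# Escaped: \. matches only the literal dots.
escaped=$(echo '192.168.0.1' | awk '{ gsub(/\./, "-"); print }')
echo "$unescaped"   # prints: -----------
echo "$escaped"     # prints: 192-168-0-1
```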
Caution when omitting target string
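If `target_string` is omitted, the substitution operation defaults to the entire current record ($0). To apply it only to a specific field, you must specify the field as `gsub(regex, replacement, $N)`. Note that changing a field makes awk rebuild `$0` using the output field separator (`OFS`), so the original spacing between fields may not be preserved.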
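The three common target choices can be compared side by side. This is a sketch with an illustrative input; the variable `s` is a hypothetical name for a working copy of the record.

```shell
# No target: digits are removed from the whole record.
whole=$(echo 'a1 b2 c3' | awk '{ gsub(/[0-9]/, ""); print }')
# Field target: only the second field is changed.
field=$(echo 'a1 b2 c3' | awk '{ gsub(/[0-9]/, "", $2); print }')
# Variable target: the copy s is changed and $0 stays intact.
copy=$(echo 'a1 b2 c3' | awk '{ s = $0; gsub(/[0-9]/, "", s); print s " <- " $0 }')
echo "$whole"   # prints: a b c
echo "$field"   # prints: a1 b c3
echo "$copy"    # prints: a b c <- a1 b2 c3
```

Working on a copy is useful when the original record is still needed later in the same rule.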
Backslash in replacement string
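The characters `&` and `\` have special meanings in the replacement string. An unescaped `&` represents the entire matched string, while `\&` produces a literal ampersand. Because the awk string literal consumes one level of backslashes before `gsub` sees the text, a literal ampersand is written as `"\\&"` in the program source, and a literal backslash is likewise written doubled, as `\\`. Handling of backslashes in other positions has varied between awk implementations, so test replacement strings that contain them on the awk you target.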
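A short sketch of both ampersand forms, using illustrative sample strings: the first wraps each number in angle brackets by reusing the match, the second inserts a literal ampersand.

```shell
# & in the replacement stands for the matched text.
wrapped=$(echo 'id 42 and 7' | awk '{ gsub(/[0-9]+/, "<&>"); print }')
# "\\&" in the awk source reaches gsub as \& and yields a literal ampersand.
literal=$(echo 'salt and pepper' | awk '{ gsub(/and/, "\\&"); print }')
echo "$wrapped"   # prints: id <42> and <7>
echo "$literal"   # prints: salt & pepper
```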