Complex sed search and replace (multiline, regexp, non-greedy)
02 Feb 2023I had a large chunk of text (output from another command), and needed to perform a search and replace on it. I usually do that either with zsh builting ${var:gs/x/y/}
syntax or sed
, but this time my pattern was spread on several lines.
Multi-line with --null-data
The trick here is to use sed ---null-data
to make it operate on the full text instead of on invidual lines. Technically, it now considers a "line" to end with a NUL
character, and not a \n
anymore.
Improve readability with --regexp-extended
To improve readability of my regexps, I also used --regexp-extended
which allows me to write capturing groups without having to escape them ((.*)
instead of \(.*\)
).
Non-greedy
sed
does not have a non-greedy mode, which means it will always capture the largest group it can.
The way to work around that involves writing a slightly more complex regexp by specifically defining the one character we don't want to capture.
For example with the string foo_bar_baz
, I might want to find the part before the first _
by doing ^(.*)_
. This will actually return foo_bar
because it's greedy.
The way to work around that is to use ^([^_]*)_
instead. This can be read as: capture everything that is not a _
from the start until you met a _
.
Want to add something ? Feel free to get in touch on Bluesky : @pixelastic.bsky.social