Complex sed search and replace (multiline, regexp, non-greedy)

I had a large chunk of text (output from another command), and needed to perform a search and replace on it. I usually do that either with zsh's built-in ${var:gs/x/y/} syntax or with sed, but this time my pattern was spread over several lines.

Multi-line with --null-data

The trick here is to use sed --null-data to make it operate on the full text instead of on individual lines. Technically, it now considers a "line" to end with a NUL character, and not a \n anymore.
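As a minimal sketch (assuming GNU sed, and with made-up sample text), here is a replacement whose pattern spans a line break. Because the input contains no NUL character, sed sees the whole text as a single "line" and \n can be matched like any other character:

```shell
# Sample multi-line text (hypothetical input, stands in for another command's output)
text=$(printf 'start\nfoo\nbar\nend\n')

# --null-data: records end at NUL, so the pattern can cross the newline
result=$(printf '%s\n' "$text" | sed --null-data 's/foo\nbar/foobar/')

printf '%s\n' "$result"
```

Without --null-data, the same expression would leave the text untouched, since foo and bar never appear on the same line.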

Improve readability with --regexp-extended

To improve readability of my regexps, I also used --regexp-extended which allows me to write capturing groups without having to escape them ((.*) instead of \(.*\)).
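To make the difference concrete, here is a small sketch (assuming GNU sed; the key=value input is only an illustration) that swaps both sides of an = sign, written once with basic and once with extended syntax:

```shell
input="key=value"

# Basic syntax: capturing groups need escaped parentheses
basic=$(printf '%s\n' "$input" | sed 's/\(.*\)=\(.*\)/\2=\1/')

# Extended syntax: same expression, no escaping needed
extended=$(printf '%s\n' "$input" | sed --regexp-extended 's/(.*)=(.*)/\2=\1/')

printf '%s\n%s\n' "$basic" "$extended"
```

Both commands produce the same output; only the readability of the pattern changes.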

Non-greedy

sed does not have a non-greedy mode, which means it will always capture the largest group it can.

The way to work around that involves writing a slightly more complex regexp by specifically defining the one character we don't want to capture.

For example with the string foo_bar_baz, I might want to find the part before the first _ by doing ^(.*)_. This will actually return foo_bar because it's greedy.

The way to work around that is to use ^([^_]*)_ instead. This can be read as: capture everything that is not a _ from the start until you meet a _.
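Here is a quick sketch of both patterns side by side (assuming GNU sed for --regexp-extended; the trailing .* is only there so the substitution drops the rest of the string):

```shell
input="foo_bar_baz"

# Greedy: .* swallows as much as it can before the last _
greedy=$(printf '%s\n' "$input" | sed --regexp-extended 's/^(.*)_.*/\1/')

# Workaround: [^_]* cannot cross a _, so it stops at the first one
non_greedy=$(printf '%s\n' "$input" | sed --regexp-extended 's/^([^_]*)_.*/\1/')

printf '%s\n' "$greedy"      # foo_bar
printf '%s\n' "$non_greedy"  # foo
```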

Splitting a string with zsh

This is one of those transformations I know how to do with node, ruby, or even the command line, but that always sends me back to Stack Overflow when I attempt it with zsh.

Hopefully, writing this blog post (and referring to it later) will help me remember how to do it.

To split a string variable by a delimiter, one can use the (${(@s/X/)variableName}) syntax.

  • The wrapping () means that the resulting variable will be treated as an array
  • The ${} interpolation syntax allows passing specific modifiers
  • The (@s/X/) modifier means to split it by the X character.

To split by the / character, you can use an alternate syntax like (@s:/:) instead. The character immediately following the s is only a separator for zsh's own parsing (here :), and whatever sits between the two separators is your actual delimiter.

ZSH filepath modifiers

zsh comes bundled with variable modifiers to alter filepaths and extract the relevant parts.

Given the following code, we can display $filepath in a lot of different ways:

mkdir -p /tmp/subdir
cd /tmp
local filepath=./subdir/file.zsh

| Name      | Output               | Modifier        | Mnemonic      |
| --------- | -------------------- | --------------- | ------------- |
| Absolute  | /tmp/subdir/file.zsh | ${filepath:a}   | absolute      |
| Basename  | file.zsh             | ${filepath:t}   | tail          |
| Filename  | file                 | ${filepath:t:r} | tail rest     |
| Extension | zsh                  | ${filepath:e}   | extension     |
| Dirpath   | /tmp/subdir          | ${filepath:a:h} | absolute head |

For clarity

  • tail is everything after the last /
  • head is everything before the last /
  • extension is everything after the last .
  • rest is everything before the last .

Other goodies

  • ${~filepath} will expand a leading ~ to the full home path, while ${filepath/#$HOME/\~} does the reverse and replaces the home path with ~
  • :c (for command) gives you the executable path of a command (a bit like which)
  • :q for quoting, :Q for unquoting, :x for quoting individual words
  • :l for lowercase, :u for uppercase
  • :2:10 takes a substring of 10 characters starting at offset 2 (offsets start at 0)
  • Those modifiers can be applied directly to glob patterns (src/**/*.zsh(:t:r))

Default variable values with zsh

zsh has two modifiers (${:-} and ${:=}) to handle fallback for empty values.

echo ${ahead:-0} will display the variable $ahead, or display 0 if the variable is unset or empty.

echo ${ahead:=0} (note the := instead of :-) will first assign 0 to the variable $ahead if it is unset or empty, and then display it.

They are pretty similar, and in that example the result is the same. But with :=, the variable will still be set to 0 afterward, while with :- it's a one-off thing.
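The difference only shows up on the following line, so here is a short sketch that checks the state of $ahead after each form (the helper variable names are made up for the example):

```shell
unset ahead

with_dash="${ahead:-0}"               # fallback only, $ahead is left untouched
state_after_dash="${ahead:-unset}"    # still falls back, so this is "unset"

with_equals="${ahead:=0}"             # fallback and assignment
state_after_equals="${ahead:-unset}"  # $ahead is now 0, so no fallback

echo "$with_dash $state_after_dash $with_equals $state_after_equals"
# 0 unset 0 0
```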

Search and replace with zsh

To search and replace with zsh, there are two ways.

With ${var//XXX/YYY}

This will replace all occurrences of XXX in $var with YYY.

Note that you can interpolate variables inside of XXX, so ${var//${input}/YYY} with input="foo" will replace foo with YYY in $var.

You can use only one / instead of // to only replace the first occurrence.
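A quick sketch of both variants, with an interpolated search term (the sample values are only illustrative):

```shell
var="foo bar foo"
input="foo"

all="${var//${input}/YYY}"    # every occurrence
first="${var/${input}/YYY}"   # first occurrence only

echo "$all"     # YYY bar YYY
echo "$first"   # YYY bar foo
```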

With ${var:gs/XXX/YYY}

The :s/XXX/YYY modifier is the basic syntax to replace XXX with YYY.

It does not allow for interpolating variables. You need to replace s with gs to make it a global search and replace.

The only advantage over the other syntax, in my opinion, is that you can swap the delimiter character (/) with any other character you like. So if your patterns are heavy on /, you can use _ as the delimiter for a more readable format, like ${var:gs_/_#} to replace all / with #.