The Valuable Dev

A Practical Guide to GNU sed With Examples

[Figure: Bilbo trying to edit the stream of Rivendell]

The sun is shining today; too bad, you’re stuck in the office of your beloved company, MegaCorpMoneyMaker. Your task is to delete specific lines across thousands of XML files; it’s the “API” of an external warehouse, and, as always, they screwed it up.

You begin to write a script using your favorite programming language, when suddenly Davina, your colleague developer, comes to your desk:

“You know that you don’t have to write a script to do that? You could simply use sed in your terminal.”

You used sed in the past, but only to substitute some words with others. How could Davina delete specific lines depending on their content? You don’t have time to think more: Davina is already on your keyboard.

“It’s super easy. I’ll show you!”

Curious to learn more, you let her explain what sed is all about. This article is the transcription of this magical day which changed the world.

More specifically, Davina explained the following:

  • What argument we can give to sed.
  • What’s a sed script.
  • How to write the input file in place.
  • How to use an address in a sed script to edit specific lines.
  • How to use the commands print and delete.
  • How to invert the address.
  • How to use more than one command in a sed script.
  • How to use the substitute command.

As the title suggests, we’ll focus on GNU sed in this article. If you don’t have it, I’d recommend installing and using it. To know if you have GNU sed, simply run sed --version in your shell; if it doesn’t work, you don’t have it. If it does work, you’ll get the information you seek.

Also, if you prefer watching videos instead of reading, you’ll find two at the end of this article, recorded by your humble servant.

Last thing: you can download the companion file if you want to follow along and try the different commands yourself. I’d recommend doing so, to remember what we’ll see here, and be able to use sed in different contexts.

It’s time! Get your diving gear and let’s explore the intricate caves of our stream of text.

The Basics of sed

Let’s begin with the obvious: what on Earth does “sed” mean? This lovely name stands for stream editor. It’s indeed an editor which follows this workflow:

  1. Take a stream of text as input.
  2. Select some specific lines.
  3. Perform some operations on each line selected.
  4. Output the resulting text.

The second and third steps are done thanks to a sed script. We’ll look at this concept later; first, let’s look at what arguments we can give to sed in our shell.

General Command-Line Syntax

Our stream editor can take two arguments:

sed [script] [file...]

The [script] defines what operations you want to do, and the [file...] are the files you want to work on.

Let’s look at the simplest example ever:

sed '' nginx.conf

Here’s the result:

[Figure: Running sed without a script outputs the whole input]

By default, sed outputs (or “prints”) every single line of its input. Because our sed script is empty, sed only outputs the file. As a result, the command above is equivalent to the shell command cat nginx.conf.
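If you'd like to check this equivalence yourself, here's a minimal sketch. The sample file and its content are made up for the demonstration; they're not part of the companion file.

```shell
# Hypothetical sample file, created only for this demonstration.
sample=$(mktemp)
printf 'first line\nsecond line\n' > "$sample"

# An empty sed script changes nothing, so both commands emit the same text.
sed_output=$(sed '' "$sample")
cat_output=$(cat "$sample")

if [ "$sed_output" = "$cat_output" ]; then
    echo "identical"
fi

rm -f "$sample"
```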

Files are not the only possible input. We can also directly pipe to sed. For example:

echo "this is some input" | sed ''

The result:

[Figure: We can pass an input to sed with a shell pipe]

Wonderful.

The sed Script

The purpose of a sed script is to perform operations on the input.

The Three Parts of a sed Script

A sed script can be divided into three parts; they should be given in order:

| Name | Description |
|---|---|
| Address | The lines you want to edit. It’s always followed by a command. |
| Command | The operation you want to perform. It’s always a single letter. |
| Options | A couple of commands can have options, like the substitute command for example. |

Here’s an example of a sed script using the command delete:

sed 'd' nginx.conf

By default, the command of a sed script operates on each line. As a result, we delete here every single line. That’s why the output is empty.

Thankfully, it doesn’t mean that we delete all the content of the file nginx.conf; by default, sed will never modify the file (or input) given. Instead, it will:

  1. Copy each line of the input in a buffer.
  2. Do the operations described by the sed script on the buffered lines.
  3. Output the (possibly modified) buffered lines.
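The three steps above can be observed with a small experiment. This is only a sketch on a throwaway file (created with mktemp, an assumption of this example): the output is empty, but the file on disk keeps all its lines.

```shell
# Hypothetical three-line file used as input.
sample=$(mktemp)
printf 'one\ntwo\nthree\n' > "$sample"

# 'd' deletes every buffered line, so nothing reaches standard output...
deleted_output=$(sed 'd' "$sample")

# ...but the file itself still holds its three lines.
lines_on_disk=$(wc -l < "$sample")

echo "output: '$deleted_output', lines still on disk: $lines_on_disk"
rm -f "$sample"
```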

You can also directly modify the file in place if you want to.

Writing Directly The Input File

It might be confusing for beginners to see an editor which doesn’t edit a file directly; it was confusing for me at least. But it’s a good thing: contrary to most text editors out there, there’s no undo functionality with sed. When you modify your input file, you can’t go back.

Let’s add an address to our sed script, to only delete the first line:

sed '1d' nginx.conf

As we saw, this won’t affect the file nginx.conf, only the output. As a result, if you want to save your edits, you could redirect this output to a new file:

sed '1d' nginx.conf > nginx.new.conf

If you really want to modify the input file, you can also add the option -i to your shell command, to modify the file directly in place. I’d recommend creating a backup before doing so; for example:

cp nginx.conf nginx.conf.back
sed -i '1d' nginx.conf

This time, the file nginx.conf.back will have the content of nginx.conf before any sed editing, and the first line of the file nginx.conf will be deleted. You can actually do this operation in one command line:

sed -i.back '1d' nginx.conf

Our stream editor will automatically:

  1. Copy the input file, and add the suffix .back to the filename (here nginx.conf.back).
  2. Modify the input file in place.
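To see both steps in action without risking a real configuration file, you could try something like the following sketch. The directory and the file demo.conf are hypothetical, created only for this test.

```shell
# Work in a throwaway directory with a hypothetical three-line file.
workdir=$(mktemp -d)
printf 'line 1\nline 2\nline 3\n' > "$workdir/demo.conf"

# Delete the first line in place, keeping the original as demo.conf.back.
sed -i.back '1d' "$workdir/demo.conf"

edited=$(cat "$workdir/demo.conf")
backup=$(cat "$workdir/demo.conf.back")
printf 'edited:\n%s\nbackup:\n%s\n' "$edited" "$backup"

rm -rf "$workdir"
```

The edited file should have lost its first line, while the backup should still contain all three.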

To summarize:

[Figure: We can directly modify the input file in place with sed]

The Address: Selecting Specific Lines

The address selects the lines a sed command applies to. Every line still goes through sed’s buffer, but the command only runs on the lines the address matches. The address always comes before the command in a sed script.

An address can be:

  • A line number.
  • A range of lines.
  • Every nth line.
  • A regex.

In our previous example, we used the address 1 in the sed script to delete the first line:

sed '1d' nginx.conf

Selecting Specific Line Numbers

We can directly give line numbers as addresses. Here are a couple of examples:

| Example | Description | Type |
|---|---|---|
| sed '1d' nginx.conf | Delete the first line. | Line number. |
| sed '2,5d' nginx.conf | Delete the lines 2 to 5 (included). | Range of lines. |
| sed '0~2d' nginx.conf | Delete every even line (from line 0, delete every 2 lines). | Every nth line. |
| sed '1~2d' nginx.conf | Delete every odd line (from line 1, delete every 2 lines). | Every nth line. |
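A quick way to get a feel for these addresses is to feed sed numbered lines. In this sketch, seq 1 6 is used as a stand-in input (each line contains its own line number), and paste only joins the result on one line for readability.

```shell
# seq 1 6 prints the numbers 1 to 6, one per line.
first_deleted=$(seq 1 6 | sed '1d' | paste -sd' ' -)
range_deleted=$(seq 1 6 | sed '2,5d' | paste -sd' ' -)
even_deleted=$(seq 1 6 | sed '0~2d' | paste -sd' ' -)
odd_deleted=$(seq 1 6 | sed '1~2d' | paste -sd' ' -)

echo "$first_deleted"   # → 2 3 4 5 6
echo "$range_deleted"   # → 1 6
echo "$even_deleted"    # → 1 3 5
echo "$odd_deleted"     # → 2 4 6
```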

Selecting Lines Using Regular Expressions

We can also select the lines we want using regular expressions (regexes). Here are more examples:

| Example | Description | Type |
|---|---|---|
| sed '/include/d' nginx.conf | Delete each line matching the pattern include. | Regex. |
| sed '3,/include/d' nginx.conf | Delete from the 3rd line until the first line matching the pattern include. | Range of lines using a regex. |
| sed '/http/,/include/d' nginx.conf | Delete every line, from the first line matching http to the first line matching include. | Range of lines using two regexes. |

If you want to know more about regexes, this article explores the basics using Vim.

Let’s look closer at the following example:

sed '/http/,/include/d' nginx.conf

The result:

[Figure: We can use regexes as a range in sed]

In our file, the first instance of http is at line 1, and the first instance of include is at line 2, so both lines are deleted. Nothing groundbreaking here.

But if the regex ending a range doesn’t match any subsequent line, your sed command will operate until the end of your input. A range also starts again each time its first regex matches: here, the regex http matches again at line 8, and no following line matches include. As a result, everything from line 8 until the end of the input is deleted.

In the same spirit, if you use a range beginning with a regex, but this regex is never matched, your sed command will never operate on anything.
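Both behaviors can be reproduced on a tiny input. The six lines below are invented for this sketch; they only mimic the shape of a configuration file.

```shell
# Hypothetical six-line input standing in for a config file.
sample() {
    printf 'http {\ninclude a;\nserver {\nhttp again\ntail 1\ntail 2\n'
}

# Lines 1-2 form a first range. "http again" opens a second range which
# never finds "include", so it runs until the end of the input.
reopened=$(sample | sed '/http/,/include/d')
echo "$reopened"    # → server {

# The starting regex never matches, so no range opens: nothing is deleted.
untouched=$(sample | sed '/nomatch/,/include/d' | wc -l)
echo "lines left: $untouched"
```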

Case Insensitive Regex

If you don’t want to be bothered by the case, you can add the flag I after a regex to make it case insensitive. For example:

sed '/HTTP/I,/InClUdE/Id' nginx.conf

The result:

[Figure: Add the 'I' flag to a sed regex to make it case insensitive]
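Here's a small sketch comparing the same range with and without the I flag. The three input lines are hypothetical, chosen so that only a case-insensitive match can succeed.

```shell
# Hypothetical input where the keywords use unexpected capitalization.
sample() {
    printf 'HTTP {\nInclude x;\nkeep me\n'
}

# Without the I flag, neither regex matches: all three lines survive.
sensitive_count=$(sample | sed '/http/,/include/d' | wc -l)

# With the I flag on both regexes, the first two lines are deleted.
insensitive=$(sample | sed '/http/I,/include/Id')

echo "case sensitive keeps $sensitive_count lines"
echo "case insensitive keeps: $insensitive"
```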

The Extended Regex Engine

You can use one of two regex engines with GNU sed:

  • Basic regex engine (the default).
  • Extended regex engine.

With the basic regex engine, you’ll have to escape the following characters if you want them to be regex metacharacters: ?, +, (), {}, and |. The extended regex engine includes them; to use it, simply add the option -E to your shell command.

For example, these two command-lines are equivalent:

sed '/server\|include\|on\|log\|^$/d' nginx.conf
sed -E '/server|include|on|log|^$/d' nginx.conf

The breathtaking result:

[Figure: Using the extended regex engine when using regexes with sed is the best]

Escaping characters is often a bad idea: it makes the regex more difficult to read and understand. Personally, to keep things simple, I use the extended regex engine each time I use regex metacharacters.
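If you want to convince yourself that both engines give the same result, you could compare them on a few lines. The input below is made up for this sketch; only the line "keep" avoids every alternative of the regex.

```shell
# Hypothetical input: four lines match the regex (one of them is empty).
sample() {
    printf 'server x\nkeep\n\ninclude y\nlog z\n'
}

# Basic engine: the alternation needs a backslash.
basic=$(sample | sed '/server\|include\|on\|log\|^$/d')

# Extended engine: same regex, no escaping.
extended=$(sample | sed -E '/server|include|on|log|^$/d')

echo "basic: $basic"
echo "extended: $extended"
```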

Only Output the Chosen Lines

As we saw at the very beginning, sed always outputs every line of the input by default. You can disable this with the command-line option -n (for no print), or its more understandable equivalents --quiet or --silent.

For example, the following doesn’t output anything:

sed -n '' nginx.conf
sed --quiet '' nginx.conf

What’s the point, then? It’s where the sed command p becomes useful: it prints the lines we address, to only output them. For example, to display the first line of a file:

sed -n '1p' nginx.conf
sed --quiet '1p' nginx.conf

[Figure: Printing the lines we want with sed]

Without the option --quiet (or -n), you’d get the first line of the file, as well as all the lines of the file! As a result, the first line would be printed twice, and every other line printed once:

[Figure: Printing lines with sed without the noprint option]
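The difference is easy to reproduce on a three-line input (invented for this sketch); paste is only there to show the result on a single line.

```shell
# Without -n, the first line is printed by p and again by sed's default output.
with_autoprint=$(printf 'a\nb\nc\n' | sed '1p' | paste -sd' ' -)

# With -n, only the explicit print remains.
quiet=$(printf 'a\nb\nc\n' | sed -n '1p' | paste -sd' ' -)

echo "$with_autoprint"   # → a a b c
echo "$quiet"            # → a
```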

This also allows us to mimic the behavior of grep with sed; to only output the lines matching a specific pattern, we can simply give a regex as address, and print only the lines matching it. For example:

sed -n '/include/p' nginx.conf

Here’s a quick comparison:

[Figure: How to use sed to mimic grep]

This makes sed a powerful tool to only output a part of your input stream. It’s handy if you want to work with a specific range of lines, with sed or with any other CLI tool (using a pipe). For example, if you want to know how many lines there are from the beginning of the file to the first line matching include:

sed --quiet '1,/include/p' nginx.conf | wc -l

The result:

[Figure: The sed print command is useful to pipe its output through other CLI tools]

Inverting the Address

It’s possible to invert the concept of address: instead of operating on the lines addressed, we could operate on every line except the ones addressed. To do so, we need to add a bang ! after the address itself.

Here are some examples to illustrate the point:

| Example | Description |
|---|---|
| sed '1!d' nginx.conf | Delete every line except the first one. |
| sed '/include/!d' nginx.conf | Delete every line except the ones matching the pattern include. |

As a result, the two following shell commands are equivalent:

sed '/include/!d' nginx.conf
sed -n '/include/p' nginx.conf

To prove my point:

[Figure: How to invert the address in a sed script]
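You can also check the equivalence on a small input of your own. The four lines below are hypothetical; grep is thrown in as a third point of comparison.

```shell
# Hypothetical input with two lines containing "include".
sample() {
    printf 'user www;\ninclude a.conf;\nworker 4;\ninclude b.conf;\n'
}

inverted=$(sample | sed '/include/!d')      # delete everything else
printed=$(sample | sed -n '/include/p')     # print only the matches
grepped=$(sample | grep 'include')          # what grep would output

# Inverting a line number works the same way: keep only the first line.
only_first=$(sample | sed '1!d')

[ "$inverted" = "$printed" ] && [ "$printed" = "$grepped" ] && echo "same output"
echo "$only_first"   # → user www;
```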

Using More than One Command

It’s also possible to use more than one command in our sed scripts (with different addresses), or even use multiple sed scripts in one shell command.

Using Multiple Commands in a sed Script

You can use more than one command in a sed script if you separate them with ;. For example:

sed '1,10d;15,$d' nginx.conf

The first command 1,10d deletes all the lines from line 1 to 10, and the second command 15,$d deletes all the lines from line 15 to the end of the input. Line numbers always refer to the input, so only the lines 11 to 14 are left in the output.

Using Multiple sed Scripts

You can also use the command-line option -e (or --expression) to execute multiple scripts:

sed -e '1,10d' -e '15,$d' nginx.conf

The illustration we all waited for:

[Figure: How to use multiple scripts in a sed shell command]
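Both forms can be compared on numbered lines. As a sketch, seq 1 20 stands in for a 20-line file here, so the numbers left in the output are the line numbers which survived.

```shell
# One script, two commands separated by a semicolon.
with_semicolon=$(seq 1 20 | sed '1,10d;15,$d' | paste -sd' ' -)

# Two scripts, each given with its own -e option.
with_expressions=$(seq 1 20 | sed -e '1,10d' -e '15,$d' | paste -sd' ' -)

echo "$with_semicolon"     # → 11 12 13 14
echo "$with_expressions"   # → 11 12 13 14
```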

The Substitute Command

We already spoke about two important sed commands in this article: the delete and the print commands. While they’re useful in their own right, many use sed for its substitute command.

General Syntax

As we saw, a sed script can be made of three elements: the address, the sed command, and possible options depending on the command. These elements need to be given in this order.

The substitute command is one of the few sed commands taking some options. Here’s the general syntax:

s/<pattern>/<replacement>/[flag]

This command will try to match the regular expression <pattern> on each line of your input, and replace these matches with the <replacement>. For example, if you want to replace the string “index” with “page”, you can do:

sed 's/index/page/' nginx.conf

Here’s the difference between the input (the file nginx.conf) and the output:

input:  index index.html index.htm index.php;
output: page index.html index.htm index.php;

Replacing a Specific Match on a Line

By default, if your pattern matches more than one string of characters on the line, only the first one will be replaced. If you want to replace every match on each line, you need to use the global flag:

sed 's/index/page/g' nginx.conf

Again, here’s the difference between the input and output:

input:  index index.html index.htm index.php;
output: page page.html page.htm page.php;

You can also replace a specific match on the line by giving a number as flag. This number represents the nth match you want to replace. For example, if you want to replace the second match:

sed 's/index/page/2' nginx.conf

The result:

input:  index index.html index.htm index.php;
output: index page.html index.htm index.php;

Finally, you might want to replace every match from the nth match on. To do so, you can add two flags: a number and the global flag. For example:

sed 's/index/page/2g' nginx.conf

The difference between input and output:

input:  index index.html index.htm index.php;
output: index page.html page.htm page.php;
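If you don't have the companion file at hand, the four variants can be tried directly on the line used in the examples above. This sketch simply pipes that line to sed.

```shell
# The line from nginx.conf used throughout this section.
line='index index.html index.htm index.php;'

first_only=$(printf '%s\n' "$line" | sed 's/index/page/')     # first match
every=$(printf '%s\n' "$line" | sed 's/index/page/g')         # all matches
second_only=$(printf '%s\n' "$line" | sed 's/index/page/2')   # second match
from_second=$(printf '%s\n' "$line" | sed 's/index/page/2g')  # second and after

printf '%s\n' "$first_only" "$every" "$second_only" "$from_second"
```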

Reusing the Pattern in the Replacement

You can reuse the pattern you want to match in your replacement thanks to the character &. For example:

sed 's/index/&-page/' nginx.conf

The result:

input:  index index.html index.htm index.php;
output: index-page index.html index.htm index.php;

You could also use regexes with grouping and backreference. For example:

sed -E 's/(ind)(ex)/\1-page-\2/' nginx.conf

We use here the extended regular expression engine to avoid escaping the parentheses. The result:

input:  index index.html index.htm index.php;
output: ind-page-ex index.html index.htm index.php;
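Backreferences become really useful when you need to reorder parts of a line. Here's a sketch on a hypothetical directive (it's not taken from the companion file) which swaps a word and a number, and wraps a whole match with &.

```shell
# Hypothetical directive: a lowercase word followed by a number.
directive='listen 80'

# \1 is the word, \2 is the number: swap them.
swapped=$(printf '%s\n' "$directive" | sed -E 's/([a-z]+) ([0-9]+)/\2 \1/')

# & stands for the whole match: wrap the number in brackets.
wrapped=$(printf '%s\n' "$directive" | sed -E 's/[0-9]+/[&]/')

echo "$swapped"   # → 80 listen
echo "$wrapped"   # → listen [80]
```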

Using Different Separators

Using the slash / as a separator between your pattern, your replacement, and the optional flags can bring some problems. If you work with a pattern (or a replacement) which already has some slashes (like URLs), you’d need to escape every single one of them for sed to understand which slashes are separators and which aren’t.

For example, the following won’t work:

sed 's/http://server.com/ftp://ftpserver/' nginx.conf

The unfortunate result:

[Figure: Using the separator as a regular character in a substitute command doesn't work]

Instead, we need to escape each slash which is not a separator:

sed 's/http:\/\/server.com/ftp:\/\/ftpserver/' nginx.conf

As we saw already, we should avoid escaping characters as much as we can. But fear not! There’s a better solution. You can actually use other characters as separators, like |, % or # (my personal favorite); any single character works, except a backslash or a newline. For example:

sed 's#http://server.com#ftp://ftpserver#' nginx.conf
sed 's%http://server.com%ftp://ftpserver%' nginx.conf

The result:

input:  proxy_pass http://server.com/this/is/a/massive/server;
output: proxy_pass ftp://ftpserver/this/is/a/massive/server;
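To check that the escaped version and the alternative separators really do the same thing, you could run them on the line above, as in this sketch:

```shell
# The line from nginx.conf used in this section.
line='proxy_pass http://server.com/this/is/a/massive/server;'

# Slash as separator: every slash of the URLs must be escaped.
escaped=$(printf '%s\n' "$line" | sed 's/http:\/\/server.com/ftp:\/\/ftpserver/')

# Other separators: no escaping needed.
with_hash=$(printf '%s\n' "$line" | sed 's#http://server.com#ftp://ftpserver#')
with_percent=$(printf '%s\n' "$line" | sed 's%http://server.com%ftp://ftpserver%')

echo "$with_hash"
```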

Only Displaying the Replaced Lines

We saw above that it was possible to only display the lines addressed, thanks to the print command. But what if we want to only display the lines where some replacements are made?

We could use two commands for that: first the substitute command to replace our characters, and second the print command to only output the lines replaced. For example:

sed -n 's/index/page/g;/page/p' nginx.conf
sed -n -e 's/index/page/g' -e '/page/p' nginx.conf

But it’s not really a good solution. If any other line in the file matches the address /page/, it would be added to the output even if no replacement was made on it.

A better solution is to add the print flag to the options of our substitute command:

sed 's/index/page/gp' nginx.conf

But this doesn’t really work as expected. Remember: sed already prints all the lines of the file (or input) by default; to only print the lines which have some replacements, we need the option -n, so that sed only outputs what we explicitly ask it to print:

sed -n 's/index/page/gp' nginx.conf

Here’s the difference:

[Figure: Using the print flag with the sed substitute command]

Don’t be confused here: this is the flag p of the sed command s, not the sed command p.
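The difference between the two approaches shows up as soon as a line already contains the replacement. In this sketch, the three input lines are invented: "homepage b" contains "page" but no "index".

```shell
# Hypothetical input: the second line contains "page" without any "index".
sample() {
    printf 'index a\nhomepage b\nindex c\n'
}

# Two commands: the address /page/ also catches the untouched second line.
two_commands=$(sample | sed -n 's/index/page/g;/page/p' | wc -l)

# The p flag of the substitute command only prints lines where a replacement happened.
p_flag=$(sample | sed -n 's/index/page/gp')

echo "two commands print $two_commands lines"
printf '%s\n' "$p_flag"
```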

Special Sequences

We can use special sequences in the <replacement> of the substitute command, to convert one or multiple characters to uppercase or lowercase:

| Sequence | Description |
|---|---|
| \u | The next character becomes uppercase. |
| \l | The next character becomes lowercase. |
| \U | All characters following this sequence become uppercase, until the next \E. |
| \L | All characters following this sequence become lowercase, until the next \E. |
| \E | Stop the case conversion started by \U or \L. |

For example:

sed -E 's/(in)dex/\U\1\E-\u&-page/2g' nginx.conf

Here’s the difference between the input and the output:

input:  index index.html index.htm index.php;
output: index IN-Index-page.html IN-Index-page.htm IN-Index-page.php;
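If the example above feels dense, here's a simpler sketch using these sequences (GNU sed only) on a made-up input:

```shell
# \U...\E uppercases the first group; \u only capitalizes the next character.
mixed=$(printf 'hello world\n' | sed -E 's/(hello) (world)/\U\1\E \u\2/')

# \L lowercases everything after it, here the whole match.
lowered=$(printf 'HELLO World\n' | sed 's/.*/\L&/')

echo "$mixed"     # → HELLO World
echo "$lowered"   # → hello world
```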

You’re Now Ready to Surf on the Stream

The next time you need to do some bulk editing on multiple files, or if you need to do some editing in a Bash script, sed is a really strong option to do so.
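For instance, the task from the introduction (deleting specific lines across many XML files) could look like the following sketch. Everything here is an assumption made for the illustration: the two XML files, the <legacy> tag to delete, and the use of find to collect the files.

```shell
# Hypothetical XML files in a throwaway directory.
workdir=$(mktemp -d)
printf '<item>\n<legacy>x</legacy>\n</item>\n' > "$workdir/a.xml"
printf '<item>\n<name>y</name>\n</item>\n' > "$workdir/b.xml"

# Delete every line containing <legacy> in each XML file, in place,
# keeping a .back copy of every file sed touched.
find "$workdir" -name '*.xml' -exec sed -i.back '/<legacy>/d' {} +

a_after=$(cat "$workdir/a.xml")
b_after=$(cat "$workdir/b.xml")
backup_count=$(find "$workdir" -name '*.back' | wc -l)

printf 'a.xml is now:\n%s\n' "$a_after"
rm -rf "$workdir"
```

On real files, it's worth running the same command without -i first, to inspect the output before modifying anything.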

What did we see in this article?

  • The sed command-line can take two arguments: a sed script and a file.
  • Instead of a file, an input can also be piped to sed.
  • A sed script is composed of three parts: the address, the command, and possibly the command’s options.
  • By default, sed copies the input, edits it, and outputs the result. It means that the input itself is never modified.
  • If you modify directly the file given to sed, you can’t undo the editing. As a result, create a backup if you modify the file in place.
  • Adding an address in a sed script allows you to operate on specific lines only.
  • An address can target some line numbers, or lines matching some regexes.
  • It’s possible to use more than one sed script by shell command, using the option -e (or --expression).
  • The substitute command is one of the most useful commands to find and replace some text.
  • The substitute command takes a pattern, a replacement, and optional flags, separated by a delimiter of your choice (the slash is the usual one).
  • With the substitute command, you can target the exact match you want to replace using flags.
  • It’s possible to re-use part of the pattern in your replacement in your substitute command.

We saw simple examples in this article, but sed script is actually a Turing complete programming language: here’s a TicTacToe implemented in sed.

There are also some videos on my YouTube channel about sed:
