sed and awk

sed and awk are domain-specific text filtering and modification tools.

Example

I wrote (or adapted) a gemtext-to-HTML converter in both awk and sed to show how the two tools work and where they differ. The sed implementation is much less complete than the awk one and produces working but ugly HTML, yet it still illustrates both sed's everyday uses and a few more complex ones.
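
To get a feel for the difference before reading the full scripts, here is a minimal sketch that does the same tiny job in both tools: escape `<` and `>`, then turn a level-one heading into `<h1>`. The sample line and the variable names (`line`, `sed_out`, `awk_out`) are made up for illustration and are not part of the converters below.

```shell
# Hypothetical one-line input, not taken from the converters below
line='# Hello <world>'

# sed: a sequence of regex substitutions applied to each input line
sed_out=$(printf '%s\n' "$line" | sed \
  -e 's/</\&lt;/g' \
  -e 's/>/\&gt;/g' \
  -e 's/^# *\(.*\)/<h1>\1<\/h1>/')

# awk: pattern { action } rules, with functions such as gsub and sub
awk_out=$(printf '%s\n' "$line" | awk '
  { gsub(/</, "\\&lt;"); gsub(/>/, "\\&gt;") }
  /^#/ { sub(/^# */, ""); print "<h1>" $0 "</h1>" }')

printf '%s\n%s\n' "$sed_out" "$awk_out"
```

Both commands should print the same `<h1>` line. The sed version is a flat list of substitutions, while the awk version reads more like a small program made of rules.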

sed

# https://codegolf.stackexchange.com/a/220170/98237
# This will convert gemtext to valid, but not good, HTML
# Adapted from the answer above to cover more of gemtext
# Still missing multiline blockquotes

# Remove all empty lines
/^[[:space:]]*$/d

# Convert HTML reserved characters to HTML entities
s_&_\&amp;_g
s_<_\&lt;_g
s_>_\&gt;_g
s_"_\&#34;_g

# Reset the substitution flag set by the entity substitutions above,
# so the final `t` only reacts to the substitutions below
ta
# Label `a`, the target of the `ta` branch
:a

    # Convert lines starting with an asterisk to a list item
    s_^* \{0,1\}\(.*\)_<li>\1</li>_
    # Convert lines starting with hashes to a heading
    # (deepest level first, so `###` is not caught by the `#` rule)
    s_^### \{0,1\}\(.*\)_<h3>\1</h3>_
    s_^## \{0,1\}\(.*\)_<h2>\1</h2>_
    s_^# \{0,1\}\(.*\)_<h1>\1</h1>_
    # Convert lines starting with `=>` to links. Include href text if exists
    s_^=&gt; \([^ ]*\) \(.*\)_<a href="\1">\2</a>_
    s_^=&gt; \([^ ]*\)$_<a href="\1">\1</a>_
    # Convert lines starting with `>` to blockquote
    s_^&gt; \{0,1\}\(.*\)_<blockquote>\1</blockquote>_

# If a heading, list item, link, or blockquote substitution was made, skip to end of script
t

# Insert a starting paragraph tag and append a closing paragraph tag to the line
i<p>
a</p>
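
The `ta` / `:a` pair is the least obvious part of the script, so here is a small sketch of what it does. sed's `t` branches if any substitution has succeeded since the last input line was read or the last `t` was taken, and taking the branch clears that flag. The input line is made up, and I use `s/.*/<p>&<\/p>/` instead of `i` and `a` only so the result stays on one line.

```shell
# Hypothetical input: a plain paragraph that needs entity escaping
input='a < b'

# Without the reset, the entity substitution has already set the flag,
# so `t` jumps to the end and the line is never wrapped in <p>.
no_reset=$(printf '%s\n' "$input" | sed \
  -e 's/</\&lt;/g' \
  -e 's/^# *\(.*\)/<h1>\1<\/h1>/' \
  -e 't' \
  -e 's/.*/<p>&<\/p>/')

# With `ta` followed by `:a`, the flag is cleared first, so the later
# `t` only reacts to the heading substitution.
with_reset=$(printf '%s\n' "$input" | sed \
  -e 's/</\&lt;/g' \
  -e 'ta' \
  -e ':a' \
  -e 's/^# *\(.*\)/<h1>\1<\/h1>/' \
  -e 't' \
  -e 's/.*/<p>&<\/p>/')

printf '%s\n%s\n' "$no_reset" "$with_reset"
```

The first command should leave the paragraph unwrapped, and the second should wrap it in `<p>` tags.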

awk

# Flip boolean flag
function flip(num) {
  if (num > 0) return 0
  else return 1
}

# Return the number of leading `#` characters (0 if none)
function getHeaderDepth(line) {
  if (match(line, /^#+/)) return RLENGTH
  return 0
}

# Set variables

BEGIN {
  inBlockquote = 0
  inCode = 0
  inEmpty = 0
  inList = 0
  tempLine = ""
}

# Start/end code block
/^```/ {
  if (inCode == 0) {
    print "<pre><code>"
  } else {
    print "</code></pre>"
  }
  inEmpty = 0
  inCode = flip(inCode)
  next
}

# Replace HTML reserved characters with entities,
# then, if in a code block, print the line without further processing
// {
  # HTML entities (`&` first, so later entities are not re-escaped)
  gsub("&", "\\&amp;")
  gsub(">", "\\&gt;")
  gsub("<", "\\&lt;")
  gsub("\"", "\\&#34;")
  if (inCode == 1) {
    print
    next
  }
}

# If line empty, close all open tags.
#   For each additional consecutive empty line, print a line break
NF == 0 {
  if (inEmpty) {
    print "<br/>"
    next
  }
  tempLine = ""
  if (inBlockquote == 1) {
    tempLine = tempLine "</blockquote>"
    inBlockquote = 0
  }
  if (inList == 1) {
    tempLine = tempLine "</ul>"
    inList = 0
  }
  if (tempLine) print tempLine
  inEmpty = 1
  next
}

NF > 0 {
  inEmpty = 0
}

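# Header
/^#/ {
  # The length of the leading run of hashes is the heading depth;
  # gemtext only defines three levels, so cap it at 3
  match($0, /^#+/)
  headerDepth = (RLENGTH > 3) ? 3 : RLENGTH
  sub(/^#+ */, "", $0)
  printf "<h%d>%s</h%d>\n", headerDepth, $0, headerDepth
  next
}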

# Blockquote
/^&gt; / {
  if (inBlockquote == 0) {
    inBlockquote = flip(inBlockquote)
    print "<blockquote>"
  }
  gsub(/^&gt; /, "", $0)
  print
  next
}

# List
/^\* / {
  if (inList == 0) {
    inList = flip(inList)
    print "<ul>"
  }
  gsub(/^\* /, "", $0)
  print "<li>" $0 "</li>"
  next
}

# Link
/^=&gt; / {
  href = $2
  tempLine = "<p><a href=\"" href "\">"
  gsub(/^=&gt; [^ ]+ ?/, "", $0)
  if (length($0)) {
    print tempLine $0 "</a></p>"
  } else {
    print tempLine href "</a></p>"
  }
  next
}

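# Paragraph
// {
  print "<p>" $0 "</p>"
}

# Close any tags still open at end of input
END {
  if (inCode == 1) print "</code></pre>"
  if (inBlockquote == 1) print "</blockquote>"
  if (inList == 1) print "</ul>"
}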
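
One awk detail that is easy to trip over when working out heading depth: `gsub` returns the number of replacements it made, not the length of what it matched, and an anchored pattern such as `^#+` can match at most once per line. `match` sets `RLENGTH` to the length of the match, which is the value that tracks the depth. A minimal sketch with made-up sample headings:

```shell
# Hypothetical sample headings, one per line
depths=$(printf '%s\n' '# One' '## Two' '### Three' | awk '
{
  # gsub returns how many replacements were made; an anchored
  # pattern can match at most once per line
  line = $0
  count = gsub(/^#+/, "", line)

  # match sets RLENGTH to the length of the leading run of hashes
  match($0, /^#+/)

  print count, RLENGTH
}')

printf '%s\n' "$depths"
```

The first column should be 1 on every line, while the second should count the hashes.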

References

  1. https://www.gnu.org/software/gawk/manual/gawk.pdf
  2. https://github.com/codenameyau/sed-awk-cheatsheet
  3. https://www.grymoire.com/Unix/Awk.html
  4. https://compudanzas.net/awk.html
  5. http://linuxfocus.org/~guido/scripts/awk-one-liner.html

Last modified: 202212070107