
grep finds lines. cut extracts fields. sort and uniq count and rank. sed edits. awk does all of that, and it does it with logic.

Every tool covered so far in this series does one thing well. awk does several things in one pass: it reads input, splits it into fields, applies conditions, performs calculations, and formats output. It is the tool you reach for when a pipeline of simpler tools starts to feel like the wrong approach.

This article covers awk from the ground up: how it thinks, its core practical features, and the patterns that show up in real security work.

## How awk Thinks

awk reads each record (by default, one line at a time), splits it into fields, tests each pattern you wrote, and runs the associated action on records that match. If no pattern is given, the action runs on every line. If no action is given, a matching pattern causes awk to print the full line.

```bash
awk 'condition { action }' filename
some_command | awk 'condition { action }'
```

The condition and the action are each optional, but a rule needs at least one of them. A rule with only a condition prints matching lines. A rule with only an action runs on every line.

## Fields

When awk reads a line, it automatically splits it into fields. By default, fields are separated by runs of whitespace: one or more spaces or tabs collapse into a single separator. This is why awk handles messy or inconsistent spacing better than cut, which treats every occurrence of its delimiter character as a field boundary, so repeated spaces produce empty fields.

Fields are referenced as `$1`, `$2`, `$3`, and so on. `$0` is the entire line.

```bash
echo "root x 0 0" | awk '{print $1}'
```

Output:

```
root
```

```bash
echo "root x 0 0" | awk '{print $1, $3}'
```

Output:

```
root 0
```

A comma between arguments in a print statement outputs them separated by the output field separator, which is a space by default.

### -F: Change the Field Separator

```bash
awk -F':' '{print $1}' /etc/passwd
```

Sets the field separator to `:`. Field 1 is now the username. This is the same result as `cut -d':' -f1`, but with awk's full logic available.
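
To make the difference between awk's default splitting, cut, and `-F` concrete, here is a minimal sketch. The input lines are made up for illustration, and the variable names (`awk_field`, `cut_field`, `passwd_user`) are hypothetical.

```bash
# Sample line with irregular spacing (made-up data for illustration).
line='root   x    0 0'

# awk collapses runs of whitespace, so field 3 is the UID.
awk_field=$(printf '%s\n' "$line" | awk '{print $3}')

# cut treats every single space as a delimiter, so field 3 is empty here.
cut_field=$(printf '%s\n' "$line" | cut -d' ' -f3)

# With -F':' awk splits on colons instead, as in /etc/passwd.
passwd_user=$(printf 'root:x:0:0:root:/root:/bin/bash\n' | awk -F':' '{print $1}')

printf 'awk: [%s]  cut: [%s]  user: [%s]\n' "$awk_field" "$cut_field" "$passwd_user"
```

The brackets in the final printf make an empty field visible, which is the easiest way to notice when cut has split on repeated spaces.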
```bash
awk -F',' '{print $2, $4}' results.csv
```

Comma-separated input. Print fields 2 and 4. This simple split assumes no quoted fields that contain commas.

A separator longer than one character is treated as an extended regular expression:

```bash
awk -F'[,;]' '{print $1}' data.txt
```

Splits on either comma or semicolon. POSIX specifies this behavior, so it is not limited to GNU awk (gawk).

## Part One: Printing and Formatting

### print: Output Fields

```bash
awk '{print $1}' file.txt
```

Prints field 1 of every line.

```bash
awk '{print $1, $2, $3}' file.txt
```

Prints fields 1, 2, and 3, separated by spaces.

```bash
awk '{print $NF}' file.txt
```

`NF` is a built-in variable that holds the number of fields on the current line. `$NF` is therefore the last field, regardless of how many fields the line has.

```bash
awk '{print $(NF-1)}' file.txt
```

The second-to-last field.

### printf: Formatted Output

printf gives you control over formatting: padding, alignment, decimal places.

```bash
awk '{printf "%-20s %s\n", $1, $2}' file.txt
```

`%-20s` left-aligns the first field in a 20-character column, `%s` prints the second field, and `\n` adds the newline (printf does not add one automatically).

```bash
awk '{printf "%d\n", $3}' file.txt
```

Prints field 3 as an integer. printf is useful when you are generating tabular output for a report or aligning columns from inconsistently spaced input.

### OFS: Output Field Separator

`OFS` controls how print joins multiple arguments when they are separated by commas in the print statement. By default that separator is a space. Setting `OFS` changes it.

```bash
awk -F':' 'BEGIN{OFS=","} {print $1,$3,$7}' /etc/passwd
```

`BEGIN` runs before any input is processed. It is a special pattern, not an ordinary condition. Setting `OFS=","` here means every print statement that joins fields with commas will use `,` as the separator in the output. The result is a CSV of username, UID, and shell.

`OFS` has no effect on printf. Formatted output is controlled entirely by the format string you write.

## Part Two: Conditions

A condition narrows which lines an action applies to. Without one, the action runs on every line.
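
As a small illustration of how a condition narrows an action, the sketch below runs three rules over the same passwd-style records. The records are invented for the example, and the variable names are hypothetical.

```bash
# Made-up passwd-style records, used only for illustration.
sample='root:x:0:0:root:/root:/bin/bash
daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin
alice:x:1000:1000:Alice:/home/alice:/bin/bash'

# Action only: runs on every record.
all_users=$(printf '%s\n' "$sample" | awk -F':' '{print $1}')

# Condition plus action: runs only where the UID (field 3) is at least 1000.
regular_users=$(printf '%s\n' "$sample" | awk -F':' '$3 >= 1000 {print $1}')

# Condition only: awk prints the whole matching record.
root_line=$(printf '%s\n' "$sample" | awk -F':' '$3 == 0')

printf '%s\n' "$all_users" "---" "$regular_users" "---" "$root_line"
```

Feeding inline sample data through printf like this is a convenient way to test a rule before pointing it at a real file.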
### Pattern Matching

```bash
awk '/error/' file.txt
```

Prints every line containing `error`. No action is specified, so awk defaults to printing the matching line. Equivalent to `grep "error" file.txt`, but with awk's field awareness available if you need it.

```bash
awk '/^root/' /etc/passwd
```

Lines starting with `root`.

```bash
awk '!/^#/' config.txt
```

Lines that do NOT start with `#`. This strips comment lines.

### Comparison Operators

Conditions can compare field values directly.

```bash
awk -F':' '$3 == 0' /etc/passwd
```

Lines where field 3 (UID) equals 0. In `/etc/passwd`, UID 0 is root. This shows every account with a root-level UID. Normally there should be only one. If there are more, that is worth investigating.

```bash
awk -F':' '$3 >= 1000' /etc/passwd
```

Lines where UID is 1000 or higher, which are regular user accounts on most Linux systems.

```bash
awk -F':' '$3 == 0 && $1 != "root"' /etc/passwd
```

UID 0 but the username is not `root`. Any result here is a root-equivalent account under another name and deserves immediate scrutiny.

### Comparison Operators Reference

| Operator | Meaning |
|----|----|
| `==` | Equal to |
| `!=` | Not equal to |
| `>` | Greater than |
| `<` | Less than |
| `>=` | Greater than or equal |
| `<=` | Less than or equal |
| `&&` | AND: both conditions must be true |
| `\|\|` | OR: at least one condition must be true |
| `!` | NOT: negates a condition |

### Field Matching with ~ and !~

`~` matches a field against a regex. `!~` does the inverse. The `/etc/passwd` examples need `-F':'`, because with the default whitespace splitting there is no meaningful field 7.

```bash
awk -F':' '$7 ~ /bash/' /etc/passwd
```

Lines where field 7 (the shell) contains `bash`. Shows accounts that use bash as their login shell.

```bash
awk -F':' '$7 !~ /nologin/' /etc/passwd
```

Lines where the shell does not contain `nologin`. Shows accounts that may be able to log in.

```bash
awk -F':' '$1 ~ /^admin/' /etc/passwd
```

Lines where the username starts with `admin`.

## Part Three: Built-In Variables

awk provides built-in variables that reflect the current state of processing as each line is read.

| Variable | What It Contains |
|----|----|
| `$0` | The entire current line |
| `$1`, `$2` … | Individual fields |
| `NF` | Number of fields on the current line |
| `NR` | Current line number (across all input) |
| `FNR` | Current line number within the current file |
| `FS` | Field separator (set with -F or in BEGIN) |
| `OFS` | Output field separator |
| `ORS` | Output record separator (default: newline) |
| `FILENAME` | Name of the current input file |

### NR: Line Numbers

```bash
awk '{print NR, $0}' file.txt
```

Prints the line number followed by the full line. A quick way to add line numbers to any output.

```bash
awk 'NR==5' file.txt
```

Prints only line 5.

```bash
awk 'NR>=10 && NR<=20' file.txt
```

Prints lines 10 through 20. The awk equivalent of `sed -n '10,20p'`.

### NF: Number of Fields

```bash
awk 'NF > 5' file.txt
```

Prints only lines that have more than 5 fields. Useful for filtering out malformed or incomplete lines.

```bash
awk '{print NF, $0}' file.txt
```

Prints the field count followed by the full line. This is useful for understanding the structure of unfamiliar output before writing a pipeline against it.

## Part Four: BEGIN and END

`BEGIN` and `END` are special patterns that run outside the main input loop. They are not conditions in the usual sense and do not match input lines.

`BEGIN` runs once before any input is processed. Use it to set variables, print headers, or configure separators.

`END` runs once after all input has been processed. Use it to print totals, summaries, or final results.

```bash
awk 'BEGIN{print "Starting..."} {print $1} END{print "Done."}' file.txt
```

### Counting With END

```bash
awk 'END{print NR}' file.txt
```

Prints the total number of records awk processed. This is closely equivalent to `wc -l` when the default record separator (newline) is in use.

```bash
awk -F':' '$3 >= 1000 {count++} END{print count+0}' /etc/passwd
```

Counts the number of regular user accounts (UID >= 1000) and prints the total at the end. `count++` increments a counter variable by 1 for each matching line. Printing `count+0` forces a numeric `0` when nothing matched, where a bare `print count` would print an empty line.
awk treats an unset variable as 0 in numeric context (and as an empty string otherwise), so no setup is needed.

### Summing Values

```bash
awk '{sum += $3} END{print sum}' data.txt
```

Adds up all values in field 3 across every line and prints the total. The `+=` operator adds the current value to the running total.

```bash
awk -F',' '{total += $2} END{printf "Total: %d\n", total}' sales.csv
```

Sums a numeric column in a CSV file.

## Part Five: awk in Security Workflows

### Find All Accounts With UID 0

```bash
awk -F':' '$3 == 0 {print $1}' /etc/passwd
```

Prints the username of every account with UID 0. On a clean system this should return only `root`. Any additional result is a root-equivalent account and should be treated as a possible backdoor until it is explained.

### List Interactive User Accounts

```bash
awk -F':' '$3 >= 1000 && $7 !~ /nologin|false/ {print $1, $7}' /etc/passwd
```

Users with UID >= 1000 whose shell is not `nologin` or `false`, meaning accounts that can log in interactively. Prints the username and shell.

### Extract IPs and Counts From Access Logs

```bash
awk '{print $1}' access.log | sort | uniq -c | sort -rn | head -20
```

Field 1 in Combined Log Format is the client IP. awk extracts it cleanly, handling variable spacing better than cut. `sort` + `uniq -c` + `sort -rn` ranks by frequency. `head -20` limits the output to the top 20.

### Filter Log Entries by HTTP Status Code

```bash
awk '$9 == 403' access.log
```

In Combined Log Format, field 9 is the HTTP status code. This prints every line where the status is 403, which is every forbidden request. Verify the field position against a sample line if your log format differs.

```bash
awk '$9 >= 500' access.log
```

Every server error (5xx). Field comparison works on numeric values directly.

### Count Requests Per IP for a Specific Status

```bash
awk '$9 == 403 {print $1}' access.log | sort | uniq -c | sort -rn
```

Filters to 403 responses, extracts the client IP, and ranks by frequency. Shows you which IPs are hitting forbidden endpoints the most.

### Detect SSH Brute Force From Auth Logs

```bash
awk '/Failed password/{print $(NF-3)}' /var/log/auth.log | sort | uniq -c | sort -rn
```

Filters to failed password lines.
`$(NF-3)` extracts the source IP. In the standard SSH log format, the IP is the fourth field from the end. `sort` + `uniq -c` + `sort -rn` ranks by attempt count.

Note: the field position of the IP in auth log lines can vary by system configuration. Verify against a sample line:

```bash
grep "Failed password" /var/log/auth.log | head -n 1
```

Count from the end to confirm the IP field position before relying on `$(NF-3)`.

### Calculate Total Bytes Transferred

```bash
awk '{sum += $10} END{printf "Total bytes: %d\n", sum}' access.log
```

Field 10 in Combined Log Format is the response size in bytes. Verify the field position against a sample line if your format differs. This sums the entire column and prints the total, which is useful for spotting unusually large data transfers in a log segment.

### Find Large Responses That May Indicate Exfiltration

```bash
awk '$10 > 1000000 {print $1, $7, $10}' access.log
```

Prints the IP, requested path (field 7 in Combined Log Format), and response size for any response larger than 1,000,000 bytes (about 1 MB). Field positions are format-dependent, so verify them before relying on this. Responses that large to an external IP are worth investigating.

### Parse /etc/shadow for Accounts With No Password

```bash
awk -F':' '$2 == "" {print $1}' /etc/shadow
```

An empty field 2 in `/etc/shadow` indicates no password hash is set for that account. Whether the account can actually be accessed without a password depends on PAM configuration and account status, but an empty password field is always worth flagging during a review.

### Reformat Nmap Output

```bash
grep "open" nmap.txt | awk '{print $1}' | cut -d'/' -f1 | sort -n
```

awk extracts field 1 from each open port line (the port/protocol column). cut strips the protocol. `sort -n` orders numerically. The result is a clean port list.

### Build a Username:UID Map

```bash
awk -F':' '{printf "%-20s %s\n", $1, $3}' /etc/passwd
```

Formats a left-aligned username column (20 characters wide) next to the UID. The result is a readable reference table of every account and its numeric ID.
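
Several of the one-liners in this part depend on field positions such as `$9`, `$10`, and `$(NF-3)`. One way to check those positions is to print every field of a sample line next to its index. The sketch below does that with two invented log lines. The helper name `number_fields` is hypothetical, the sample lines only assume a typical Combined Log Format and sshd layout, and the C-style `for` loop is standard awk.

```bash
# Made-up sample lines for illustration; real logs may be laid out differently.
access_line='203.0.113.7 - - [10/Oct/2024:13:55:36 +0000] "GET /admin HTTP/1.1" 403 512 "-" "curl/8.0"'
auth_line='Oct 10 13:55:36 host sshd[1234]: Failed password for invalid user admin from 203.0.113.7 port 50022 ssh2'

# Hypothetical helper: print every field with its index so positions can be read off.
number_fields() {
    awk '{ for (i = 1; i <= NF; i++) printf "%2d  %s\n", i, $i }'
}

printf '%s\n' "$access_line" | number_fields
printf '%s\n' "$auth_line" | number_fields

# Spot-check the positions the one-liners rely on.
status=$(printf '%s\n' "$access_line" | awk '{print $9}')
bytes=$(printf '%s\n' "$access_line" | awk '{print $10}')
src_ip=$(printf '%s\n' "$auth_line" | awk '{print $(NF-3)}')
```

Running the same helper on the first line of a real log, for example via `head -n 1`, shows whether the positions assumed here hold on your system.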
### Sum Failed Login Counts by User

```bash
awk '/Failed password/{user=$(NF-5); count[user]++} END{for(u in count) printf "%d %s\n", count[u], u}' /var/log/auth.log | sort -rn
```

Builds a counter array indexed by username. In the `END` block, it loops through the array and prints each username with its count. `sort -rn` ranks highest first. As with the IP, confirm that `$(NF-5)` is the username on a sample line from your own log.

This introduces two awk concepts used here for the first time:

- **Arrays**: `count[user]++`. awk arrays are associative (like dictionaries). Any string can be a key.
- **for loop**: `for(u in count)` iterates over every key in the array. The iteration order is unspecified, which is why the output is piped through sort.

## Part Six: awk and the Pipeline

awk slots cleanly into the pipelines built with the rest of the tools in this series.

### grep → awk

```bash
grep "Failed password" /var/log/auth.log | awk '{print $(NF-3)}'
```

grep filters to relevant lines. awk extracts the field. Separating filtering from extraction keeps each step readable and easy to adjust.

### awk → sort → uniq

```bash
awk -F':' '$3 >= 1000 {print $1}' /etc/passwd | sort | uniq
```

awk filters and extracts. `sort` + `uniq` deduplicates. Each tool does one job.

### awk → sed

```bash
awk '{print $1, $3}' data.txt | sed 's/ /,/'
```

awk extracts two fields (space-separated by default). sed replaces the space with a comma. The result is a two-column CSV.
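
For comparison, the awk → sed step can also be written with `OFS`, as covered in Part One. The sketch below runs both versions on invented input to show they produce the same two-column CSV. The sample data and variable names are hypothetical.

```bash
# Made-up three-column input for illustration.
sample='alice 10 200
bob 20 300'

# Two-step version: awk extracts, sed swaps the first space for a comma.
with_sed=$(printf '%s\n' "$sample" | awk '{print $1, $3}' | sed 's/ /,/')

# Single-step version: set OFS so print joins the fields with a comma directly.
with_ofs=$(printf '%s\n' "$sample" | awk 'BEGIN{OFS=","} {print $1, $3}')

printf '%s\n' "$with_sed" "$with_ofs"
```

Which form to prefer is a judgment call: the two-step pipeline keeps each tool's job obvious, while the `OFS` form avoids a second process and cannot accidentally replace a space inside a field.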
## Quick Reference

### Syntax

```bash
awk 'condition { action }' file
awk -F':' '{ print $1 }' file        # custom field separator
awk 'BEGIN{ } { } END{ }' file       # with BEGIN and END blocks
```

### Built-In Variables

| Variable | What It Is |
|----|----|
| `$0` | Entire line |
| `$1`, `$2` … | Field 1, field 2… |
| `$NF` | Last field |
| `$(NF-1)` | Second-to-last field |
| `NR` | Current line number |
| `NF` | Number of fields on current line |
| `FS` | Field separator |
| `OFS` | Output field separator |
| `FILENAME` | Current filename |

### Conditions

| Condition | What It Matches |
|----|----|
| `/pattern/` | Lines matching pattern |
| `!/pattern/` | Lines not matching pattern |
| `$n == "x"` | Field n equals x |
| `$n != "x"` | Field n does not equal x |
| `$n > N` | Field n greater than N |
| `$n ~ /pat/` | Field n matches regex |
| `$n !~ /pat/` | Field n does not match regex |
| `NR == n` | Line number equals n |
| `NR >= n && NR <= m` | Line range n to m |

### Common Actions

| Action | What It Does |
|----|----|
| `print $1, $2` | Print fields with space between |
| `printf "%s\n", $1` | Formatted print |
| `count++` | Increment a counter |
| `sum += $n` | Add field n to a running total |
| `arr[$1]++` | Increment an array element |

## Closing

awk is not the first tool you reach for. For simple field extraction, cut is faster to type. For pattern matching, grep is clearer. For character-level work, tr or sed is the right fit.

awk is the tool you reach for when those are not enough: when you need to filter by field value, perform arithmetic, count occurrences, or combine several operations in a single pass without building a six-tool pipeline.

The mental model is simple: every line is a row, every field is a column, and awk lets you write conditions and actions against that structure. Once that clicks, the rest follows naturally.


