awk
Awk is closer to a programming language than to a simple command. Developers and system administrators use it regularly for pattern scanning, parsing logs, formatting program output, shell scripting, and similar tasks. The name comes from the surnames of its developers: Alfred Aho, Peter Weinberger, and Brian Kernighan.
DEFINITION
gawk - pattern scanning and processing language
USAGE
gawk [ POSIX or GNU style options ] -f program-file [ -- ] file ...
gawk [ POSIX or GNU style options ] [ -- ] program-text file ...
DESCRIPTION
Gawk is the GNU Project's implementation of the AWK programming language. It conforms to the
definition of the language in the POSIX 1003.1 standard. This version in turn is based on
the description in The AWK Programming Language, by Aho, Kernighan, and Weinberger. Gawk
provides the additional features found in the current version of Brian Kernighan's awk and
numerous GNU-specific extensions.
The command line consists of options to gawk itself, the AWK program text (if not supplied
via the -f or --include options), and values to be made available in the ARGC and ARGV pre-
defined AWK variables.
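To make the role of ARGC and ARGV concrete, here is a minimal sketch that prints the arguments awk receives after the program text. The file names first.txt and second.txt are hypothetical and never opened, because a program with only a BEGIN rule reads no input. The command is written as awk on the assumption that it resolves to gawk or another POSIX awk.

```shell
# Print each command-line argument that follows the program text.
# first.txt and second.txt are placeholder names; they need not exist.
show_args() {
    awk 'BEGIN {
        for (i = 1; i < ARGC; i++)
            printf "ARGV[%d]=%s\n", i, ARGV[i]
    }' first.txt second.txt
}
show_args
```

The loop starts at 1 because ARGV[0] holds the name of the awk executable itself, which differs between implementations.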
When gawk is invoked with the --profile option, it starts gathering profiling statistics from
the execution of the program. Gawk runs more slowly in this mode, and automatically produces
an execution profile in the file awkprof.out when done. See the --profile option, below.
Gawk also has an integrated debugger. An interactive debugging session can be started by sup‐
plying the --debug option to the command line. In this mode of execution, gawk loads the AWK
source code and then prompts for debugging commands. Gawk can only debug AWK program source
provided with the -f and --include options. The debugger is documented in GAWK: Effective
AWK Programming.
OPTION FORMAT
Gawk options may be either traditional POSIX-style one letter options, or GNU-style long op‐
tions. POSIX options start with a single “-”, while long options start with “--”. Long op‐
tions are provided for both GNU-specific features and for POSIX-mandated features.
Gawk-specific options are typically used in long-option form. Arguments to long options are
either joined with the option by an = sign, with no intervening spaces, or they may be pro‐
vided in the next command line argument. Long options may be abbreviated, as long as the ab‐
breviation remains unique.
Additionally, every long option has a corresponding short option, so that the option's func‐
tionality may be used from within #! executable scripts.
OPTIONS
Gawk accepts the following options. Standard options are listed first, followed by options
for gawk extensions, listed alphabetically by short option.
-f program-file
--file program-file
Read the AWK program source from the file program-file, instead of from the first com‐
mand line argument. Multiple -f (or --file) options may be used. Files read with -f
are treated as if they begin with an implicit @namespace "awk" statement.
-F fs
--field-separator fs
Use fs for the input field separator (the value of the FS predefined variable).
-v var=val
--assign var=val
Assign the value val to the variable var, before execution of the program begins.
Such variable values are available to the BEGIN rule of an AWK program.
-b
--characters-as-bytes
Treat all input data as single-byte characters. In other words, don't pay any atten‐
tion to the locale information when attempting to process strings as multibyte charac‐
ters. The --posix option overrides this one.
-c
--traditional
Run in compatibility mode. In compatibility mode, gawk behaves identically to Brian
Kernighan's awk; none of the GNU-specific extensions are recognized. See GNU EXTEN‐
SIONS, below, for more information.
-C
--copyright
Print the short version of the GNU copyright information message on the standard out‐
put and exit successfully.
-d[file]
--dump-variables[=file]
Print a sorted list of global variables, their types and final values to file. If no
file is provided, gawk uses a file named awkvars.out in the current directory.
Having a list of all the global variables is a good way to look for typographical er‐
rors in your programs. You would also use this option if you have a large program
with a lot of functions, and you want to be sure that your functions don't inadver‐
tently use global variables that you meant to be local. (This is a particularly easy
mistake to make with simple variable names like i, j, and so on.)
-D[file]
--debug[=file]
Enable debugging of AWK programs. By default, the debugger reads commands interac‐
tively from the keyboard (standard input). The optional file argument specifies a
file with a list of commands for the debugger to execute non-interactively.
-e program-text
--source program-text
Use program-text as AWK program source code. This option allows the easy intermixing
of library functions (used via the -f and --include options) with source code entered
on the command line. It is intended primarily for medium to large AWK programs used
in shell scripts. Each argument supplied via -e is treated as if it begins with an
implicit @namespace "awk" statement.
-E file
--exec file
Similar to -f; however, this option is the last one processed. This should be used
with #! scripts, particularly for CGI applications, to avoid passing in options or
source code (!) on the command line from a URL. This option disables command-line
variable assignments.
-g
--gen-pot
Scan and parse the AWK program, and generate a GNU .pot (Portable Object Template)
format file on standard output with entries for all localizable strings in the pro‐
gram. The program itself is not executed. See the GNU gettext distribution for more
information on .pot files.
-h
--help Print a relatively short summary of the available options on the standard output.
(Per the GNU Coding Standards, these options cause an immediate, successful exit.)
-i include-file
--include include-file
Load an awk source library. This searches for the library using the AWKPATH environ‐
ment variable. If the initial search fails, another attempt will be made after ap‐
pending the .awk suffix. The file will be loaded only once (i.e., duplicates are
eliminated), and the code does not constitute the main program source. Files read
with --include are treated as if they begin with an implicit @namespace "awk" state‐
ment.
-l lib
--load lib
Load a gawk extension from the shared library lib. This searches for the library us‐
ing the AWKLIBPATH environment variable. If the initial search fails, another attempt
will be made after appending the default shared library suffix for the platform. The
library initialization routine is expected to be named dl_load().
-L [value]
--lint[=value]
Provide warnings about constructs that are dubious or non-portable to other AWK imple‐
mentations. With an optional argument of fatal, lint warnings become fatal errors.
This may be drastic, but its use will certainly encourage the development of cleaner
AWK programs. With an optional argument of invalid, only warnings about things that
are actually invalid are issued. (This is not fully implemented yet.) With an op‐
tional argument of no-ext, warnings about gawk extensions are disabled.
-M
--bignum
Force arbitrary precision arithmetic on numbers. This option has no effect if gawk is
not compiled to use the GNU MPFR and GMP libraries. (In such a case, gawk issues a
warning.)
-n
--non-decimal-data
Recognize octal and hexadecimal values in input data. Use this option with great cau‐
tion!
-N
--use-lc-numeric
Force gawk to use the locale's decimal point character when parsing input data. Al‐
though the POSIX standard requires this behavior, and gawk does so when --posix is in
effect, the default is to follow traditional behavior and use a period as the decimal
point, even in locales where the period is not the decimal point character. This op‐
tion overrides the default behavior, without the full draconian strictness of the
--posix option.
-o[file]
--pretty-print[=file]
Output a pretty printed version of the program to file. If no file is provided, gawk
uses a file named awkprof.out in the current directory. This option implies --no-op‐
timize.
-O
--optimize
Enable gawk's default optimizations upon the internal representation of the program.
Currently, this just includes simple constant folding. This option is on by default.
-p[prof-file]
--profile[=prof-file]
Start a profiling session, and send the profiling data to prof-file. The default is
awkprof.out. The profile contains execution counts of each statement in the program
in the left margin and function call counts for each user-defined function. This op‐
tion implies --no-optimize.
-P
--posix
This turns on compatibility mode, with the following additional restrictions:
• \x escape sequences are not recognized.
• You cannot continue lines after ? and :.
• The synonym func for the keyword function is not recognized.
• The operators ** and **= cannot be used in place of ^ and ^=.
-r
--re-interval
Enable the use of interval expressions in regular expression matching (see Regular Ex‐
pressions, below). Interval expressions were not traditionally available in the AWK
language. The POSIX standard added them, to make awk and egrep consistent with each
other. They are enabled by default, but this option remains for use together with
--traditional.
-s
--no-optimize
Disable gawk's default optimizations upon the internal representation of the program.
-S
--sandbox
Run gawk in sandbox mode, disabling the system() function, input redirection with get‐
line, output redirection with print and printf, and loading dynamic extensions. Com‐
mand execution (through pipelines) is also disabled. This effectively blocks a script
from accessing local resources, except for the files specified on the command line.
-t
--lint-old
Provide warnings about constructs that are not portable to the original version of
UNIX awk.
-V
--version
Print version information for this particular copy of gawk on the standard output.
This is useful mainly for knowing if the current copy of gawk on your system is up to
date with respect to whatever the Free Software Foundation is distributing. This is
also useful when reporting bugs. (Per the GNU Coding Standards, these options cause
an immediate, successful exit.)
-- Signal the end of options. This is useful to allow further arguments to the AWK pro‐
gram itself to start with a “-”. This provides consistency with the argument parsing
convention used by most other POSIX programs.
In compatibility mode, any other options are flagged as invalid, but are otherwise ignored.
In normal operation, as long as program text has been supplied, unknown options are passed on
to the AWK program in the ARGV array for processing. This is particularly useful for running
AWK programs via the #! executable interpreter mechanism.
For POSIX compatibility, the -W option may be used, followed by the name of a long option.
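As an illustration of the two most commonly used standard options, the following sketch combines -F and -v. The sample records and the variable name limit are invented for the example, and awk is assumed to resolve to gawk or another POSIX awk.

```shell
# -F: sets the field separator to a colon; -v limit=500 defines a variable
# before the program starts. The input data is made up for illustration.
over_limit() {
    printf 'alice:1200\nbob:80\ncarol:640\n' |
        awk -F: -v limit=500 '$2 > limit { print $1 }'
}
over_limit
```

Only the names whose second field exceeds the limit are printed, here alice and carol.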
AWK PROGRAM EXECUTION
An AWK program consists of a sequence of optional directives, pattern-action statements, and
optional function definitions.
@include "filename"
@load "filename"
@namespace "name"
pattern { action statements }
function name(parameter list) { statements }
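A small sketch of two of these building blocks, a function definition and a pattern-action statement, might look like the following. The function name max and the sample input are assumptions made for the example.

```shell
# One user-defined function plus one pattern-action rule.
# Adding 0 forces a numeric comparison of the two arguments.
larger_field() {
    printf '3 7\n10 2\nonly\n' | awk '
        function max(a, b) { return (a + 0 > b + 0) ? a : b }
        NF >= 2 { print max($1, $2) }
    '
}
larger_field
```

The third input line has a single field, so the pattern NF >= 2 does not match and nothing is printed for it.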
Gawk first reads the program source from the program-file(s) if specified, from arguments to
--source, or from the first non-option argument on the command line. The -f and --source op‐
tions may be used multiple times on the command line. Gawk reads the program text as if all
the program-files and command line source texts had been concatenated together. This is use‐
ful for building libraries of AWK functions, without having to include them in each new AWK
program that uses them. It also provides the ability to mix library functions with command
line programs.
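The library idea described above can be sketched with two -f options. The file names lib.awk and main.awk and the function twice are hypothetical, and the files are created in a temporary directory only for the demonstration.

```shell
# Keep a reusable function in one file and the main rule in another,
# then let awk concatenate them via repeated -f options.
lib_demo() {
    dir=$(mktemp -d) || return 1
    # Hypothetical library file with one function.
    printf 'function twice(x) { return 2 * x }\n' > "$dir/lib.awk"
    # Hypothetical main program that calls the library function.
    printf '{ print twice($1) }\n' > "$dir/main.awk"
    printf '4\n21\n' | awk -f "$dir/lib.awk" -f "$dir/main.awk"
    rm -rf "$dir"
}
lib_demo
```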
In addition, lines beginning with @include may be used to include other source files into
your program, making library use even easier. This is equivalent to using the --include op‐
tion.
Lines beginning with @load may be used to load extension functions into your program. This
is equivalent to using the --load option.
The environment variable AWKPATH specifies a search path to use when finding source files
named with the -f and --include options. If this variable does not exist, the default path
is ".:/usr/local/share/awk". (The actual directory may vary, depending upon how gawk was
built and installed.) If a file name given to the -f option contains a “/” character, no
path search is performed.
The environment variable AWKLIBPATH specifies a search path to use when finding extension libraries
named with the --load option. If this variable does not exist, the default path is "/usr/lo‐
cal/lib/gawk". (The actual directory may vary, depending upon how gawk was built and in‐
stalled.)
Gawk executes AWK programs in the following order. First, all variable assignments specified
via the -v option are performed. Next, gawk compiles the program into an internal form.
Then, gawk executes the code in the BEGIN rule(s) (if any), and then proceeds to read each
file named in the ARGV array (up to ARGV[ARGC-1]). If there are no files named on the com‐
mand line, gawk reads the standard input.
If a filename on the command line has the form var=val it is treated as a variable assign‐
ment. The variable var will be assigned the value val. (This happens after any BEGIN
rule(s) have been run.) Command line variable assignment is most useful for dynamically as‐
signing values to the variables AWK uses to control how input is broken into fields and
records. It is also useful for controlling state if multiple passes are needed over a single
data file.
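The multiple-pass use mentioned above can be sketched as follows. The variable name pass and the file data.txt are assumptions for the example; the same file is named twice, with a var=val assignment before each occurrence.

```shell
# First pass sums the column; second pass prints each value with its
# share of the total. pass=1 and pass=2 are command-line assignments
# that take effect when awk reaches them in the argument list.
share_of_total() {
    dir=$(mktemp -d) || return 1
    printf '1\n3\n' > "$dir/data.txt"
    awk 'pass == 1 { total += $1; next }
         pass == 2 { print $1, $1 / total }' \
        pass=1 "$dir/data.txt" pass=2 "$dir/data.txt"
    rm -rf "$dir"
}
share_of_total
```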
If the value of a particular element of ARGV is empty (""), gawk skips over it.
For each input file, if a BEGINFILE rule exists, gawk executes the associated code before
processing the contents of the file. Similarly, gawk executes the code associated with END‐
FILE after processing the file.
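BEGINFILE and ENDFILE are gawk extensions, so the sketch below calls gawk explicitly and does nothing useful if gawk is not installed. The file names one.txt and two.txt and the counter n are invented for the example.

```shell
# Count records per input file: reset in BEGINFILE, report in ENDFILE.
per_file_counts() {
    command -v gawk >/dev/null 2>&1 || { echo "gawk not installed"; return 0; }
    dir=$(mktemp -d) || return 1
    printf 'a\nb\n' > "$dir/one.txt"
    printf 'c\n' > "$dir/two.txt"
    (cd "$dir" && gawk '
        BEGINFILE { n = 0 }
        { n++ }
        ENDFILE { print FILENAME ": " n " records" }
    ' one.txt two.txt)
    rm -rf "$dir"
}
per_file_counts
```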
For each record in the input, gawk tests to see if it matches any pattern in the AWK program.
For each pattern that the record matches, gawk executes the associated action. The patterns
are tested in the order they occur in the program.
Finally, after all the input is exhausted, gawk executes the code in the END rule(s) (if
any).
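The execution order described in this section can be observed with a short sketch. The input values and the messages are made up; the point is that BEGIN runs first, each record is tested against the patterns in program order, and END runs last.

```shell
# A record may match more than one pattern; matching rules fire in the
# order they appear in the program.
run_order() {
    printf '5\n12\n' | awk '
        BEGIN   { print "start" }
        $1 > 0  { print "positive:", $1 }
        $1 > 10 { print "large:", $1 }
        END     { print "records:", NR }
    '
}
run_order
```

The second record, 12, matches both patterns, so it produces two output lines in the order the rules are written.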
Command Line Directories
According to POSIX, files named on the awk command line must be text files. The behavior is
“undefined” if they are not. Most versions of awk treat a directory on the command line as
a fatal error.
Starting with version 4.0 of gawk, a directory on the command line produces a warning, but is
otherwise skipped. If either of the --posix or --traditional options is given, then gawk re‐
verts to treating directories on the command line as a fatal error.
Awk accepts extended regular expressions of the kind found in egrep. It also provides a printf statement for formatted output.
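A brief sketch of both features together: a regular expression selects records and printf formats the result. The three-column input (name, status, size) is invented for the example.

```shell
# Select records whose second field looks like a 4xx or 5xx status code,
# then print them in fixed-width columns.
failed_requests() {
    printf 'alice 200 349\nbob 401 123\n' |
        awk '$2 ~ /^(4|5)[0-9][0-9]$/ { printf "%-6s %3d %6d\n", $1, $2, $3 }'
}
failed_requests
```

Unlike print, printf does not append a newline on its own, which is why the format string ends with \n.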
NUMERIC FUNCTIONS
AWK has the following built-in arithmetic functions:
atan2(y, x) Return the arctangent of y/x in radians.
cos(expr) Return the cosine of expr, which is in radians.
exp(expr) The exponential function.
int(expr) Truncate to integer.
log(expr) The natural logarithm function.
rand() Return a random number N, between zero and one, such that 0 ≤ N < 1.
sin(expr) Return the sine of expr, which is in radians.
sqrt(expr) Return the square root of expr.
srand([expr]) Use expr as the new seed for the random number generator. If no expr is pro‐
vided, use the time of day. Return the previous seed for the random number
generator.
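The following sketch exercises several of these functions from a BEGIN rule, so no input file is needed. The seed value 42 is arbitrary, and the random value itself is only range-checked because it differs between awk implementations.

```shell
# Call the built-in arithmetic functions and print rounded results.
numeric_demo() {
    awk 'BEGIN {
        printf "%d %d\n", int(3.9), int(-3.9)   # truncation toward zero
        printf "%.4f\n", sqrt(2)
        printf "%.4f\n", atan2(0, -1)           # equals pi
        printf "%.4f\n", exp(1)
        printf "%.4f\n", log(exp(2))
        srand(42)                               # arbitrary seed
        r = rand()
        msg = (r >= 0 && r < 1) ? "rand in range" : "rand out of range"
        print msg
    }'
}
numeric_demo
```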
EXAMPLE
root@letusstudy:/var/log/cups# ls -lrt
-rw-r----- 1 root adm 5332 Nov 7 00:37 error_log
-rw-r----- 1 root adm 25119 Nov 7 15:06 access_log
root@letusstudy:/var/log/cups# ls -lrt | awk '{print $6,$7,$9}'
Nov 7 error_log
Nov 7 access_log
As another example, print only a few columns of access_log. The rule NR==10{exit} stops awk after the tenth line, so only the first 10 lines are printed.
root@letusstudy:/var/log/cups# ls -lrt
total 40
-rw-r----- 1 root adm 5332 Nov 7 00:37 error_log
-rw-r----- 1 root adm 25119 Nov 7 15:06 access_log
root@letusstudy:/var/log/cups# head access_log
localhost - - [05/Nov/2020:21:31:56 -0800] "POST / HTTP/1.1" 200 349 Create-Printer-Subscriptions successful-ok
localhost - - [05/Nov/2020:21:31:56 -0800] "POST / HTTP/1.1" 200 176 Create-Printer-Subscriptions successful-ok
localhost - - [05/Nov/2020:21:32:16 -0800] "POST / HTTP/1.1" 200 5195207 CUPS-Get-PPDs -
localhost - - [05/Nov/2020:21:32:22 -0800] "POST / HTTP/1.1" 200 5195207 CUPS-Get-PPDs -
localhost - - [05/Nov/2020:21:36:36 -0800] "POST / HTTP/1.1" 401 123 Cancel-Subscription successful-ok
localhost - root [05/Nov/2020:21:36:36 -0800] "POST / HTTP/1.1" 200 123 Cancel-Subscription successful-ok
localhost - - [05/Nov/2020:21:36:36 -0800] "POST / HTTP/1.1" 200 152 Cancel-Subscription successful-ok
localhost - - [05/Nov/2020:21:36:45 -0800] "POST / HTTP/1.1" 200 349 Create-Printer-Subscriptions successful-ok
localhost - - [05/Nov/2020:21:36:45 -0800] "POST / HTTP/1.1" 200 176 Create-Printer-Subscriptions successful-ok
localhost - - [05/Nov/2020:21:36:50 -0800] "POST / HTTP/1.1" 200 359 Create-Printer-Subscriptions successful-ok
root@letusstudy:/var/log/cups# awk '{print $4, $11, $12} NR==10{exit}' access_log
[05/Nov/2020:21:31:56 Create-Printer-Subscriptions successful-ok
[05/Nov/2020:21:31:56 Create-Printer-Subscriptions successful-ok
[05/Nov/2020:21:32:16 CUPS-Get-PPDs -
[05/Nov/2020:21:32:22 CUPS-Get-PPDs -
[05/Nov/2020:21:36:36 Cancel-Subscription successful-ok
[05/Nov/2020:21:36:36 Cancel-Subscription successful-ok
[05/Nov/2020:21:36:36 Cancel-Subscription successful-ok
[05/Nov/2020:21:36:45 Create-Printer-Subscriptions successful-ok
[05/Nov/2020:21:36:45 Create-Printer-Subscriptions successful-ok
[05/Nov/2020:21:36:50 Create-Printer-Subscriptions successful-ok
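A variation on the same idea, offered as a sketch rather than as part of the original session: select only the requests whose status field is not 200. The four input lines are copied from the access_log excerpt above into a temporary file so that the example runs anywhere.

```shell
# Print timestamp, status, and operation for every non-200 request.
non_ok_requests() {
    dir=$(mktemp -d) || return 1
    cat > "$dir/access_log" <<'EOF'
localhost - - [05/Nov/2020:21:31:56 -0800] "POST / HTTP/1.1" 200 349 Create-Printer-Subscriptions successful-ok
localhost - - [05/Nov/2020:21:32:16 -0800] "POST / HTTP/1.1" 200 5195207 CUPS-Get-PPDs -
localhost - - [05/Nov/2020:21:36:36 -0800] "POST / HTTP/1.1" 401 123 Cancel-Subscription successful-ok
localhost - root [05/Nov/2020:21:36:36 -0800] "POST / HTTP/1.1" 200 123 Cancel-Subscription successful-ok
EOF
    awk '$9 != 200 { print $4, $9, $11 }' "$dir/access_log"
    rm -rf "$dir"
}
non_ok_requests
```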