awk

On Linux, awk is closer to a programming language than a single-purpose command. Developers and system administrators use it regularly for pattern scanning, parsing logs, formatting program output, and shell scripting. The name comes from the surnames of its developers: Alfred Aho, Peter Weinberger, and Brian Kernighan.

DEFINITION
gawk - pattern scanning and processing language
USAGE
gawk [ POSIX or GNU style options ] -f program-file [ -- ] file ...
gawk [ POSIX or GNU style options ] [ -- ] program-text file ...
DESCRIPTION
       Gawk is the GNU Project's implementation of the AWK programming language. It conforms to the
       definition of the language in the POSIX 1003.1 standard. This version in turn is based on
       the description in The AWK Programming Language, by Aho, Kernighan, and Weinberger. Gawk
       provides the additional features found in the current version of Brian Kernighan's awk and
       numerous GNU-specific extensions.

       The command line consists of options to gawk itself, the AWK program text (if not supplied
       via the -f or --include options), and values to be made available in the ARGC and ARGV
       predefined AWK variables.
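
As a small sketch of how the non-option arguments end up in ARGC and ARGV: the file names first.txt and second.txt below are made up and do not need to exist, because a program that contains only a BEGIN rule reads no input. ARGV[0] holds the command name and is skipped here.

```shell
# Show what awk stores in ARGC and ARGV for two (hypothetical) file arguments.
args=$(awk 'BEGIN {
    print "ARGC = " ARGC
    for (i = 1; i < ARGC; i++)
        print "ARGV[" i "] = " ARGV[i]
}' first.txt second.txt)
echo "$args"
```

ARGC is 3 here because the command name in ARGV[0] is counted as well.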

       When gawk is invoked with the --profile option, it starts gathering profiling statistics from
       the execution of the program.  Gawk runs more slowly in this mode, and automatically produces
       an execution profile in the file awkprof.out when done.  See the --profile option, below.

       Gawk also has an integrated debugger. An interactive debugging session can be started by
       supplying the --debug option to the command line. In this mode of execution, gawk loads the
       AWK source code and then prompts for debugging commands. Gawk can only debug AWK program
       source provided with the -f and --include options. The debugger is documented in GAWK:
       Effective AWK Programming.

OPTION FORMAT

       Gawk options may be either traditional POSIX-style one letter options, or GNU-style long
       options. POSIX options start with a single “-”, while long options start with “--”. Long
       options are provided for both GNU-specific features and for POSIX-mandated features.

       Gawk-specific options are typically used in long-option form. Arguments to long options are
       either joined with the option by an = sign, with no intervening spaces, or they may be
       provided in the next command line argument. Long options may be abbreviated, as long as the
       abbreviation remains unique.

       Additionally, every long option has a corresponding short option, so that the option's
       functionality may be used from within #! executable scripts.
OPTIONS
       Gawk  accepts  the following options.  Standard options are listed first, followed by options
       for gawk extensions, listed alphabetically by short option.

       -f program-file
       --file program-file
              Read the AWK program source from the file program-file, instead of from the first
              command line argument. Multiple -f (or --file) options may be used. Files read with -f
              are treated as if they begin with an implicit @namespace "awk" statement.

       -F fs
       --field-separator fs
              Use fs for the input field separator (the value of the FS predefined variable).

       -v var=val
       --assign var=val
              Assign the value val to the variable var, before  execution  of  the  program  begins.
              Such variable values are available to the BEGIN rule of an AWK program.

       -b
       --characters-as-bytes
              Treat all input data as single-byte characters. In other words, don't pay any
              attention to the locale information when attempting to process strings as multibyte
              characters. The --posix option overrides this one.

       -c
       --traditional
              Run in compatibility mode. In compatibility mode, gawk behaves identically to Brian
              Kernighan's awk; none of the GNU-specific extensions are recognized. See the GNU
              EXTENSIONS section of the gawk manual page for more information.

       -C
       --copyright
              Print the short version of the GNU copyright information message on the standard
              output and exit successfully.

       -d[file]
       --dump-variables[=file]
              Print a sorted list of global variables, their types and final values to file. If no
              file is provided, gawk uses a file named awkvars.out in the current directory.
              Having a list of all the global variables is a good way to look for typographical
              errors in your programs. You would also use this option if you have a large program
              with a lot of functions, and you want to be sure that your functions don't
              inadvertently use global variables that you meant to be local. (This is a particularly
              easy mistake to make with simple variable names like i, j, and so on.)

       -D[file]
       --debug[=file]
              Enable debugging of AWK programs. By default, the debugger reads commands
              interactively from the keyboard (standard input). The optional file argument specifies
              a file with a list of commands for the debugger to execute non-interactively.

       -e program-text
       --source program-text
              Use  program-text as AWK program source code.  This option allows the easy intermixing
              of library functions (used via the -f and --include options) with source code  entered
              on  the  command line.  It is intended primarily for medium to large AWK programs used
              in shell scripts.  Each argument supplied via -e is treated as if it  begins  with  an
              implicit @namespace "awk" statement.

       -E file
       --exec file
              Similar to -f; however, this option is the last one processed. This should be used
              with #! scripts, particularly for CGI applications, to avoid passing in options or
              source code (!) on the command line from a URL. This option disables command-line
              variable assignments.

       -g
       --gen-pot
              Scan and parse the AWK program, and generate a GNU .pot (Portable Object Template)
              format file on standard output with entries for all localizable strings in the
              program. The program itself is not executed. See the GNU gettext distribution for more
              information on .pot files.

       -h
       --help Print  a  relatively  short  summary  of the available options on the standard output.
              (Per the GNU Coding Standards, these options cause an immediate, successful exit.)

       -i include-file
       --include include-file
              Load an awk source library. This searches for the library using the AWKPATH
              environment variable. If the initial search fails, another attempt will be made after
              appending the .awk suffix. The file will be loaded only once (i.e., duplicates are
              eliminated), and the code does not constitute the main program source. Files read
              with --include are treated as if they begin with an implicit @namespace "awk"
              statement.

       -l lib
       --load lib
              Load a gawk extension from the shared library lib. This searches for the library
              using the AWKLIBPATH environment variable. If the initial search fails, another attempt
              will be made after appending the default shared library suffix for the platform. The
              library initialization routine is expected to be named dl_load().

       -L [value]
       --lint[=value]
              Provide warnings about constructs that are dubious or non-portable to other AWK
              implementations. With an optional argument of fatal, lint warnings become fatal errors.
              This may be drastic, but its use will certainly encourage the development of cleaner
              AWK programs. With an optional argument of invalid, only warnings about things that
              are actually invalid are issued. (This is not fully implemented yet.) With an
              optional argument of no-ext, warnings about gawk extensions are disabled.

       -M
       --bignum
              Force  arbitrary precision arithmetic on numbers. This option has no effect if gawk is
              not compiled to use the GNU MPFR and GMP libraries.  (In such a case,  gawk  issues  a
              warning.)

       -n
       --non-decimal-data
              Recognize octal and hexadecimal values in input data. Use this option with great
              caution!

       -N
       --use-lc-numeric
              Force gawk to use the locale's decimal point character when parsing input data.
              Although the POSIX standard requires this behavior, and gawk does so when --posix is in
              effect, the default is to follow traditional behavior and use a period as the decimal
              point, even in locales where the period is not the decimal point character. This
              option overrides the default behavior, without the full draconian strictness of the
              --posix option.

       -o[file]
       --pretty-print[=file]
              Output a pretty printed version of the program to file. If no file is provided, gawk
              uses a file named awkprof.out in the current directory. This option implies
              --no-optimize.

       -O
       --optimize
              Enable  gawk's  default optimizations upon the internal representation of the program.
              Currently, this just includes simple constant folding.  This option is on by default.

       -p[prof-file]
       --profile[=prof-file]
              Start a profiling session, and send the profiling data to prof-file. The default is
              awkprof.out. The profile contains execution counts of each statement in the program
              in the left margin and function call counts for each user-defined function. This
              option implies --no-optimize.

       -P
       --posix
              This turns on compatibility mode, with the following additional restrictions:

              • \x escape sequences are not recognized.

              • You cannot continue lines after ?  and :.

              • The synonym func for the keyword function is not recognized.

              • The operators ** and **= cannot be used in place of ^ and ^=.

       -r
       --re-interval
              Enable the use of interval expressions in regular expression matching (see Regular
              Expressions in the gawk manual page). Interval expressions were not traditionally
              available in the AWK language. The POSIX standard added them, to make awk and egrep
              consistent with each other. They are enabled by default, but this option remains for
              use together with --traditional.

       -s
       --no-optimize
              Disable gawk's default optimizations upon the internal representation of the program.

       -S
       --sandbox
              Run gawk in sandbox mode, disabling the system() function, input redirection with
              getline, output redirection with print and printf, and loading dynamic extensions.
              Command execution (through pipelines) is also disabled. This effectively blocks a script
              from accessing local resources, except for the files specified on the command line.

       -t
       --lint-old
              Provide warnings about constructs that are not portable to  the  original  version  of
              UNIX awk.

       -V
       --version
              Print  version  information  for  this particular copy of gawk on the standard output.
              This is useful mainly for knowing if the current copy of gawk on your system is up  to
              date  with  respect to whatever the Free Software Foundation is distributing.  This is
              also useful when reporting bugs.  (Per the GNU Coding Standards, these  options  cause
              an immediate, successful exit.)

       --     Signal the end of options. This is useful to allow further arguments to the AWK
              program itself to start with a “-”. This provides consistency with the argument parsing
              convention used by most other POSIX programs.

       In  compatibility  mode, any other options are flagged as invalid, but are otherwise ignored.
       In normal operation, as long as program text has been supplied, unknown options are passed on
       to the AWK program in the ARGV array for processing.  This is particularly useful for running
       AWK programs via the #!  executable interpreter mechanism.

       For POSIX compatibility, the -W option may be used, followed by the name of a long option.
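
Two of the standard options above, -F and -v, cover most day-to-day use. The following is a minimal sketch with made-up colon-separated input; the variable name prefix is only an illustration.

```shell
# -F: splits each record on colons; -v assigns prefix before the BEGIN rule runs.
out=$(printf 'root:x:0:0\ndaemon:x:1:1\n' |
    awk -F: -v prefix="user=" '
        BEGIN { print "prefix is " prefix }   # the -v value is already visible here
              { print prefix $1 }             # first colon-separated field
    ')
echo "$out"
```

Because -v assignments happen before execution begins, the BEGIN rule can already print the value of prefix.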
AWK PROGRAM EXECUTION
       An AWK program consists of a sequence of optional directives, pattern-action statements, and
       optional function definitions.

              @include "filename"
              @load "filename"
              @namespace "name"
              pattern   { action statements }
              function name(parameter list) { statements }

       Gawk first reads the program source from the program-file(s) if specified, from arguments to
       --source, or from the first non-option argument on the command line. The -f and --source
       options may be used multiple times on the command line. Gawk reads the program text as if all
       the program-files and command line source texts had been concatenated together. This is
       useful for building libraries of AWK functions, without having to include them in each new AWK
       program that uses them. It also provides the ability to mix library functions with command
       line programs.
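
A rough sketch of the library idea, using two -f options. The file names lib.awk and main.awk and the function max() are hypothetical and created in a temporary directory just for this demonstration.

```shell
# Keep a reusable function in one file and the main rule in another.
tmpdir=$(mktemp -d)

cat > "$tmpdir/lib.awk" <<'EOF'
# Hypothetical library function: return the larger of two numbers.
function max(a, b) { return (a + 0 > b + 0) ? a : b }
EOF

cat > "$tmpdir/main.awk" <<'EOF'
# Main program: print the larger of the first two fields of every record.
{ print max($1, $2) }
EOF

# awk treats the two files as one concatenated program.
out=$(printf '3 7\n10 2\n' | awk -f "$tmpdir/lib.awk" -f "$tmpdir/main.awk")
echo "$out"

rm -rf "$tmpdir"
```

The same lib.awk could be reused by other programs simply by naming it in another -f option.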

       In addition, lines beginning with @include may be used to include other source files into
       your program, making library use even easier. This is equivalent to using the --include
       option.

       Lines  beginning  with @load may be used to load extension functions into your program.  This
       is equivalent to using the --load option.

       The environment variable AWKPATH specifies a search path to use  when  finding  source  files
       named  with  the -f and --include options.  If this variable does not exist, the default path
       is ".:/usr/local/share/awk".  (The actual directory may vary, depending  upon  how  gawk  was
       built  and  installed.)   If  a file name given to the -f option contains a “/” character, no
       path search is performed.

       The environment variable AWKLIBPATH specifies a search path to use when finding shared
       library extensions named with the --load option. If this variable does not exist, the default
       path is "/usr/local/lib/gawk". (The actual directory may vary, depending upon how gawk was
       built and installed.)

       Gawk executes AWK programs in the following order. First, all variable assignments specified
       via the -v option are performed. Next, gawk compiles the program into an internal form.
       Then, gawk executes the code in the BEGIN rule(s) (if any), and then proceeds to read each
       file named in the ARGV array (up to ARGV[ARGC-1]). If there are no files named on the
       command line, gawk reads the standard input.

       If a filename on the command line has the form var=val it is treated as a variable
       assignment. The variable var will be assigned the value val. (This happens after any BEGIN
       rule(s) have been run.) Command line variable assignment is most useful for dynamically
       assigning values to the variables AWK uses to control how input is broken into fields and
       records. It is also useful for controlling state if multiple passes are needed over a single
       data file.
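
The multiple-pass case can be sketched as follows. The variable name pass and the sample data are assumptions made for this illustration: the same file is named twice, and a var=val argument in front of each occurrence tells the program which pass it is in.

```shell
# Two passes over one data file: pass 1 sums the values, pass 2 prints percentages.
data=$(mktemp)
printf '1\n3\n' > "$data"

out=$(awk '
    pass == 1 { total += $1 }                                   # first read: accumulate
    pass == 2 { printf "%s %.0f%%\n", $1, 100 * $1 / total }    # second read: report
' pass=1 "$data" pass=2 "$data")
echo "$out"

rm -f "$data"
```

Each assignment takes effect when awk reaches it in the argument list, just before the file that follows it is opened.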

       If the value of a particular element of ARGV is empty (""), gawk skips over it.

       For each input file, if a BEGINFILE rule exists, gawk executes the associated code before
       processing the contents of the file. Similarly, gawk executes the code associated with
       ENDFILE after processing the file.

       For each record in the input, gawk tests to see if it matches any pattern in the AWK program.
       For  each pattern that the record matches, gawk executes the associated action.  The patterns
       are tested in the order they occur in the program.

       Finally, after all the input is exhausted, gawk executes the code  in  the  END  rule(s)  (if
       any).
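
The order described above (BEGIN, then every pattern tested per record in program order, then END) can be seen in a short sketch. The input lines alpha and beta are made up.

```shell
# Trace the execution order of BEGIN, pattern-action rules, and END.
out=$(printf 'alpha\nbeta\n' | awk '
    BEGIN  { print "start" }           # runs once, before any input is read
    /a/    { print "has a: " $0 }      # tested first for every record
    /beta/ { print "is beta: " $0 }    # tested second for every record
    END    { print "records: " NR }    # runs once, after all input is exhausted
')
echo "$out"
```

The record beta matches both patterns, so both actions run for it, in the order the rules appear in the program.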

  Command Line Directories
       According to POSIX, files named on the awk command line must be text files. The behavior is
       “undefined” if they are not. Most versions of awk treat a directory on the command line as
       a fatal error.

       Starting with version 4.0 of gawk, a directory on the command line produces a warning, but is
       otherwise skipped. If either of the --posix or --traditional options is given, then gawk
       reverts to treating directories on the command line as a fatal error.

AWK patterns accept extended regular expressions, the kind used by egrep. The language also provides a printf statement for formatted output.
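
A minimal sketch that combines both features: an extended regular expression with alternation selects the records, and printf lines up the columns. The input mimics the file names and sizes from the cups log directory; notes.txt is a made-up extra entry that the pattern rejects.

```shell
# Select names matching an extended regex and print them in aligned columns.
out=$(printf 'error_log 5332\naccess_log 25119\nnotes.txt 10\n' |
    awk '$1 ~ /^(error|access)_log$/ { printf "%-12s %6d bytes\n", $1, $2 }')
echo "$out"
```

%-12s left-justifies the name in 12 columns and %6d right-justifies the size in 6, so the numbers line up.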

NUMERIC FUNCTIONS
 
       AWK has the following built-in arithmetic functions:

       atan2(y, x)   Return the arctangent of y/x in radians.

       cos(expr)     Return the cosine of expr, which is in radians.

       exp(expr)     The exponential function.

       int(expr)     Truncate to integer.

       log(expr)     The natural logarithm function.

       rand()        Return a random number N, between zero and one, such that 0 ≤ N < 1.

       sin(expr)     Return the sine of expr, which is in radians.

       sqrt(expr)    Return the square root of expr.

       srand([expr]) Use expr as the new seed for the random number generator. If no expr is
                     provided, use the time of day. Return the previous seed for the random number
                     generator.
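
A few of these functions can be tried from a BEGIN rule without any input file. This is only a sketch; the seed value 1 is arbitrary.

```shell
# Exercise some of the built-in arithmetic functions.
out=$(awk 'BEGIN {
    # int() truncates toward zero; atan2(0, -1) is one way to obtain pi.
    printf "%d %d %d %.4f\n", int(3.9), int(-3.9), sqrt(16), atan2(0, -1)

    srand(1)                 # fixed seed, so repeated runs give the same sequence
    r = rand()
    if (r >= 0 && r < 1)     # rand() always satisfies 0 <= N < 1
        print "rand ok"
}')
echo "$out"
```

Note that int(-3.9) gives -3, not -4, because the value is truncated rather than rounded down.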
EXAMPLE
root@letusstudy:/var/log/cups# ls -lrt
-rw-r----- 1 root adm  5332 Nov  7 00:37 error_log
-rw-r----- 1 root adm 25119 Nov  7 15:06 access_log

root@letusstudy:/var/log/cups# ls -lrt | awk '{print $6,$7,$9}'
Nov 7 error_log
Nov 7 access_log

The next example prints only a few columns from access_log. The NR==10{exit} rule makes awk stop after the tenth record, so only the first 10 lines are printed.

root@letusstudy:/var/log/cups# ls -lrt
total 40
-rw-r----- 1 root adm  5332 Nov  7 00:37 error_log
-rw-r----- 1 root adm 25119 Nov  7 15:06 access_log

root@letusstudy:/var/log/cups# head access_log 
localhost - - [05/Nov/2020:21:31:56 -0800] "POST / HTTP/1.1" 200 349 Create-Printer-Subscriptions successful-ok
localhost - - [05/Nov/2020:21:31:56 -0800] "POST / HTTP/1.1" 200 176 Create-Printer-Subscriptions successful-ok
localhost - - [05/Nov/2020:21:32:16 -0800] "POST / HTTP/1.1" 200 5195207 CUPS-Get-PPDs -
localhost - - [05/Nov/2020:21:32:22 -0800] "POST / HTTP/1.1" 200 5195207 CUPS-Get-PPDs -
localhost - - [05/Nov/2020:21:36:36 -0800] "POST / HTTP/1.1" 401 123 Cancel-Subscription successful-ok
localhost - root [05/Nov/2020:21:36:36 -0800] "POST / HTTP/1.1" 200 123 Cancel-Subscription successful-ok
localhost - - [05/Nov/2020:21:36:36 -0800] "POST / HTTP/1.1" 200 152 Cancel-Subscription successful-ok
localhost - - [05/Nov/2020:21:36:45 -0800] "POST / HTTP/1.1" 200 349 Create-Printer-Subscriptions successful-ok
localhost - - [05/Nov/2020:21:36:45 -0800] "POST / HTTP/1.1" 200 176 Create-Printer-Subscriptions successful-ok
localhost - - [05/Nov/2020:21:36:50 -0800] "POST / HTTP/1.1" 200 359 Create-Printer-Subscriptions successful-ok

root@letusstudy:/var/log/cups# awk '{print $4, $11, $12} NR==10{exit}' access_log
[05/Nov/2020:21:31:56 Create-Printer-Subscriptions successful-ok
[05/Nov/2020:21:31:56 Create-Printer-Subscriptions successful-ok
[05/Nov/2020:21:32:16 CUPS-Get-PPDs -
[05/Nov/2020:21:32:22 CUPS-Get-PPDs -
[05/Nov/2020:21:36:36 Cancel-Subscription successful-ok
[05/Nov/2020:21:36:36 Cancel-Subscription successful-ok
[05/Nov/2020:21:36:36 Cancel-Subscription successful-ok
[05/Nov/2020:21:36:45 Create-Printer-Subscriptions successful-ok
[05/Nov/2020:21:36:45 Create-Printer-Subscriptions successful-ok
[05/Nov/2020:21:36:50 Create-Printer-Subscriptions successful-ok
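
Building on the same log format, here is a hedged sketch that counts requests per HTTP status code (field 9). It uses four lines copied from the access_log output above, written to a temporary file so the example is self-contained; the array name count is arbitrary.

```shell
# Count requests per HTTP status code in a small sample of the cups access_log.
log=$(mktemp)
cat > "$log" <<'EOF'
localhost - - [05/Nov/2020:21:31:56 -0800] "POST / HTTP/1.1" 200 349 Create-Printer-Subscriptions successful-ok
localhost - - [05/Nov/2020:21:32:16 -0800] "POST / HTTP/1.1" 200 5195207 CUPS-Get-PPDs -
localhost - - [05/Nov/2020:21:36:36 -0800] "POST / HTTP/1.1" 401 123 Cancel-Subscription successful-ok
localhost - root [05/Nov/2020:21:36:36 -0800] "POST / HTTP/1.1" 200 123 Cancel-Subscription successful-ok
EOF

# The array is indexed by status code; the END rule prints one line per code.
# The order of "for (code in count)" is unspecified, so the result is sorted.
out=$(awk '{ count[$9]++ } END { for (code in count) print code, count[code] }' "$log" | sort)
echo "$out"

rm -f "$log"
```

On the real log, the same awk program can be pointed at /var/log/cups/access_log directly.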
