[colug-432] Printing Next to Last Field

Rick Hornsby richardjhornsby at gmail.com
Wed Oct 14 19:19:17 EDT 2015




> On Oct 14, 2015, at 17:25, jep200404 at columbus.rr.com wrote:
> 
> On Wed, 14 Oct 2015 17:59:33 -0400, jep200404 at columbus.rr.com wrote:
> 
>> Compare the two following filter chains for printing the last field on a line.
>> Which one would you choose? Why?
>> 
>>    ... | rev | awk '{print $1}' | rev
>>    ... | awk '{print $NF}'
> 
> Similarly for next to last field:
> 
>    ... | rev | awk '{print $2}' | rev
>    ... | awk '{print $(NF-1)}'
> 
>> What are the interesting edge cases?
> 
>    [jep at main ~]$ cat fo
>    three tres tree
>    to two
>    one
> 
>    [jep at main ~]$ cat fo | rev | awk '{print $1}' | rev
>    tree
>    two
>    one
> 
>    [jep at main ~]$ cat fo | awk '{print $NF}'
>    tree
>    two
>    one
> 
>    [jep at main ~]$ cat fo | rev | awk '{print $2}' | rev
>    tres
>    to
> 
> 
>    [jep at main ~]$ cat fo | awk '{print $(NF-1)}'
>    tres
>    to
>    one
>    awk: cmd. line:1: (FILENAME=- FNR=4) fatal: attempt to access field -1
>    [jep at main ~]$ 
> 
> No difference for the last field, 
> but differences for the next to last field.
> "cat fo | rev | awk '{print $2}' | rev" prints nothing,
> when "cat fo | awk '{print $(NF-1)}'" prints "one".
> "cat fo | rev | awk '{print $2}' | rev" prints nothing,
> when "cat fo | awk '{print $(NF-1)}'" complains.


You raise an interesting question.

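I suppose the correct answer depends on what you define as correct behavior.  To put it another way, how strict are you (à la Perl's use strict;)?  The strictest interpretation is that awk complains.  awk numbers fields from 1, and $0 is the whole record, so any index below 0 is invalid.  On the empty line NF is 0, so NF-1 is -1 and a complaint from awk is warranted.  On the one-field line NF-1 is 0, so awk does not complain; it silently prints the whole record ("one") instead.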
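
To make the two short-line cases concrete, here is a small sketch that prints the index $(NF-1) would try to read on each line. The function name field_index is just a label made up for illustration; the input mirrors the fo file quoted above, including what appears to be a trailing empty line (FNR=4 in the error message).

```shell
#!/bin/sh
# Hypothetical helper: show the field index that $(NF-1) would access.
field_index() {
    awk '{ print NF - 1 }'
}

# Same four lines as the quoted sample file, last one empty.
printf 'three tres tree\nto two\none\n\n' | field_index
# prints 2, 1, 0 and -1, one per line
```

Index 0 is the whole record, which is why the "one" line prints itself; only -1 is an actual error.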

The question becomes: is your data file corrupt?  Normally, when doing field separation, the data file is expected to exhibit consistency - the same, or at least a minimum, number of fields on every line.  If consistency is not present or expected, then that edge case must be handled.  Do you fail?  Do you assume the value is empty and continue on?

IMHO, this is a condition where I've moved beyond a one-liner.  I'm most likely going to wrap the awk in a small shell script*, and explicitly handle the "not enough fields" case - probably by checking awk's exit code ($?).

I go down that road because using rev effectively buries the problem of an inconsistent data format, hiding it from both yourself and your readers.  I may decide that it really is okay to have a missing field, and that the result should be the empty string.  If so, I want my small shell script to be explicit, so the reader can see that using the empty string is a conscious and intentional choice.
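
As a rough sketch of what such a script might look like, assuming that a line with fewer than two fields should yield an empty string plus a nonzero exit status the caller can check. The function name next_to_last and the exit status 3 are arbitrary choices for illustration, not anything awk defines.

```shell
#!/bin/sh
# Hypothetical helper: print the next-to-last field of each input line.
next_to_last() {
    awk '
        NF < 2 {
            # Conscious choice: too few fields means "empty string",
            # but remember that it happened so the caller can find out.
            print ""
            short++
            next
        }
        { print $(NF-1) }
        # Exit status 3 is an arbitrary "some lines were short" signal.
        END { exit (short > 0 ? 3 : 0) }
    '
}

status=0
printf 'three tres tree\nto two\none\n\n' | next_to_last || status=$?
if [ "$status" -ne 0 ]; then
    echo "warning: some lines had fewer than two fields" >&2
fi
```

The output matches the rev pipeline (tres, to, then two empty lines), but the short lines are no longer hidden: the decision to print an empty string is spelled out, and the exit status tells the caller it was needed.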

-rj


* There are ways to make awk eat the problem on a single line without rev, but it starts to get unreadable - which can quickly spiral out of control.
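
For what it's worth, one such single-line form might look like the following. It is only a sketch: the wrapper name next_to_last_inline is made up for illustration, and the awk program silently treats short lines as empty, which is the same behavior as the rev pipeline.

```shell
#!/bin/sh
# Hypothetical wrapper around the one-liner: empty string when a line
# has fewer than two fields, next-to-last field otherwise.
next_to_last_inline() {
    awk '{ print (NF >= 2 ? $(NF-1) : "") }'
}

printf 'three tres tree\nto two\none\n\n' | next_to_last_inline
```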

