[colug-432] Rant About sed, Regex and the Web

Steve Roggenkamp steveroggenkamp at pobox.com
Sun May 17 13:45:17 EDT 2015


Good rant!

As someone your age who switched from aerospace engineering to computers
in the early 80s, I feel your pain.  Big time.

I started using *NIX in the mid-80s, beginning with the old Sun-2
workstations.  I remember thinking that vi seemed to be a major
throwback compared to the edt editor I had been using on the DEC VAX/VMS
system.  But then, I didn't have a choice if I was going to work on a
*NIX machine.

I quickly learned that there are several fundamental concepts that you
need to understand to be successful with *NIX:  shell scripting, regular
expressions, and find.

There are two major lines of development for regular expressions: the
ones that originated out of Bell Labs, which POSIX later standardized as
basic and extended regular expressions, and the extensions to them coming
from the Perl language, which formed the basis for the PCRE library.  Sed
came from Bell Labs, so it follows the BL conventions.
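
To make that concrete for sed, here is a minimal sketch.  The sample line
is made up to match the one-liner quoted below, and I am assuming a sed
that accepts -E, as GNU and BSD sed do.  In the Bell Labs style that sed
follows, the replacement refers to the first group as \1; $1 is the Perl
spelling.

```shell
# Sketch only: a made-up sample line, not a real sudoers file.
line='Defaults    Env_reset'

# Basic regular expressions (sed's default): groups are written \( \),
# and the replacement refers to the first group as \1, not $1.
# [[:space:]] is the POSIX character class; \s is an extension.
bre=$(printf '%s\n' "$line" |
  sed 's/^\(Defaults[[:space:]]*Env_reset[[:space:]]*\)$/\1 # matched/')

# Extended regular expressions (sed -E): groups are plain ( ),
# but the replacement still uses \1.
ere=$(printf '%s\n' "$line" |
  sed -E 's/^(Defaults[[:space:]]*Env_reset[[:space:]]*)$/\1 # matched/')

printf '%s\n%s\n' "$bre" "$ere"
```

Both commands should print the sample line with " # matched" appended,
which shows that the group was captured and handed back through \1.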

In any event, the definitive documentation on regular expressions should
be the man pages, although they can be very dry and difficult to
decipher.  Much technical documentation is written using many levels of
processing, so the author may have had the correct syntax in their
original writings, but through the multi-level processing to go from the
text to the presentation, something was missed by the proofreaders (if any).

In a bigger sense, though, this is just par for the course.  Much of
what we use in the open source world is written by volunteers or
sponsored by corporations that need to obtain revenue from their
efforts.  Documentation is seen as a "cost-center" whose magnitude is
meant to be minimized in corporate revenue optimization efforts.  Thus,
what we save in software licensing fees, we may spend with our time
desperately searching the web for nuggets that tell us why a reasonable
(to us) command does not do what we think it should, or even HOW to do
something in the first place.

Of course I could take the conspiracy theory approach and say it's a
revenue optimization effort by Google, Yahoo and Microsoft to maximize
the number of searches per line of code.  Before Google came on line,
everything was well documented, organized, and indexed, so one could
find answers very quickly.  I worked for a while on an IBM mainframe
many years ago, and any time I had to look up an error code, I could
quickly go over to the three-foot long printed manual on the table in
the office, flip open to the section 2 ft, 3 1/2 inches from the left
side, and look up that error number.  It had a description of the error
and what to do about it.  Of course the reason it was so well organized
is that one needed it because the mainframe was such a pain to work on.

But I digress.

Documentation has always been the bane of developers.  Very few of us
are good at both coding and documenting our work.  With the recent trend
toward agile development, the attitude seems to emphasize coding even
more, to the point where the documentation is generated automagically
(probably through even more layers than before) from our source code and
the few comments we sprinkle amongst the meat of our code.

I do feel your pain and have felt it for years, many years.

Good luck,
Steve

On 05/17/2015 02:11 PM, Steve VanSlyck wrote:
> Rant follows. If you're not up for some immaturity on a Sunday morning,
> it's OK, please press delete. Before that, once again, please accept my
> appreciation for the tons of help you guys are giving me. I trust some
> of you will relate.
>
> Jim, who is always here, always helpful, asked me some good questions
> about my one-liner,
>
> sed -i.bak 's:(^Defaults\s*Env_reset\s*)$:$1\nDefaults editor=/bin/nano:' ./sudoers
>
> , one of which was why "$1"?
>
> For the "replace" portion of sed, I was trying to find out how to return
> the result of the last search and $1 seemed to be the right way to do
> it. I saw what I thought was an authoritative page on this but cannot
> find it now. One that I did find is
> http://stackoverflow.com/questions/2890700/backreferences-syntax-in-replacement-strings-why-dollar-sign
> .
>
> Everything I'm reading about regular expressions talks about how this
> engine does it this way and that engine does it that way and it's all
> very confusing and frustrating. None of the "engines" are
> fedora-based bash, so I have no idea which engine's requirements I'm
> supposed to go by.
>
> On top of this I have to struggle with a problem endemic to technical
> writing and which is far too common: writers who state a rule and then
> give examples inconsistent with the rule just described!
>
> This makes me scream. I've seen it in so-called textbooks and
> well-regarded non-textbooks on mathematics such as 1089 and All That.
> It usually results in uncharitable thoughts about the author followed by
> some profanity and an addition to the trash can. On grymoire.com, for
> example, Bruce Barnett writes, "If you need to match a "^" at the
> beginning of the line, or a "$" at the end of a line, you must escape
> the special character with a backslash." Hard rule, got it, I'm good,
> let's read on...and then he follows it up with THESE examples!:
>
> ^A   matches "A" at the beginning of a line
> A$   matches "A" at the end of a line
> A^   matches "A^" anywhere on a line
> $A   matches "$A" anywhere on a line
> ^^   matches "^" at the beginning of a line
> $$   matches "$" at the end of a line
>
> WT#! Would it have killed him to mention escaping the character OR
> doubling it, and then give BOTH examples?
>
> ^^ or \^   matches "^" at the beginning of a line
> $$ or \$   matches "$" at the end of a line
>
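
For what it's worth, here is a small sketch of how sed treats these
patterns, as far as I understand the basic regular expression rules: "^"
only acts as an anchor at the start of a pattern and "$" only at the end,
so the second character in the doubled forms is taken literally.  The
sample strings are made up, and sed is run with its default syntax.

```shell
# Sketch only: made-up sample lines, sed in its default (basic) regex mode.
# sed -n '/PATTERN/p' prints just the lines that match PATTERN.

# ^^ : anchor, then a literal caret -> only the line starting with ^
r1=$(printf '%s\n' '^abc' 'a^bc' | sed -n '/^^/p')

# $$ : a literal dollar, then anchor -> only the line ending with $
r2=$(printf '%s\n' 'abc$' 'ab$c' | sed -n '/$$/p')

# \^ : an escaped caret with no anchor -> matches a caret anywhere
r3=$(printf '%s\n' '^abc' 'a^bc' | sed -n '/\^/p')

# ^\^ : anchor plus escaped caret -> only the line starting with ^
r4=$(printf '%s\n' '^abc' 'a^bc' | sed -n '/^\^/p')

printf '%s\n' "$r1" "$r2" "$r3" "$r4"
```

So the doubled forms and the escaped forms are both usable, but an
escaped "\^" on its own is not anchored; it takes "^\^" to pin the caret
to the beginning of the line.
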
> How are we supposed to learn anything if we can't trust our teachers!
>
> I'm almost 60. I approach this stuff in good faith, with no math
> training beyond high school, no programming experience beyond Excel, a
> little BASIC as a teenager* in the 70's, and an introduction to Visual
> Basic course at DeVry in 2003 (where I got high marks for both coding
> and commenting), and every time I try to learn something I run into
> walls because the teachers won't do in English what they do in code!
>
> Anyway, thank you for listening. I'm not going to give up this time.
>
>
>
> Steve
>
> *This was the first time I gave up on trying to learn something by
> reading the documentation. The page in my TRS-80 manual said something
> like print(string) and absolutely NOWHERE in the book was there the
> least little hint that the correct syntax was print("string"). I
> struggled with it for days, believing those people, before finally
> giving up. To my credit, I did not blame myself.
>
>
>
>
>
> _______________________________________________
> colug-432 mailing list
> colug-432 at colug.net
> http://lists.colug.net/mailman/listinfo/colug-432
