[colug-432] sed, regex dialects, crummy documentation, good writing

Sun May 17 19:08:44 EDT 2015

On Sun, 17 May 2015 10:11:29 -0400, Steve VanSlyck <s.vanslyck at postpro.net> wrote:

> ... asked me some good questions about my one-liner,
> 
> sed -i.bak 's:(^Defaults\s*Env_reset\s*)$:$1\nDefaults
> editor=/bin/nano:' ./sudoers
> 
> , one of which was why "$1?"
> 
> For the "replace" portion of sed, I was trying to find out how to return
> the result of the last search and $1 seemed to be the right way to do
> it. I saw what I thought was an authoritative page on this but cannot
> find it now. One that I did find is
> http://stackoverflow.com/questions/2890700/backreferences-syntax-in-replacement-strings-why-dollar-sign

That article talks about Java and Perl.
Your example used sed, which predates Perl by more than a decade.
I saw no direct mention of sed in that article.

What has your experimentation revealed about how sed works?
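
Here is the kind of experiment I mean. It is only a sketch, and
it assumes GNU sed (for -E, for \s, and for \n in the
replacement). The sample line and the " [dollar]" marker are
made up for the test; they are not taken from your sudoers file.

```shell
# Assumption: GNU sed. The sample line below is hypothetical.
line='Defaults    env_reset'

# $1 means nothing special to sed, so it lands in the output as literal text.
with_dollar=$(printf '%s\n' "$line" |
  sed -E 's:^(Defaults\s+env_reset\s*)$:$1 [dollar]:')

# \1 is sed's backreference to the first parenthesized group.
with_backslash=$(printf '%s\n' "$line" |
  sed -E 's:^(Defaults\s+env_reset\s*)$:\1\nDefaults editor=/bin/nano:')

printf '%s\n' "$with_dollar"
printf '%s\n' "$with_backslash"
```

Without -E (or -r on older GNU sed), the grouping parentheses
would have to be written \( and \).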

> Everything I'm reading about regular expressions talks about this
> engine does it this way and that engine does it that way and it's all
> very confusing and frustrating.

Indeed. It is that way. I don't pay much attention to the
underlying engines, only to how each program implements regexes.
Knowing the common themes, I can often figure things out
by experimenting.

> None of the "engines" are fedora-based bash,
> so I have no idea which engine's requirements I'm
> supposed to go by.

You have to read the manual for each program _and_ experiment 
to back it up. I think of shells as implementing globs, not 
regexes, although they might have some regex capability that 
I'm not aware of. Learn the difference between globs and regexes.
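
A small sketch of that difference, assuming a POSIX shell and
grep. The helper names glob_match and regex_match are made up
for this example. The same characters mean different things:
in a glob, "." is literal and "*" is "any run of characters";
in a regex, "." is "any one character" and "*" is "zero or more
of the previous item".

```shell
# Hypothetical helpers: glob matching via case, regex matching via grep.
glob_match()  { case "$1" in $2) echo yes ;; *) echo no ;; esac; }
regex_match() {
  if printf '%s\n' "$1" | grep -q -- "$2"; then echo yes; else echo no; fi
}

glob_dot=$(glob_match abc 'a.c')       # glob: "." is a literal dot
regex_dot=$(regex_match abc '^a.c$')   # regex: "." is any one character
glob_star=$(glob_match abc 'a*')       # glob: "*" is any run of characters
regex_star=$(regex_match abc '^a*$')   # regex: "a*" is zero or more "a"

echo "a.c  glob=$glob_dot  regex=$regex_dot"
echo "a*   glob=$glob_star regex=$regex_star"
```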

> On top of this I have to struggle with a problem endemic to technical
> writing and which is far too common: writers who state a rule and then
> give examples inconsistent with the rule just described!

Yup. It sucks. Experiment. Sometimes even read the source code.

> This makes me scream.

Yup. It sucks. 

> ^A matches "A" at the beginning of a line
> A$ matches "A" at the end of a line
> A^ matches "A^" anywhere on a line
> $A matches "$A" anywhere on a line
> ^^ matches "^" at the beginning of a line
> $$ matches "$" at the end of a line

Those make sense to me.
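
They can also be checked by experiment. The following is a
sketch that assumes GNU grep with its default (basic) regex
syntax; other programs may differ. The count helper and the
test strings are made up for the example.

```shell
# Hypothetical helper: count how many of the given lines match the pattern.
count() { printf '%s\n' "$2" | grep -c -- "$1"; }

mid_caret=$(count 'A^' 'xA^y')       # ^ not first in the pattern: literal
mid_dollar=$(count '$A' 'x$Ay')      # $ not last in the pattern: literal
double_caret=$(count '^^' '^x')      # anchor, then a literal ^
caret_elsewhere=$(count '^^' 'x^')   # the ^ is not at the start of the line
double_dollar=$(count '$$' 'x$')     # literal $, then anchor
dollar_elsewhere=$(count '$$' '$x')  # the $ is not at the end of the line

echo "$mid_caret $mid_dollar $double_caret $caret_elsewhere $double_dollar $dollar_elsewhere"
```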

> On grymoire.com, for
> example, Bruce Barnett writes, "If you need to match a "^" at the
> beginning of the line, or a "$" at the end of a line, you must escape
> the special character with a backslash." 

I think he is wrong, but very close to being right.
It would have been better to say "pattern" instead of "line", i.e.,

    "If you need to match a "^" at the beginning of the
    pattern, or a "$" at the end of the pattern,
    you must escape the special character with a backslash."

Submit a patch to him.
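
A quick check of the reworded rule, again only a sketch that
assumes GNU grep's basic regex syntax. The test string and the
variable names are made up. The "^" sits in the middle of the
line, but at the beginning of the pattern, and that is where
the backslash is needed.

```shell
# A ^ in the middle of the line, searched for at the start of the pattern.
line='x^A'

# Unescaped: ^ is an anchor, so this asks for "A" at the start of the line.
anchored=$(printf '%s\n' "$line" | grep -c -- '^A')

# Escaped: \^ is a literal caret, so this finds "^A" anywhere on the line.
escaped=$(printf '%s\n' "$line" | grep -c -- '\^A')

echo "anchored=$anchored escaped=$escaped"
```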

Books benefit from the work of a good editor. 
Much online stuff does not have the benefit of editors, 
and it shows. Often it helps to consult several online 
sources, and compare them.

By the way, there is often more than one way to do
something. A good exercise is to see how many ways
you can find to do the same thing.
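
As one such exercise, here are three ways to match a literal
"$" at the end of a line. This is a sketch that assumes GNU
grep's basic regex syntax; the two input lines are made up,
and only the first one should match each time.

```shell
# Two hypothetical input lines; only "cost$" ends with a dollar sign.
input=$(printf '%s\n' 'cost$' 'co$t')

by_backslash=$(printf '%s\n' "$input" | grep -c -- '\$$')  # escape it
by_bracket=$(printf '%s\n' "$input" | grep -c -- '[$]$')   # bracket expression
by_doubling=$(printf '%s\n' "$input" | grep -c -- '$$')    # double it

echo "$by_backslash $by_bracket $by_doubling"
```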

> WT#! It maybe would've killed him to mention escaping the character OR
> doubling it, and then give BOTH examples?:

You are his editor. Submit a patch. 

> How are we supposed to learn anything if we can't trust our teachers!

By experience, which involves making mistakes and getting burned
by misplaced trust. You'll develop a sense of how much to trust
a source. That judgement is highly subjective.
You'll notice that good writing tends to go together with
accurate information.
Having suffered from poor writing, you will appreciate good
writing more and more. You will pay more attention to a
well-written question than to a poorly written one. You are not
alone. Others behave similarly. They are waiting for a good question.
This leads back to the advice at the top of colug.net.

> I'm almost 60. I approach this stuff in good faith, with no 
> math training beyond high school, no programming experience 
> beyond Excel, ...

Excel experience should have been enough to make you wary. :-)

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - 

Many O'Reilly books are good. Older editions are cheap.
For classic Unix tools, old editions are still relevant. 
Consider reading the following:

    sed & awk by Dale Dougherty & Arnold Robbins

It covers regular expressions as used by those programs.

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - 

I have found many documentation errors over the years.
Sometimes I later realized that the documentation was
correct and that I was wrong. One example was an Intel 8051
manual: I thought it was wrong, but later realized it was
correct. The more I studied it, the more I realized how
carefully it was written, and that every word was chosen well.

Some authors are great. Kernighan and Ritchie wrote 
"The C Programming Language" with wit. W. Richard Stevens 
wrote APUE with amazing precision. A Fortran manual from 
IBM by anonymous authors was excellent.