[colug-432] Podcast script

Mon Mar 29 12:25:22 EDT 2010

Thanks, that helped me.  I ended up having to tweak it a little,
since the filename it downloads is too long for some reason, but now it
seems to be working perfectly.
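
For anyone who hits the same overlong-filename problem, here is a minimal
sketch of one possible workaround (not necessarily the exact tweak used
here): derive a shorter local name from the URL and hand it to wget's -O
option. The helper name short_name and the 40-character default are
assumptions made up for illustration.

```shell
#!/bin/bash
# Hypothetical helper: build a short local filename from a long URL.
# Assumes the URL path ends in a name with an extension (e.g. ".m4v").
# The default limit of 40 characters is an arbitrary, assumed value.
short_name() {
    local url=${1%%\?*}        # drop any query string
    local max_len=${2:-40}
    local name=${url##*/}      # keep only the last path component
    local ext=${name##*.}      # extension, e.g. "m4v"
    local stem=${name%.*}      # everything before the extension
    printf '%s.%s\n' "${stem:0:max_len}" "$ext"
}
```

It could then be used as:
wget -O "$HOME/Desktop/$(short_name "$url")" "$url"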

Thanks again!

On Mon, 2010-03-29 at 12:04 -0400, Rob Funk wrote:
> On Monday 29 March 2010 11:28:07 am Bill Baker wrote:
> > I'm trying to write a script that downloads a web page that has the URL
> > for a podcast on it, searches for the URL (which ends with .m4v) and
> > downloads it to my Desktop.  Here's what I have so far:
> > 
> > ---------------
> > #!/bin/bash
> > wget --no-cache http://podcasturl/podcast.xml
> > url=`grep .m4v podcast.xml`
> > wget $url -O ~/Desktop
> > rm podcast.xml
> > ---------------
> > 
> > I know that in the third line grep doesn't work because it returns the
> > whole line, not just the URL.  Is there anything else I can use to get
> > it to return just the URL?
> 
> Well, a proper solution would involve actually parsing the XML (really using 
> an XML-parsing library in a Real language), or at least pulling out the XML 
> tag you're looking for (generally <enclosure>, if I recall correctly) before 
> looking for the URL. But the quick-and-dirty way can work....
> 
> Check out grep's -o option: "show only the part of a line matching PATTERN"
> Then change your pattern to something like 'http://[^ ]*\.m4v'.
> 
> You don't appear to be accounting for multiple m4v links in your XML. Maybe 
> add a "head -1".
> And what if there's no m4v link found in the XML?
> 
> Also, you can get rid of all the temporary-file business by using '-O-' as a 
> wget option, and pipe wget into grep. (Or use curl, which defaults to stdout 
> output.)
> Speaking of -O, I believe you're misusing it in the second wget; it 
> specifies a target file, not a target directory (and conventionally goes 
> before the URL). You want -P instead.
> 
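> #!/bin/bash
> feed=http://podcasturl/podcast.xml
> # Fetch the feed to stdout and keep only the first .m4v URL
> url=$( wget -O- "$feed" | grep -o 'http://[^ ]*\.m4v' | head -1 )
> # Download only if a URL was actually found
> test "$url" && wget -P ~/Desktop/ "$url"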
>
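
To see what the grep -o pattern pulls out without touching the network,
the same pipeline can be run against an inline sample. This is only a
sketch: the feed text, the example.com URLs, and the function name
extract_first_m4v are all made up for illustration.

```shell
#!/bin/bash
# Made-up sample feed containing two <enclosure> tags
sample='<item><enclosure url="http://example.com/ep2.m4v" length="1" type="video/x-m4v"/></item>
<item><enclosure url="http://example.com/ep1.m4v" length="1" type="video/x-m4v"/></item>'

# Read a feed on stdin, print only the first .m4v URL (if any)
extract_first_m4v() {
    grep -o 'http://[^ ]*\.m4v' | head -1
}

url=$(printf '%s\n' "$sample" | extract_first_m4v)

# An empty result means no link was found, so there is nothing to download
if test "$url"; then
    echo "would download: $url"
else
    echo "no m4v link found" >&2
fi
```

With a feed that has no .m4v link, url ends up empty and the download step
is skipped, which is what the test "$url" guard in the script relies on.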