[colug-432] Podcast script
Rob Funk
rfunk at funknet.net
Mon Mar 29 12:04:45 EDT 2010
On Monday 29 March 2010 11:28:07 am Bill Baker wrote:
> I'm trying to write a script that downloads a web page that has the URL
> for a podcast on it, searches for the URL (which ends with .m4v) and
> downloads it to my Desktop. Here's what I have so far:
>
> ---------------
> #!/bin/bash
> wget --no-cache http://podcasturl/podcast.xml
> url=`grep .m4v podcast.xml`
> wget $url -O ~/Desktop
> rm podcast.xml
> ---------------
>
> I know that in the third line grep doesn't work because it returns the
> whole line, not just the URL. Is there anything else I can use to get
> it to return just the URL?
Well, a proper solution would involve actually parsing the XML (really using
an XML-parsing library in a Real language), or at least pulling out the XML
tag you're looking for (generally <enclosure>, if I recall correctly) before
looking for the URL. But the quick-and-dirty way can work....
Check out grep's -o option: "show only the part of a line matching PATTERN"
Then change your pattern to something like 'http://[^ ]*\.m4v'.
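For example (made-up enclosure line, but the shape is typical of an RSS feed):

$ echo '<enclosure url="http://podcasturl/episode1.m4v" length="12345" type="video/x-m4v"/>' | grep -o 'http://[^ ]*\.m4v'
http://podcasturl/episode1.m4v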
You don't appear to be accounting for multiple m4v links in your XML. Maybe
add a "head -1".
And what if there's no m4v link found in the XML?
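If you'd rather have the script complain than silently do nothing, one way is an explicit check after the grep, e.g.:

url=$( grep -o 'http://[^ ]*\.m4v' podcast.xml | head -1 )
if [ -z "$url" ]; then
    echo "no m4v link found in podcast.xml" >&2
    exit 1
fi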
Also, you can get rid of all the temporary-file business by using '-O-' as a
wget option and piping wget into grep. (Or use curl, which writes to stdout by
default.)
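For example, the curl equivalent of that first fetch-and-grep would look something like this (-s just hides the progress meter; the feed URL is your placeholder again):

url=$( curl -s http://podcasturl/podcast.xml | grep -o 'http://[^ ]*\.m4v' | head -1 )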
Speaking of -O, I believe you're misusing it in the second wget; it should
come before the URL, and it specifies a target file, not a target directory.
You want -P instead.
#!/bin/bash
feed=http://podcasturl/podcast.xml
url=$( wget -O- "$feed" | grep -o 'http://[^ ]*\.m4v' | head -1 )
test "$url" && wget -P ~/Desktop/ "$url"
--
==============================| "A slice of life isn't the whole cake
Rob Funk <rfunk at funknet.net> | One tooth will never make a full grin"
http://www.funknet.net/rfunk | -- Chris Mars, "Stuck in Rewind"