[colug-432] Podcast script
bill_chris at earthlink.net
Mon Mar 29 12:25:22 EDT 2010
Thanks, that helped me. I ended up having to tweak it a little, since
the filename it downloads is too long for some reason, but now it
seems to be working perfectly.
On Mon, 2010-03-29 at 12:04 -0400, Rob Funk wrote:
> On Monday 29 March 2010 11:28:07 am Bill Baker wrote:
> > I'm trying to write a script that downloads a web page that has the URL
> > for a podcast on it, searches for the URL (which ends with .m4v) and
> > downloads it to my Desktop. Here's what I have so far:
> > ---------------
> > #!/bin/bash
> > wget --no-cache http://podcasturl/podcast.xml
> > url=`grep .m4v podcast.xml`
> > wget $url -O ~/Desktop
> > rm podcast.xml
> > ---------------
> > I know that in the third line grep doesn't work because it returns the
> > whole line, not just the URL. Is there anything else I can use to get
> > it to return just the URL?
> Well, a proper solution would involve actually parsing the XML (really using
> an XML-parsing library in a Real language), or at least pulling out the XML
> tag you're looking for (generally <enclosure>, if I recall correctly) before
> looking for the URL. But the quick-and-dirty way can work....
> Check out grep's -o option: "show only the part of a line matching PATTERN"
> Then change your pattern to something like 'http://[^ ]*\.m4v'.
> You don't appear to be accounting for multiple m4v links in your XML. Maybe
> add a "head -1".
> And what if there's no m4v link found in the XML?
> Also, you can get rid of all the temporary-file business by using '-O-' as a
> wget option, and pipe wget into grep. (Or use curl, which defaults to stdout.)
> Speaking of -O, I believe you're misusing it in the second wget; it should
> come before the URL, and specifies a target file, not a target directory. You
> want -P instead.
> url=$( wget -O- "$feed" | grep -o 'http://[^ ]*\.m4v' | head -1 )
> test "$url" && wget -P ~/Desktop/ "$url"
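
Putting the suggestions above together, a minimal sketch might look like
the following. The function names extract_m4v_url and fetch_podcast and
the ~/Desktop default are made up for illustration. The pattern is the
quick-and-dirty one from the reply, so it assumes the link starts with
http:// and contains no spaces; it is not a real XML parse.

```shell
#!/bin/bash

# Print the first http:// URL ending in .m4v found on stdin.
# Prints nothing if no such URL is present.
extract_m4v_url() {
    grep -o 'http://[^ ]*\.m4v' | head -n 1
}

# Hypothetical helper: fetch the feed given as $1 and save the first
# .m4v link into the directory given as $2 (assumed default: ~/Desktop).
fetch_podcast() {
    local feed dest url
    feed=$1
    dest=${2:-$HOME/Desktop}

    # -q keeps wget quiet, -O- writes the feed to stdout (no temp file).
    url=$(wget -q -O- "$feed" | extract_m4v_url)

    # Bail out instead of calling wget with an empty URL.
    if [ -z "$url" ]; then
        echo "no .m4v link found in $feed" >&2
        return 1
    fi

    # -P names a target directory; -O would name a target file.
    wget -P "$dest" "$url"
}
```

Calling fetch_podcast http://podcasturl/podcast.xml would then download
the first .m4v link in that feed to ~/Desktop, or print a message and
return non-zero if the feed has none.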