[colug-432] Podcast script
Rob Funk
rfunk at funknet.net
Mon Mar 29 12:04:45 EDT 2010
On Monday 29 March 2010 11:28:07 am Bill Baker wrote:
> I'm trying to write a script that downloads a web page that has the URL
> for a podcast on it, searches for the URL (which ends with .m4v) and
> downloads it to my Desktop. Here's what I have so far:
>
> ---------------
> #!/bin/bash
> wget --no-cache http://podcasturl/podcast.xml
> url=`grep .m4v podcast.xml`
> wget $url -O ~/Desktop
> rm podcast.xml
> ---------------
>
> I know that in the third line grep doesn't work because it returns the
> whole line, not just the URL. Is there anything else I can use to get
> it to return just the URL?
Well, a proper solution would involve actually parsing the XML (really using
an XML-parsing library in a Real language), or at least pulling out the XML
tag you're looking for (generally <enclosure>, if I recall correctly) before
looking for the URL. But the quick-and-dirty way can work....
Check out grep's -o option: "show only the part of a line matching PATTERN"
Then change your pattern to something like 'http://[^ ]*\.m4v'.
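For example (made-up enclosure line, but the shape is typical of an RSS feed):

$ echo '<enclosure url="http://podcasturl/episode1.m4v" length="12345" type="video/x-m4v"/>' | grep -o 'http://[^ ]*\.m4v'
http://podcasturl/episode1.m4v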
You don't appear to be accounting for multiple m4v links in your XML. Maybe
add a "head -1".
And what if there's no m4v link found in the XML?
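If you'd rather have the script complain than silently do nothing, one way is an explicit check after the grep, e.g.:

url=$( grep -o 'http://[^ ]*\.m4v' podcast.xml | head -1 )
if [ -z "$url" ]; then
    echo "no m4v link found in podcast.xml" >&2
    exit 1
fi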
Also, you can get rid of all the temporary-file business by using '-O-' as a
wget option and piping wget into grep. (Or use curl, which writes to stdout by
default.)
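For example, the curl equivalent of that first fetch-and-grep would look something like this (-s just hides the progress meter; the feed URL is your placeholder again):

url=$( curl -s http://podcasturl/podcast.xml | grep -o 'http://[^ ]*\.m4v' | head -1 )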
Speaking of -O, I believe you're misusing it in the second wget; it should
come before the URL, and it specifies a target file, not a target directory.
You want -P instead.
#!/bin/bash
feed=http://podcasturl/podcast.xml
url=$( wget -O- "$feed" | grep -o 'http://[^ ]*\.m4v' | head -1 )
test "$url" && wget -P ~/Desktop/ "$url"
--
==============================| "A slice of life isn't the whole cake
Rob Funk <rfunk at funknet.net> | One tooth will never make a full grin"
http://www.funknet.net/rfunk | -- Chris Mars, "Stuck in Rewind"