[colug-432] Code check
Tom Hanlon
tom at functionalmedia.com
Thu Aug 1 13:37:55 EDT 2013
Colug,
I'm writing some Python to do the following.
Take this list, or any similar list of variable length,
['DVD', 'dishwasher', 'software', 'tv']
and return all two-item combinations:
DVD *** dishwasher
DVD *** software
DVD *** tv
dishwasher *** software
dishwasher *** tv
software *** tv
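For a list of n items there are n(n-1)/2 unordered pairs, which is why the four-item example gives exactly the six lines above. A quick sanity check of that count, written in Python 3 (math.comb needs Python 3.8 or later):

```python
import math

items = ['DVD', 'dishwasher', 'software', 'tv']
n = len(items)

# Number of unordered two-item pairs: "n choose 2" = n*(n-1)/2
pair_count = n * (n - 1) // 2
assert pair_count == math.comb(n, 2)
print(pair_count)  # 6
```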
My first draft looks like this.
#!/usr/bin/python
list1 = ['software', 'DVD', 'dishwasher', 'tv'];
list1.sort();
print list1;
for i in range(len(list1)):
    for i2 in range(i+1,len(list1)):
        print list1[i], "***",list1[i2];
It works.
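For comparison, here is a sketch of the same nested-loop idea in Python 3 style (the draft uses Python 2 print statements). It drops the trailing semicolons, which Python does not need, uses sorted() rather than sorting the list in place, and uses enumerate plus slicing instead of index arithmetic. The function name pairs_nested is hypothetical, chosen only for illustration.

```python
def pairs_nested(items):
    """Return all two-item combinations using the draft's nested-loop idea."""
    ordered = sorted(items)  # new sorted list; the caller's list is untouched
    result = []
    for i, first in enumerate(ordered):
        # Pair each item only with the items after it, so no pair repeats
        for second in ordered[i + 1:]:
            result.append((first, second))
    return result


for a, b in pairs_nested(['software', 'DVD', 'dishwasher', 'tv']):
    print(a, "***", b)
```

Note that Python's default sort is case-sensitive, so 'DVD' sorts before 'dishwasher' (uppercase letters come first), matching the output listed above.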
Can this be done in a cleaner way?
Am I picking up any bad habits?
Is there a built-in tool for this?
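The standard library does have a tool for this: itertools.combinations (available since Python 2.6) yields every unordered pair exactly once, in the order of its input. A minimal sketch in Python 3 syntax:

```python
from itertools import combinations

items = ['software', 'DVD', 'dishwasher', 'tv']

# combinations(iterable, 2) yields each unordered pair once, following the
# input order, so sort first to get alphabetically ordered pairs.
for a, b in combinations(sorted(items), 2):
    print(a, "***", b)
```

Changing the 2 to a 3 gives triples instead; if ordered pairs are wanted (both A *** B and B *** A), itertools.permutations(items, 2) is the counterpart.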
Why I am doing this:
Hive, the Hadoop SQL-to-MapReduce translator, has an ngrams function that
splits text into word sequences of configurable length. That is sort of cool,
but I want all possible pairs, not just consecutive pairs.
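The difference between consecutive pairs (bigrams, i.e. n-grams with n = 2) and all pairs can be seen in a few lines of Python. This only illustrates the two ideas; it is not a model of Hive's ngrams function, which does more than list the sequences.

```python
from itertools import combinations

words = ['DVD', 'dishwasher', 'software', 'tv']

# Consecutive pairs (bigrams): each word paired only with its immediate neighbour
consecutive = list(zip(words, words[1:]))

# All pairs: each word paired with every word that comes after it
all_pairs = list(combinations(words, 2))

print(len(consecutive), len(all_pairs))  # 3 6
```

For n words, consecutive pairs grow linearly (n - 1) while all pairs grow quadratically (n(n-1)/2), which matters for long item lists.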
To Hive, the list of items is an array. I could pass an array of arrays if
there were a complete "shuffle" I could do on the array (word list). Not
finding one, I figure that shipping the item list to Python would be the
most efficient approach.
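One way to ship the item list to Python is Hive's TRANSFORM ... USING streaming clause, which pipes rows to a script's stdin as tab-separated text and reads tab-separated rows back from its stdout. The sketch below assumes that setup, and further assumes (hypothetically) that the array column is flattened to a comma-separated string in the query, e.g. with concat_ws(','. items). The script name pairs.py, the table baskets and the column items are all hypothetical.

```python
#!/usr/bin/env python3
"""Hypothetical streaming script (pairs.py) for a Hive TRANSFORM query.

Assumed Hive usage (table and column names are hypothetical):
  ADD FILE pairs.py;
  SELECT TRANSFORM(concat_ws(',', items)) USING 'python3 pairs.py'
         AS (item_a, item_b)
  FROM baskets;
"""
import sys
from itertools import combinations


def emit_pairs(lines):
    """Yield 'a<TAB>b' for every two-item combination in each input row."""
    for line in lines:
        # First tab-separated column holds the comma-joined item list
        field = line.rstrip('\n').split('\t')[0]
        # Drop empty entries and duplicates so an item never pairs with itself
        items = sorted(set(filter(None, field.split(','))))
        for a, b in combinations(items, 2):
            yield a + '\t' + b


if __name__ == '__main__':
    for out in emit_pairs(sys.stdin):
        print(out)
```

Each input row becomes several output rows, which is the "explode into pairs" step that the array functions alone do not seem to provide.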
So that is the use case, but what I really need is a quick sanity check of
the Python, unless you happen to be doing word co-occurrence in Hive and
have some insight there.
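Since the end goal is word co-occurrence, a small local sketch of the counting step may help when checking results on sample data. The function cooccurrence_counts and the sample baskets are hypothetical; in Hive the counting would more likely be a GROUP BY over the emitted pairs than an in-memory Counter.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(baskets):
    """Count how many lists each unordered pair of items appears in together."""
    counts = Counter()
    for basket in baskets:
        # sorted(set(...)) gives one canonical order, so ('a', 'b') and
        # ('b', 'a') land on the same key, and duplicates are ignored
        counts.update(combinations(sorted(set(basket)), 2))
    return counts


baskets = [
    ['DVD', 'tv', 'software'],
    ['tv', 'DVD'],
    ['dishwasher', 'tv'],
]
for pair, n in cooccurrence_counts(baskets).most_common(3):
    print(pair, n)
```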
Thanks, Colug,
Tom