<div dir="ltr"><div><div><div><div><div><div><div><div><div><div><div>Colug, <br><br><br></div>I am writing some Python to do the following.<br><br>Take this list, or any similar list of variable length:<br><br>['DVD', 'dishwasher', 'software', 'tv']<br>
<br>And return all two-item combinations:<br><br>DVD *** dishwasher<br>DVD *** software<br>DVD *** tv<br>dishwasher *** software<br>dishwasher *** tv<br>software *** tv<br><br></div>My first draft looks like this:<br><br>
#!/usr/bin/python<br>list1 = ['software', 'DVD', 'dishwasher', 'tv'];<br>list1.sort();<br>print list1;<br>for i in range(len(list1)):<br>&nbsp;&nbsp;&nbsp;&nbsp;for i2 in range(i+1,len(list1)):<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;print list1[i], "***",list1[i2];<br>
<br><br></div>My draft works. <br><br></div>Can this be done in a cleaner way? <br></div>Am I falling into any bad habits? <br></div>Is there a builtin tool for this? <br><br></div>My reasons for doing this:<br></div>Hive, the Hadoop SQL-to-MapReduce translator, has ngrams, which allow splitting strings into substrings of configurable length. That is sorta cool, but I want all possible pairs, not just consecutive pairs. <br>
<br></div>To Hive, the list of items is an array. I could pass an array of arrays if there were a complete "shuffle" I could do on the array (the word list). Not finding one, I figure that shipping the item list to Python would be the most efficient approach. <br>
<br></div>So that is the use case, but what I need is help with the Python, in the form of a quick sanity check. Unless you happen to be doing word co-occurrence in Hive and have some insight there. <br><br></div>Thanks, Colug, <br>
<br>Tom <br><div><div><div><div><div><div><div><div><div><br></div></div></div></div></div></div></div></div></div></div>
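<div>On the builtin question: one candidate is <code>itertools.combinations</code> from the Python standard library, which yields every r-length combination of its input, in input order. Below is a minimal sketch, not a tested Hive integration. The helper name <code>item_pairs</code> is made up for illustration, and the sketch uses the function form of print so it should run on Python 3 as well as Python 2.<br><br></div>

```python
from itertools import combinations


def item_pairs(items):
    """Return every two-item combination of the sorted input as a list of tuples.

    item_pairs is a hypothetical helper name, not part of any library.
    """
    # Sort first so the output ordering matches the example in the post.
    return list(combinations(sorted(items), 2))


words = ['software', 'DVD', 'dishwasher', 'tv']
for first, second in item_pairs(words):
    # Build the line as a single string so print() behaves the same
    # on Python 2 and Python 3.
    print(first + " *** " + second)
```

<div>Sorting first reproduces the ordering of the sample output above; drop the <code>sorted()</code> call if the input order should be kept. A list of n items produces n*(n-1)/2 pairs, so a list with fewer than two items gives no pairs at all.<br></div>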