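I have two lists, possibly of different lengths. Each list contains file names as strings. I have no control over the names, but I am confident the name structure will not change. It is always something like name1_name2_number1_+(or -)number2.jpg
Number1 is the substring I want to match between the two lists. If a file name in one list contains the same number1 as a file name in the other list, I want to append both file names to a third list. I have a simple function that gets number1 for each name in a given list, for example: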
>>> list1 = ['serentity01_20malcolm_200_+3.jpg', 'inara03_kaley40_8000_-1.jpg']
>>> def GetNum(imgStrings):
...     ss = []
...     for b in imgStrings:
...         ss.append([w for w in b.split('_') if w.isdigit()])
...     # flatten zee list of lists because it is ugly.
...     return [val for subl in ss for val in subl]
...
>>> GetNum(list1)
['200', '8000']
So, for these inputs, I would like:
>>> list1 = ['serentity01_20malcolm_200_+3.jpg', 'inara03_kaley40_8000_-1.jpg']
>>> list2 = ['inara03_summer40_8000_-2.jpg', 'book23_42jayne_400_+2.jpg', 'summer53_21simon_300_-1.jpg']
>>> awesomesauceSubstringMatcher(list1, list2)
['inara03_kaley40_8000_-1.jpg', 'inara03_summer40_8000_-2.jpg']
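I feel like I should be able to do this with my GetNum function and some list comprehension, but the whole '[blah for blah in ...]' syntactic trickery is new to me, and I can't quite get my head around it. Thoughts? Suggestions? Death threats? Thanks for all helpful replies, and a thousand apologies if my google-fu failed me in trying to find a similar question/answer.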
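EDIT: I just came up with this solution:
[name for name in list1 + list2 if any(subs in name for subs in GetNum(list1)) and any(subs in name for subs in GetNum(list2))]
I know it's long and ugly, but I really wanted to prove to myself that it could be done with a list comprehension. Thanks for all the helpful replies!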
Answer 0: (score: 1)
list1 = ['serentity01_20malcolm_200_+3.jpg', 'inara03_kaley40_8000_-1.jpg']
list2 = ['inara03_summer40_8000_-2.jpg', 'book23_42jayne_400_+2.jpg', 'summer53_21simon_300_-1.jpg']

def getNum(image_name_list):
    # Yield number1 (the third '_'-separated field) per name, or None if it is not numeric.
    for s in image_name_list:
        s = s.split('_')[2]
        if s.isdigit():
            yield s
        else:
            yield None

def getMatchingIndex(list1, list2):
    # Materialize the second list's numbers once so they can be rescanned.
    other_list = list(getNum(list2))
    for (i, num) in enumerate(getNum(list1)):
        if not num:
            continue
        for (j, other_num) in enumerate(other_list):
            if num == other_num:
                yield (i, j)

for i1, i2 in getMatchingIndex(list1, list2):
    print(list1[i1], list2[i2])
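Since we only need to compare one item at a time against every item in the second list, I used a generator in getNum to save memory. Since a number may match more than once, I keep checking every item rather than stopping at the first match.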
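The nested scan above compares every name in list1 against every name in list2. A minimal sketch of one alternative, assuming the same name1_name2_number1_... layout: index the second list by number1 once, then look up each name from the first list in that index. The helper names number1 and matching_index_pairs are hypothetical and not part of the answer.

```python
from collections import defaultdict

def number1(filename):
    """Return the third '_'-separated field, or None if it is not all digits."""
    parts = filename.split('_')
    if len(parts) > 2 and parts[2].isdigit():
        return parts[2]
    return None

def matching_index_pairs(first, second):
    """Yield (i, j) for every pair of names that share number1."""
    # Index the second list once: number1 -> positions where it occurs.
    positions = defaultdict(list)
    for j, name in enumerate(second):
        num = number1(name)
        if num is not None:
            positions[num].append(j)
    # Look up each name of the first list; names without a number match nothing.
    for i, name in enumerate(first):
        for j in positions.get(number1(name), ()):
            yield (i, j)

list1 = ['serentity01_20malcolm_200_+3.jpg', 'inara03_kaley40_8000_-1.jpg']
list2 = ['inara03_summer40_8000_-2.jpg', 'book23_42jayne_400_+2.jpg', 'summer53_21simon_300_-1.jpg']

pairs = list(matching_index_pairs(list1, list2))
for i, j in pairs:
    print(list1[i], list2[j])
```

This still reports every match when a number occurs more than once, but each list is walked only one time.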
Answer 1: (score: 0)
Untested, but the logic should be sound:
list1 = ['serentity01_20malcolm_200_+3.jpg', 'inara03_kaley40_8000_-1.jpg']
list2 = ['inara03_summer40_8000_-2.jpg', 'book23_42jayne_400_+2.jpg', 'summer53_21simon_300_-1.jpg']
list3 = []
seenInList1Dict = {}
for element in list1:
    splitelem = element.split('_')
    seenInList1Dict[splitelem[2]] = 1
for element in list2:
    splitelem = element.split('_')
    if splitelem[2] in seenInList1Dict:
        list3.append(element)
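I didn't use your GetNum because, IMO, it complicates things unnecessarily. I find it easier to dump things into a dictionary if you want to quickly look them up or check for their existence later. Also, if you need the number, you just need to split the file name and grab the value you want from the corresponding index.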
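The loop above collects only the names from list2, while the question asks for the matching names from both lists. A sketch of the same lookup idea that keeps both sides, using sets of number1 values instead of a dictionary. The function name match_by_number is hypothetical, and the code assumes every name has number1 as its third '_'-separated field.

```python
def match_by_number(first, second):
    """Return the names from both lists whose number1 also occurs in the other list."""
    def key(name):
        # Assumes the name1_name2_number1_... layout from the question.
        return name.split('_')[2]

    common = {key(n) for n in first} & {key(n) for n in second}
    # Keep input order: matches from the first list, then from the second.
    return [n for n in first + second if key(n) in common]

list1 = ['serentity01_20malcolm_200_+3.jpg', 'inara03_kaley40_8000_-1.jpg']
list2 = ['inara03_summer40_8000_-2.jpg', 'book23_42jayne_400_+2.jpg', 'summer53_21simon_300_-1.jpg']

list3 = match_by_number(list1, list2)
print(list3)
```

Because the comparison is on the whole field rather than a substring test, a number such as 800 would not be treated as a match for 8000.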
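Answer 2: (score: 0)
I would build a dictionary for each of the two lists, where the key is the number in the file name and the value is the file name itself. Then intersect the two sets of keys; the resulting common keys can be used to build the third list, e.g.: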
def List2Dic(List):
    # Map number1 (third '_'-separated field) to the file name.
    return dict(map(lambda x: [x.split("_")[2], x], List))

list1 = ['serentity01_20malcolm_200_+3.jpg', 'inara03_kaley40_8000_-1.jpg']
list2 = ['inara03_summer40_8000_-2.jpg', 'book23_42jayne_400_+2.jpg', 'summer53_21simon_300_-1.jpg']
d1 = List2Dic(list1)
d2 = List2Dic(list2)
for x in set(d1) & set(d2):
    print(d1[x], d2[x])
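One caveat with a plain dictionary: if two names in the same list share a number, the later one overwrites the earlier one. A sketch that keeps every name per number by grouping into lists, then pairs up the groups under each common key. The helper group_by_number and the third entry of list1 are hypothetical, added only to show the duplicate case.

```python
from collections import defaultdict
from itertools import product

def group_by_number(names):
    """Map number1 -> list of every file name carrying that number."""
    groups = defaultdict(list)
    for name in names:
        groups[name.split('_')[2]].append(name)
    return groups

list1 = ['serentity01_20malcolm_200_+3.jpg', 'inara03_kaley40_8000_-1.jpg',
         'river07_9wash_8000_+2.jpg']  # hypothetical extra name sharing 8000
list2 = ['inara03_summer40_8000_-2.jpg', 'book23_42jayne_400_+2.jpg', 'summer53_21simon_300_-1.jpg']

g1 = group_by_number(list1)
g2 = group_by_number(list2)

# Intersect the key views, then emit every (list1 name, list2 name) combination.
pairs = [(a, b)
         for num in sorted(g1.keys() & g2.keys())
         for a, b in product(g1[num], g2[num])]
for a, b in pairs:
    print(a, b)
```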
Answer 3: (score: 0)
Parse the strings into data you can actually filter on. Things will go much better.
import os

def process(filename):
    # Drop the extension, then split into the four '_'-separated fields.
    # (rstrip('.jpg') strips a set of characters, not a suffix, so splitext is safer.)
    stem = os.path.splitext(filename)[0]
    splitup = stem.split('_')
    keys = ["name1", "name2", "number1", "number2"]
    r = dict(zip(keys, splitup))
    r["filename"] = filename
    return r

list1 = ['serentity01_20malcolm_200_+3.jpg', 'inara03_kaley40_8000_-1.jpg']
list2 = ['inara03_summer40_8000_-2.jpg', 'book23_42jayne_400_+2.jpg', 'summer53_21simon_300_-1.jpg']
plist1 = [process(f) for f in list1]
plist2 = [process(f) for f in list2]
nlist1 = [i['number1'] for i in plist1]
nlist2 = [i['number1'] for i in plist2]
ilist1 = [i for i in plist1 if i['number1'] in nlist2]
ilist2 = [i for i in plist2 if i['number1'] in nlist1]
intersection = set(i["filename"] for i in ilist1 + ilist2)
for i in intersection:
    print(i)
Edit: Shoot, I see now that you want the intersection of the two lists.
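A possible variation on the parsing step, sketched with a regular expression and a namedtuple so that malformed names are rejected rather than silently mis-split. The pattern is inferred from the question's description of the name layout and is an assumption, as are the names ImageName, PATTERN, and parse.

```python
import re
from collections import namedtuple

ImageName = namedtuple('ImageName', 'name1 name2 number1 number2 filename')

# Assumed layout: name1_name2_number1_(+ or -)number2.jpg
PATTERN = re.compile(r'^([^_]+)_([^_]+)_(\d+)_([+-]\d+)\.jpg$')

def parse(filename):
    """Return an ImageName record, or None if the name does not fit the layout."""
    m = PATTERN.match(filename)
    if m is None:
        return None
    name1, name2, number1, number2 = m.groups()
    # Numbers become ints so they can be compared or sorted numerically.
    return ImageName(name1, name2, int(number1), int(number2), filename)

record = parse('inara03_kaley40_8000_-1.jpg')
print(record.number1, record.number2)
print(parse('notes.txt'))
```

With records like these, filtering on record.number1 works the same way as filtering on the dictionaries above.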
Answer 4: (score: 0)
My bit of a solution, using map, filter, and list flattening with sum:
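l = ['a_b_1_2', 'b_c_2_3']
s = ['c_d_3_4', 'd_e_1_4']
# For each pair of split names, keep [y, z] when the third fields match, else ''.
a = list(map(lambda y: list(map(lambda z: [y, z] if y[2] == z[2] else '', map(lambda v: v.split('_'), s))), map(lambda x: x.split('_'), l)))
# Flatten, drop the '' placeholders, flatten the pairs, and rejoin the names.
list(map(lambda x: '_'.join(x), sum(filter(lambda qq: qq != '', sum(a, [])), [])))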
Shown on the actual dataset:
>>> list1 = ['serentity01_20malcolm_200_+3.jpg', 'inara03_kaley40_8000_-1.jpg']
>>> list2 = ['inara03_summer40_8000_-2.jpg', 'book23_42jayne_400_+2.jpg', 'summer53_21simon_300_-1.jpg']
>>> a = list(map(lambda y: list(map(lambda z: [y, z] if y[2] == z[2] else '', map(lambda v: v.split('_'), list2))), map(lambda x: x.split('_'), list1)))
>>> a
[['', '', ''], [[['inara03', 'kaley40', '8000', '-1.jpg'], ['inara03', 'summer40', '8000', '-2.jpg']], '', '']]
>>> sum(filter(lambda qq: qq != '', sum(a, [])), [])
[['inara03', 'kaley40', '8000', '-1.jpg'], ['inara03', 'summer40', '8000', '-2.jpg']]
>>> list(map(lambda x: '_'.join(x), sum(filter(lambda qq: qq != '', sum(a, [])), [])))
['inara03_kaley40_8000_-1.jpg', 'inara03_summer40_8000_-2.jpg']  # This is the output you want.
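Flattening with sum(..., []) builds a new list at every step. A sketch of the same pair-then-flatten idea using itertools.product and itertools.chain.from_iterable from the standard library; the variable names pairs and matches are only illustrative.

```python
from itertools import chain, product

list1 = ['serentity01_20malcolm_200_+3.jpg', 'inara03_kaley40_8000_-1.jpg']
list2 = ['inara03_summer40_8000_-2.jpg', 'book23_42jayne_400_+2.jpg', 'summer53_21simon_300_-1.jpg']

# Every (list1 name, list2 name) combination whose third field agrees.
pairs = [(a, b) for a, b in product(list1, list2)
         if a.split('_')[2] == b.split('_')[2]]

# Flatten the pairs into one list of names.
matches = list(chain.from_iterable(pairs))
print(matches)
```

No placeholder values are needed here, since non-matching combinations are filtered out before flattening.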
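Answer 5: (score: 0)
This returns a list of all matching values across the two lists. For example, if there are matches for the numbers 8000 and 300, it returns a list of lists, one per possible number, and fills only the lists that have matches.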
list1 = ['serentity01_20malcolm_200_+3.jpg', 'inara03_kaley40_8000_-1.jpg',
         'inara03_34simon_300_+1.jpg']
list2 = ['inara03_summer40_8000_-2.jpg', 'book23_42jayne_400_+2.jpg',
         'summer53_21simon_300_-1.jpg']

def GetNum(imgStrings):
    ss = []
    for b in imgStrings:
        ss.append([w for w in b.split('_') if w.isdigit()])
    # flatten zee list of lists because it is ugly.
    return [val for subl in ss for val in subl]

print(GetNum(list1))

def addToThird(input1, input2):
    numlist1 = GetNum(input1)
    numlist2 = GetNum(input2)
    # One group per distinct number; sorted so the group order is reproducible.
    numgroups = sorted(set(numlist1 + numlist2))
    collectionsList = []
    for i in numgroups:
        collectionsList.append([])
    for item1 in numlist1:
        for item2 in numlist2:
            if item1 == item2:
                print(item1, item2)
                goindex = numgroups.index(item1)
                collectionsList[goindex].append(input1[numlist1.index(item1)])
                # Take the second name from input2, not input1.
                collectionsList[goindex].append(input2[numlist2.index(item2)])
    return collectionsList

print(addToThird(list1, list2))
Output:
['200', '8000', '300']
8000 8000
300 300
[[], ['inara03_34simon_300_+1.jpg', 'summer53_21simon_300_-1.jpg'], [],
['inara03_kaley40_8000_-1.jpg', 'inara03_summer40_8000_-2.jpg']]
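If the empty placeholder lists are not wanted, the groups could instead be keyed by number, with entries only for numbers found in both lists. A minimal sketch under the same naming assumption (number1 is the third '_'-separated field); shared_groups is a hypothetical name.

```python
def shared_groups(first, second):
    """Map each number1 present in both lists to the names that carry it."""
    def key(name):
        return name.split('_')[2]

    common = {key(n) for n in first} & {key(n) for n in second}
    groups = {num: [] for num in sorted(common)}
    # Names from the first list come before names from the second in each group.
    for name in first + second:
        if key(name) in groups:
            groups[key(name)].append(name)
    return groups

list1 = ['serentity01_20malcolm_200_+3.jpg', 'inara03_kaley40_8000_-1.jpg',
         'inara03_34simon_300_+1.jpg']
list2 = ['inara03_summer40_8000_-2.jpg', 'book23_42jayne_400_+2.jpg',
         'summer53_21simon_300_-1.jpg']

groups = shared_groups(list1, list2)
for num, names in groups.items():
    print(num, names)
```

This avoids the index lookups as well, so repeated numbers within one list are all kept.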