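I'm setting up a script that merges PDFs based on the text contained in their file names. My problem is that "Violin I" is also contained in "Violin II", and "Alto Saxophone I" is also contained in "Alto Saxophone II". How can I set this up so that tempList contains only the entries for "Violin I" and excludes "Violin II", and vice versa?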
pdfList = ["01 Violin I.pdf", "02 Violin I.pdf", "01 Violin II.pdf", "02 Violin II.pdf"]
instruments = ["Soprano", "Tenor", "Violin I", "Violin II", "Viola", "Cello", "Contrabass", "Alto Saxophone I", "Alto Saxophone II", "Tenor Saxophone", "Baritone Saxophone"]

# create arrays for each instrument that can be used for merging/organization
def organizer():
    for fileName in pdfList:
        for instrument in instruments:
            tempList = []
            # plain substring test: "Violin I" is also found inside "Violin II"
            if instrument in fileName:
                tempList.append(fileName)
                print(tempList)
    print(pdfList)

organizer()
Answer 0 (score: 3)
One way to avoid matching on substrings is to use a regular expression:
import re

pdfList = ["01 Violin I.pdf", "02 Violin I.pdf", "01 Violin II.pdf", "02 Violin II.pdf"]
instruments = ["Soprano", "Tenor", "Violin I", "Violin II", "Viola", "Cello", "Contrabass", "Alto Saxophone I", "Alto Saxophone II", "Tenor Saxophone", "Baritone Saxophone"]

# create arrays for each instrument that can be used for merging/organization
def organizer():
    for fileName in pdfList:
        tempList = []
        for instrument in instruments:
            # \b on both sides: match the instrument name only as a whole word
            if re.search(r'\b{}\b'.format(instrument), fileName):
                tempList.append(fileName)
        print(tempList)
    print(pdfList)

organizer()
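This wraps your search term in \b, so that it only matches when its start and end fall on word boundaries. Also, perhaps obvious but worth pointing out: this makes your instrument names part of the regular expression, so be aware that any characters in them that are also regex metacharacters will be interpreted as such (right now you don't have any). A more general solution would need a bit of code to find and properly escape those characters.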
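The escaping step mentioned above does not have to be written by hand: the standard library's re.escape does it. Below is a minimal sketch in Python 3 syntax. It is not the answerer's code: the helper name group_by_instrument and the shortened lists are made up for illustration, and it collects the matches per instrument rather than per file.

```python
import re

# Shortened sample data for illustration; the full lists are in the question.
pdf_list = ["01 Violin I.pdf", "02 Violin I.pdf", "01 Violin II.pdf", "02 Violin II.pdf"]
instruments = ["Violin I", "Violin II", "Alto Saxophone I", "Alto Saxophone II"]


def group_by_instrument(file_names, instrument_names):
    """Map each instrument name to the file names containing it as a whole word."""
    groups = {}
    for instrument in instrument_names:
        # re.escape neutralizes any regex metacharacters in the instrument name.
        pattern = re.compile(r"\b{}\b".format(re.escape(instrument)))
        groups[instrument] = [name for name in file_names if pattern.search(name)]
    return groups


groups = group_by_instrument(pdf_list, instruments)
print(groups["Violin I"])   # ['01 Violin I.pdf', '02 Violin I.pdf']
print(groups["Violin II"])  # ['01 Violin II.pdf', '02 Violin II.pdf']
```

One caveat: \b only matches between a word character and a non-word character, so an instrument name that ends in punctuation (a closing parenthesis, say) would not match when followed by ".pdf", even after escaping.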
Answer 1 (score: 1)
Try making this change:
...
if instrument+'.pdf' in fileName:
...
Would that cover all of your cases?
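For the file names shown in the question it should, because each instrument name sits directly in front of the extension. A slightly stricter variant of the same idea is str.endswith, sketched below in Python 3 syntax. The helper name files_for is made up for illustration, and the sketch assumes every file ends in the instrument name followed by a lowercase ".pdf".

```python
# Sample data from the question.
pdf_list = ["01 Violin I.pdf", "02 Violin I.pdf", "01 Violin II.pdf", "02 Violin II.pdf"]


def files_for(instrument, file_names):
    """Return the file names that end with '<instrument>.pdf'."""
    suffix = instrument + ".pdf"
    return [name for name in file_names if name.endswith(suffix)]


print(files_for("Violin I", pdf_list))   # ['01 Violin I.pdf', '02 Violin I.pdf']
print(files_for("Violin II", pdf_list))  # ['01 Violin II.pdf', '02 Violin II.pdf']
```

Anchoring on the extension also keeps "Tenor" apart from "Tenor Saxophone", which a whole-word match on "Tenor" would not do. It stops working as soon as anything follows the instrument name in the file name, or the extension is written differently (for example ".PDF").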