How to count files whose names contain any of a list of strings in Python

Date: 2016-02-15 17:38:46

Tags: python string list python-2.7 search

In Python 2.7, I want to find and count the files whose names contain any string from a given list.

List of files:

  • Passport_Mike.pdf
  • David-Passport.pdf
  • Iain ID Card.pdf
  • CopyPassport Michael.pdf
  • Driving License John.pdf

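I want to count all files whose names contain 'Passport' or 'ID'.

So far I have found one approach: I split each filename into separate words on a set of delimiters (_- /'). This does not always find my files, because a filename cannot always be split cleanly. For example, 'CopyPassport Michael' has no delimiter separating 'Passport' from 'Copy', so 'Passport' never appears as a word of its own.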
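To make the failure concrete, here is a minimal sketch that compares a token-level test with a plain substring test on one of the filenames above. It uses `re.split` instead of the custom split function, and the character class (the delimiters listed above plus '.') is an assumption made for illustration.

```python
import re

filename = 'CopyPassport Michael.pdf'

# Split on the delimiters _ - space / ' and also '.' (assumed delimiter set).
tokens = re.split(r"[_\- /'.]", filename)
print(tokens)

# Token-level test: fails, because 'Passport' is glued to 'Copy'.
token_match = 'Passport' in tokens
print(token_match)

# Substring test on the whole name: does not depend on delimiters at all.
substring_match = 'Passport' in filename
print(substring_match)
```

The `in` operator means "is an element of" when applied to a list, but "is a substring of" when applied to a string, which is why the second test succeeds.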

My code is based on an answer given to another question. For this code I use collections.Counter().

This is my code:

from collections import Counter

listOfFiles = ['Passport_Mike.pdf', 'David-Passport.pdf', 'Iain ID Card.pdf', 'CopyPassport Michael.pdf', 'Driving License John.pdf']
searchTermsList = ['Passport', 'ID']

def fileSplit(string, delimiters):
    delimiters = tuple(delimiters)
    stack = [string,]

    for delimiter in delimiters:
        for i, substring in enumerate(stack):
            substack = substring.split(delimiter)
            stack.pop(i)
            for j, _substring in enumerate(substack):
                stack.insert(i+j, _substring)
    return stack
# This split function is complicated, but it breaks each filename into the parts used by the next function. Other split methods didn't work for me.

def searchTermsCount(listOfFiles, searchTermsList):
    counts = Counter()
    for myFile in listOfFiles:
        myFileSplit = fileSplit(myFile, ('_', ' ', '-', '.'))
        # Words are stored upper-cased, so lookups must be upper-cased too.
        counts.update(word.upper() for word in myFileSplit)
    myCount = 0
    for word in searchTermsList:
        myCount += counts[word.upper()]
    print "Count files:", myCount

What is the Python 2.7 way to count the files whose names contain any string from a list, without relying on delimiters?

2 answers:

Answer 0: (score: 1)

Try this:

listOfFiles = ['Passport_Mike.pdf', 'David-Passport.pdf', 'Iain ID Card.pdf', 'CopyPassport Michael.pdf', 'Driving License John.pdf']
searchTermsList = ['Passport', 'ID']
relevantfiles = [filename for filename in listOfFiles if any(searchterm in filename for searchterm in searchTermsList)]
print(relevantfiles)

Output:

['Passport_Mike.pdf', 'David-Passport.pdf', 'Iain ID Card.pdf', 'CopyPassport Michael.pdf']
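Since the question asks for a count and not the list itself, the same `any()` test can feed `sum()` directly. The following is a minimal sketch; `count_matching_files`, `list_of_files` and `search_terms` are names made up for this example, not anything from the question.

```python
list_of_files = ['Passport_Mike.pdf', 'David-Passport.pdf', 'Iain ID Card.pdf',
                 'CopyPassport Michael.pdf', 'Driving License John.pdf']
search_terms = ['Passport', 'ID']


def count_matching_files(filenames, terms):
    """Count filenames containing at least one term as a substring."""
    # any() stops at the first matching term, so each file counts at most once.
    return sum(1 for name in filenames if any(term in name for term in terms))


count = count_matching_files(list_of_files, search_terms)
print(count)  # 4
```

The test is case-sensitive. Lowercasing both the filename and the term would make it case-insensitive, but a short term such as 'ID' would then also match names like 'David'.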

Answer 1: (score: 0)

listOfFiles = ['Passport_Mike.pdf', 'David-Passport.pdf', 'Iain ID Card.pdf', 'CopyPassport Michael.pdf', 'Driving License John.pdf']
searchTermsList = ['Passport', 'ID']

filenamesWithTerms = []
for filename in listOfFiles:
    for term in searchTermsList:
        if term in filename:
            filenamesWithTerms.append(filename)
            # Stop at the first matching term so each file is added only once.
            break
print(filenamesWithTerms)

Output:

['Passport_Mike.pdf', 'David-Passport.pdf', 'Iain ID Card.pdf', 'CopyPassport Michael.pdf']
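If a breakdown per search term is also wanted, the loop can drop the `break` and tally into a `collections.Counter`, which the question already uses. This is only a sketch; `count_files_per_term` and the variable names are hypothetical.

```python
from collections import Counter

list_of_files = ['Passport_Mike.pdf', 'David-Passport.pdf', 'Iain ID Card.pdf',
                 'CopyPassport Michael.pdf', 'Driving License John.pdf']
search_terms = ['Passport', 'ID']


def count_files_per_term(filenames, terms):
    """Return a Counter mapping each term to the number of filenames containing it."""
    counts = Counter()
    for name in filenames:
        for term in terms:
            # No break here: a file matching several terms counts once per term.
            if term in name:
                counts[term] += 1
    return counts


per_term = count_files_per_term(list_of_files, search_terms)
print(per_term['Passport'])  # 3
print(per_term['ID'])        # 1
```

Note that the per-term counts can add up to more than the number of matching files whenever one filename contains several terms.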