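Python: search a string for the first occurrence of any item in a list

Asked: 2014-03-21 17:09:05

Tags: python string

I have to parse a few thousand txt documents with Python, but the code I have so far only works on one.

I am trying to find the first occurrence of any month name (January, February, March, etc.) in a file and return the position of that first month. Every document contains at least one month, but some contain several.

This currently works, but it looks very cumbersome: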

mytext = open('2.txt','r')
mytext = mytext.read()

January = mytext.find("January")
February = mytext.find("February")
March = mytext.find("March")
April = mytext.find("April")
May = mytext.find("May")
June = mytext.find("June")
July = mytext.find("July")
August = mytext.find("August")
September = mytext.find("September")
October = mytext.find("October")
November = mytext.find("November")
December = mytext.find("December")

monthpos = [January, February, March, April, May, June, July, August, September, October, November, December]
monthpos = [x for x in monthpos if x != -1]
print min(monthpos)
 # returns the first match as a number

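I would like to combine something like any() and find() to get the job done, but there does not seem to be a better way. I found this question, but it was not very clear, so it did not help much. I know the following is wrong and cannot work for many reasons, but it shows what I am trying to do: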

mytext = open('text.txt','r')
mytext = mytext.read()
months = ["January", "February", "March", "April", "May", "June", "July", "August", "September", "October", "November", "December"]
print mytext.find(months) #where this would find the first time any month is matched
1945 # return the location in the string where the first month is found

Thanks in advance.

2 Answers:

Answer 0: (score: 2)

I think this will do what you want (here s is the string holding the text you read from the file):

months = ["January", "February", "March", "April", 
          "May", "June", "July", "August", 
          "September", "October", "November", "December"]
indices = [s.find(month) for month in months]
first = min(index for index in indices if index > -1)

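First we get the index of the first occurrence of each month (or -1 if it is absent), then we take the smallest of those indices, ignoring any that are -1. If no month is found at all, min() receives an empty sequence and raises a ValueError, which may or may not be what you want.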
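
If a document without any month should not raise, one option is to wrap the same idea in a small helper and pass default to min(). This is only a sketch, assuming Python 3.4 or later (where min() accepts default); the helper name first_month_index and the choice of None as the "no match" value are illustrative and not part of the answer above.

```python
MONTHS = ["January", "February", "March", "April",
          "May", "June", "July", "August",
          "September", "October", "November", "December"]


def first_month_index(s, months=MONTHS):
    """Return the lowest index at which any month name starts, or None."""
    # str.find returns -1 when the substring is absent.
    indices = (s.find(month) for month in months)
    # default=None avoids the ValueError that min() raises on an empty sequence.
    return min((i for i in indices if i != -1), default=None)
```

For example, first_month_index("Due in March, paid in January") gives 7, the position of "March", even though "January" comes first in the list.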


As Two-Bit Alchemist commented, you can make this more efficient:

months = ["January", "February", "March", "April", 
          "May", "June", "July", "August", 
          "September", "October", "November", "December"]
first = None
for month in sorted(months, key=len):
    i = s[:first].find(month) # only search first part of string
    if i != -1:
        if first is None or i < first:  # test for None first so an int is never compared with None
            first = i
        if i < len(month): # not enough room for any remaining months
            break
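
The loop above can also be packaged as a function so that it can be called once per document. The following is a sketch under the assumption of Python 3; the function name first_month_index_pruned is hypothetical. Because the search only ever looks at s[:first], any hit is necessarily earlier than the current best, so the comparison can be dropped.

```python
MONTHS = ["January", "February", "March", "April",
          "May", "June", "July", "August",
          "September", "October", "November", "December"]


def first_month_index_pruned(s, months=MONTHS):
    """Return the index of the earliest month name in s, or None."""
    first = None  # lowest index found so far; None means "no match yet"
    for month in sorted(months, key=len):
        # s[:None] is the whole string; after the first hit only the
        # prefix in front of the best match so far is searched.
        i = s[:first].find(month)
        if i != -1:
            first = i  # a hit inside s[:first] always starts before first
            if first < len(month):
                # The remaining names are at least this long, so none of
                # them can fit in front of index first.
                break
    return first
```

Searching only the prefix does not lose matches for this particular list: a month name has its only capital letter at the start, so two month names can never overlap in the text, and anything starting before first must also end before it. For an arbitrary list of search terms that assumption would need to be rechecked.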

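Answer 1: (score: 0)

I would use re, which keeps the idea simple. It also makes it easy to extend the code later if you need to do something more complex.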

import re
mytext = open('text.txt','r')
mytext = mytext.read()
months = ["January", "February", "March", "April", "May", "June", "July", "August", "September", "October", "November", "December"]
months_match = re.search("|".join(months), mytext)
print months_match.start()  # index of the first month found

http://docs.python.org/2/library/re.html
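
Since re.search returns None when nothing matches, a version meant to run over many documents probably wants to handle that case. A minimal sketch, assuming Python 3; the names MONTH_RE, first_month and first_month_in_file are hypothetical, and the utf-8 encoding is an assumption about the input files.

```python
import re

MONTHS = ["January", "February", "March", "April",
          "May", "June", "July", "August",
          "September", "October", "November", "December"]

# re.escape is not strictly needed for plain month names, but it keeps the
# pattern safe if the list is later extended with other strings.
MONTH_RE = re.compile("|".join(re.escape(m) for m in MONTHS))


def first_month(text):
    """Return (index, month_name) for the earliest month in text, or None."""
    match = MONTH_RE.search(text)
    if match is None:  # re.search returns None when nothing matches
        return None
    return match.start(), match.group()


def first_month_in_file(path, encoding="utf-8"):
    """Read one file and report its first month (encoding is an assumption)."""
    with open(path, encoding=encoding) as handle:
        return first_month(handle.read())
```

To cover a whole directory, one could call first_month_in_file for each path returned by something like glob.glob("*.txt"). Note that a plain substring pattern also matches "May" inside a word such as "Maybe"; if that matters, wrapping the alternation as r"\b(?:...)\b" would restrict it to whole words.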