Question

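I need to parse several thousand txt files with Python, but the code I have so far only works on a single file.

I'm trying to find the first occurrence of any month name (January, February, March, etc.) in a file and return the position of that first match. Every file contains at least one month, but some contain several.

This currently works, but it seems very cumbersome: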

mytext = open('2.txt','r')
mytext = mytext.read()

January = mytext.find("January")
February = mytext.find("February")
March = mytext.find("March")
April = mytext.find("April")
May = mytext.find("May")
June = mytext.find("June")
July = mytext.find("July")
August = mytext.find("August")
September = mytext.find("September")
October = mytext.find("October")
November = mytext.find("November")
December = mytext.find("December")

monthpos = [January, February, March, April, May, June, July, August, September, October, November, December]
monthpos = [x for x in monthpos if x != -1]
print(min(monthpos))  # prints the position of the first match as a number

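I'd like to combine something like any() and find() to get this done, but there doesn't seem to be a better way to do it. I found a related question, but it wasn't very clear, so it didn't help much. I know the following is wrong and can't work for a number of reasons, but this is the kind of thing I'm trying to do: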

mytext = open('text.txt','r')
mytext = mytext.read()
months = ["January", "February", "March", "April", "May", "June", "July", "August", "September", "October", "November", "December"]
print mytext.find(months) #where this would find the first time any month is matched
1945 # return the location in the string where the first month is found

Thanks in advance.

Answer 1

I think this does what you want:

months = ["January", "February", "March", "April", 
          "May", "June", "July", "August", 
          "September", "October", "November", "December"]
# s is the text to search (mytext in the question)
indices = [s.find(month) for month in months]
first = min(index for index in indices if index > -1)

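First we get the first occurrence of each month (or -1 if it is absent), then we take the smallest index, ignoring the -1 values. If nothing is found at all, min() raises a ValueError, which may or may not be what you want.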
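If a missing month should not raise, one option is to give min() a default value. The sketch below assumes Python 3.4 or later, where min() accepts default=. The helper name first_month_index is hypothetical and not part of the original answer; it returns None when no month is present.

```python
MONTHS = ["January", "February", "March", "April",
          "May", "June", "July", "August",
          "September", "October", "November", "December"]


def first_month_index(s, months=MONTHS):
    """Return the lowest index at which any month name occurs in s, or None."""
    # str.find returns -1 when the substring is absent, so those are skipped.
    indices = (s.find(month) for month in months)
    # default= (Python 3.4+) avoids the ValueError min() raises on an empty sequence.
    return min((i for i in indices if i != -1), default=None)
```

The caller can then test the result with "is None" instead of wrapping the call in try/except.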

As Two-Bit Alchemist commented, you can make this more efficient:

months = ["January", "February", "March", "April", 
          "May", "June", "July", "August", 
          "September", "October", "November", "December"]
first = None
for month in sorted(months, key=len):
    i = s[:first].find(month) # only search first part of string
    if i != -1:
        if first is None or i < first:  # check None first; comparing int with None fails on Python 3
            first = i
        if i < len(month): # not enough room for any remaining months
            break

Answer 2

I would use re, which keeps the idea simple. It also makes the code easy to extend later if you need to do something more complex.

import re
mytext = open('text.txt','r')
mytext = mytext.read()
months = ["January", "February", "March", "April", "May", "June", "July", "August", "September", "October", "November", "December"]
months_match = re.search("|".join(months), mytext)
print(months_match.start())  # index of the first month found

http://docs.python.org/2/library/re.html
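As one possible extension of the regex approach to the thousands of files mentioned in the question, here is a minimal sketch for Python 3. The helper names first_month_pos and scan_folder, the "*.txt" glob, and the UTF-8 encoding are all assumptions for illustration. The \b word boundaries are also an addition, not part of the original answer: they keep "May" from matching inside words such as "Maybe".

```python
import re
from pathlib import Path

MONTHS = ["January", "February", "March", "April", "May", "June",
          "July", "August", "September", "October", "November", "December"]

# Compile once and reuse for every file; \b restricts matches to whole words.
MONTH_RE = re.compile(r"\b(?:" + "|".join(MONTHS) + r")\b")


def first_month_pos(text):
    """Return the index of the first month name in text, or None if absent."""
    match = MONTH_RE.search(text)
    return match.start() if match else None


def scan_folder(folder):
    """Map each .txt file name in folder to the position of its first month."""
    results = {}
    for path in sorted(Path(folder).glob("*.txt")):
        # The encoding is an assumption; adjust it to match the actual files.
        text = path.read_text(encoding="utf-8", errors="replace")
        results[path.name] = first_month_pos(text)
    return results
```

Calling scan_folder on a directory returns a dict such as {"2.txt": 1945}, with None for any file that contains no month name.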

Python: search a string for the first occurrence of any item in a list
