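I have to parse a few thousand txt documents with Python, but the code I have so far only works for a single file.
I am trying to find the first occurrence of any month name (January, February, March, etc.) in a file and return its position. Every file contains at least one month, but some contain several.
What I have now works, but it seems very cumbersome: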
mytext = open('2.txt','r')
mytext = mytext.read()
January = mytext.find("January")
February = mytext.find("February")
March = mytext.find("March")
April = mytext.find("April")
May = mytext.find("May")
June = mytext.find("June")
July = mytext.find("July")
August = mytext.find("August")
September = mytext.find("September")
October = mytext.find("October")
November = mytext.find("November")
December = mytext.find("December")
monthpos = [January, February, March, April, May, June, July, August, September, October, November, December]
monthpos = [x for x in monthpos if x != -1]
print(min(monthpos))
# returns the first match as a number
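I would like to combine something like any() and find() to get this done, but there doesn't seem to be a cleaner way to do it. I found this question, but it wasn't very clear, so it didn't help much. Although I know the following is wrong and cannot work for a number of reasons, it shows what I'm trying to do: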
mytext = open('text.txt','r')
mytext = mytext.read()
months = ["January", "February", "March", "April", "May", "June", "July", "August", "September", "October", "November", "December"]
print mytext.find(months) #where this would find the first time any month is matched
1945 # return the location in the string where the first month is found
Thanks in advance.
Answer 0 (score: 2)
I think this will do what you want:
# s is the text to search, e.g. s = open('2.txt').read()
months = ["January", "February", "March", "April",
          "May", "June", "July", "August",
          "September", "October", "November", "December"]
indices = [s.find(month) for month in months]
first = min(index for index in indices if index > -1)
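First we get the index of the first occurrence of each month (or -1 if it is absent), then we take the smallest index that is not -1. If nothing is found, min() raises a ValueError, which may or may not be what you want.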
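If a file might contain no month at all, the ValueError can be avoided by giving min() a default. Here is a minimal sketch, assuming Python 3.4 or later (where min() accepts default); the helper name first_month_index is made up for illustration and is not part of the original answer:

```python
MONTHS = ["January", "February", "March", "April",
          "May", "June", "July", "August",
          "September", "October", "November", "December"]


def first_month_index(s, months=MONTHS):
    """Return the lowest index at which any month name occurs in s, or None."""
    # str.find returns -1 for a month that does not occur in s
    indices = (s.find(month) for month in months)
    # default=None (Python 3.4+) replaces the ValueError for "no month found"
    return min((i for i in indices if i != -1), default=None)


print(first_month_index("Due in March, paid in January."))  # 7
print(first_month_index("no dates here"))  # None
```

Returning None keeps "no month found" distinct from a real match at index 0.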
As Two-Bit Alchemist commented, you can make this more efficient:
months = ["January", "February", "March", "April",
          "May", "June", "July", "August",
          "September", "October", "November", "December"]
first = None
for month in sorted(months, key=len):  # shortest names first
    i = s[:first].find(month)  # only search first part of string
    if i != -1:
        # check for None first: comparing an int with None fails on Python 3
        if first is None or i < first:
            first = i
        if i < len(month):  # not enough room for any remaining months
            break
Answer 1 (score: 0)
I would use re to simplify this. It also makes the code easy to extend later if you need to do something more complex.
import re
mytext = open('text.txt','r')
mytext = mytext.read()
months = ["January", "February", "March", "April", "May", "June", "July", "August", "September", "October", "November", "December"]
match_obj = re.search("|".join(months), mytext)
print(match_obj.start())  # index of the first month found
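Since the question is about a few thousand files, the same search can be wrapped in a loop. The following is only a sketch under stated assumptions: the function name first_month_positions and the glob pattern passed to it are hypothetical, and each file is assumed small enough to read into memory in one go.

```python
import glob
import re

MONTHS = ["January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December"]
# Compile the alternation once and reuse it for every file.
MONTH_RE = re.compile("|".join(MONTHS))


def first_month_positions(pattern):
    """Map each path matching the glob pattern to the index of its first month, or None."""
    positions = {}
    for path in sorted(glob.glob(pattern)):
        with open(path, "r") as f:  # the with block closes the file for us
            match = MONTH_RE.search(f.read())
        # search() returns the leftmost match, or None if no month occurs
        positions[path] = match.start() if match else None
    return positions
```

Calling first_month_positions("*.txt") would then return a dict keyed by file name for every txt file in the current directory. Because re.search scans from left to right, the order of the names in the alternation does not change which position is reported.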