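When I tested the code on 3-4 files, it worked fine. However, when I ran it on more than 3,000 files, this error message appeared:
  File "C:\Users\dul\Dropbox\Article\ap_final.py", line 51, in extract_data
    combid = matchcomp2 + "," + strdate + "," + matchw + "," + matchcount
UnboundLocalError: local variable 'strdate' referenced before assignment
I searched, and it seems to be a problem related to globals. I don't understand what that means at all. Please help.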
import os,csv,datefinder,re
import numpy as np
os.chdir('C:\Users\dul\Dropbox\Article\parsedarticles')
def matchwho(text_to_match):
    if 'This story was generated by' in text_to_match:
        return('1')
    elif 'This story includes elements generated' in text_to_match:
        return('2')
    elif 'Elements of this story were generated' in text_to_match:
        return('2')
    elif 'Portions of this story were generated' in text_to_match:
        return('2')
    elif 'Parts of this story were generated' in text_to_match:
        return('2')
    elif 'A portion of this story was generated' in text_to_match:
        return('2')
    elif 'This sory was partially generated by' in text_to_match:
        return('2')
    elif 'This story contains elements generated by' in text_to_match:
        return('2')
    elif 'This story includes information generated by' in text_to_match:
        return('2')
    elif 'This story was originally generated by' in text_to_match:
        return('1')
    else:
        return('3')
def extract_data(filename):
    with open(filename, 'r') as file1:
        text1=file1.read()
    #locate the date of the article
    matches = list(datefinder.find_dates(text1))
    if len(matches) > 0:
        date=matches[1]
        strdate = str(date)
    else:
        print 'No dates found'
    #locate the name of the company2
    matchcomp2 = text1.split(' ', 1)[0]
    #count the number of words in the article
    matchcount = re.search(r'(.*) words', text1).group(1).strip()
    #determine the article
    matchw =str(matchwho(text1))
    #list the returns in a line
    combid = matchcomp2 + "," + strdate + "," + matchw + "," + matchcount
    #save in txt format
    with open('outfile.txt', "a+") as outfile:
        outfile.write("\n"+combid)
files = os.listdir("C:\Users\dul\Dropbox\Article\parsedarticles")
for file in files:
    if ".txt" in file:
        extract_data(file)
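Answer 0 (score: 0)
strdate is only defined when len(matches) > 0, but it is used in the assignment to combid in every case.
Answer 1 (score: 0)
Look at what happens when len(matches) is 0, so that the condition len(matches) > 0 is false: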
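To make this concrete, here is a minimal sketch that reproduces the same error outside the original script. The function `describe` and its inputs are hypothetical stand-ins, not part of the question's code. It also suggests the problem is not really about globals: `strdate` is a local name that simply never gets bound when the list is empty.

```python
def describe(matches):
    # strdate is bound only inside the if branch
    if len(matches) > 0:
        strdate = str(matches[0])
    # with an empty list, strdate was never assigned on this path
    return "company," + strdate


# Works when at least one match exists
print(describe(["2017-01-01"]))

# Fails when there are no matches, just like the 3,000-file run
try:
    describe([])
except UnboundLocalError as exc:
    print("UnboundLocalError:", exc)
```

With only 3-4 test files, every file probably contained a date, so the failing path was never exercised.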
def extract_data(filename):
    ...
    if len(matches) > 0:
        date=matches[1]
        strdate = str(date)
    else:
        print 'No dates found'
So if your condition fails, strdate is never set. However,
combid = matchcomp2 + "," + strdate + "," + matchw + "," + matchcount
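depends on strdate having been set, and assumes it always will be.
Depending on what you are trying to achieve, there are several things you could do. Here is one example: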
def extract_data(filename):
    ...
    if len(matches) > 0:
        date=matches[1]
        strdate = str(date)
    else:
        print 'No dates found in {}'.format(filename)
        strdate = ''
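Another option, if a row without a date is not useful, is to skip the file entirely. The sketch below is only an illustration: `date_string_or_none` and `build_row` are hypothetical helpers, and plain `datetime` objects stand in for what `datefinder.find_dates` would return. It also takes the first match, `matches[0]`, which is an assumption on my part. The original `matches[1]` picks the second match and would raise an IndexError for a file with exactly one date.

```python
from datetime import datetime


def date_string_or_none(matches):
    """Return the first match as a string, or None when the list is empty."""
    if matches:
        return str(matches[0])
    return None


def build_row(company, matches, who, count):
    """Build the comma-separated row, or return None so the caller can skip the file."""
    strdate = date_string_or_none(matches)
    if strdate is None:
        return None
    return ",".join([company, strdate, who, count])


print(build_row("ACME", [datetime(2017, 1, 2)], "1", "350"))
print(build_row("ACME", [], "1", "350"))
```

The caller would then write the row only when `build_row` returns something other than None.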
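Answer 2 (score: 0)
It looks like you only assign strdate when the predicate len(matches) > 0 is true. Try adding a default strdate value at the beginning of the function or in the else clause to debug this.
You are trying to use strdate, but because the if condition was not true, the code doesn't know what strdate is (it was never assigned, since the if branch did not run).
That's my guess.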
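The "default at the beginning" idea could look roughly like the sketch below. `format_date` and its `default` parameter are hypothetical names used for illustration, and `matches[0]` is assumed to be the date that is wanted.

```python
def format_date(matches, default=""):
    # Bind the name up front so it exists on every path through the function
    strdate = default
    if len(matches) > 0:
        strdate = str(matches[0])
    return strdate


print(repr(format_date([])))              # falls back to the default
print(repr(format_date(["2017-01-02"])))  # uses the first match
```

An empty string keeps the output line well-formed, while a marker such as "NA" would make the files without dates easier to find afterwards.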