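I'm trying to parse some dictionaries in a .CSV file, using two lists in separate .txt files so the script knows what it is looking for. The idea is to find a row in the .CSV file that matches both a Word and an IDNumber, and then pull out a third variable when there is a match. However, the code runs really slowly. Any ideas on how to make it more efficient?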
import csv
IDNumberList_filename = 'IDs.txt'
WordsOfInterest_filename = 'dictionary_WordsOfInterest.txt'
Dictionary_filename = 'dictionary_individualwords.csv'
WordsOfInterest_ReadIn = open(WordsOfInterest_filename).read().split('\n')
#IDNumberListtoRead = open(IDNumberList_filename).read().split('\n')
for CurrentIDNumber in open(IDNumberList_filename).readlines():
    for CurrentWord in open(WordsOfInterest_filename).readlines():
        FoundCurrent = 0
        with open(Dictionary_filename, newline='', encoding='utf-8') as csvfile:
            reader = csv.DictReader(csvfile)
            for row in reader:
                if ((row['IDNumber'] == CurrentIDNumber) and (row['Word'] == CurrentWord)):
                    FoundCurrent = 1
                    CurrentProportion = row['CurrentProportion']
            if FoundCurrent == 0:
                CurrentProportion = 0
            else:
                CurrentProportion = 1
                print('found')
Answer 0 (score: 2)
First, consider loading the file dictionary_individualwords.csv into memory. I'd guess a Python dictionary is the right data structure for this case.
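To make that suggestion concrete, here is a minimal sketch of one way to do it: read the CSV once and index it by the (IDNumber, Word) pair, so each lookup becomes a dictionary access instead of a full file scan. The column names come from the question; the sample data, the `load_proportions` helper and the `results` dictionary are made up for illustration, and an in-memory string stands in for the real file.

```python
import csv
import io

# Hypothetical sample data standing in for dictionary_individualwords.csv
SAMPLE_CSV = """IDNumber,Word,CurrentProportion
1,apple,0.25
1,pear,0.10
2,apple,0.40
"""

def load_proportions(csvfile):
    """Read the CSV once and index it by the (IDNumber, Word) pair."""
    reader = csv.DictReader(csvfile)
    return {(row['IDNumber'], row['Word']): row['CurrentProportion']
            for row in reader}

# With the real file this would be something like:
# with open('dictionary_individualwords.csv', newline='', encoding='utf-8') as f:
#     proportions = load_proportions(f)
proportions = load_proportions(io.StringIO(SAMPLE_CSV))

ids = ['1', '2']            # would come from IDs.txt
words = ['apple', 'pear']   # would come from dictionary_WordsOfInterest.txt

results = {}
for id_number in ids:
    for word in words:
        # Fall back to 0 when the pair is missing, as in the question's code
        results[(id_number, word)] = proportions.get((id_number, word), 0)
```

If the same (IDNumber, Word) pair occurs on several rows, the last one wins here, which mirrors what the loop in the question ends up keeping. Note that the values stay strings, as `csv.DictReader` returns them.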
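Answer 1 (score: 1)
When you call readlines on the .txt files, you already build an in-memory list from each of them. You should build those lists first, and then parse the CSV file only once. Something like: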
import csv
IDNumberList_filename = 'IDs.txt'
WordsOfInterest_filename = 'dictionary_WordsOfInterest.txt'
Dictionary_filename = 'dictionary_individualwords.csv'
WordsOfInterest_ReadIn = open(WordsOfInterest_filename).read().split('\n')
#IDNumberListtoRead = open(IDNumberList_filename).read().split('\n')
# Build both lists once. splitlines() drops the trailing '\n' that readlines()
# keeps, which would otherwise prevent any match against the CSV values.
numberlist = open(IDNumberList_filename).read().splitlines()
wordlist = open(WordsOfInterest_filename).read().splitlines()
with open(Dictionary_filename, newline='', encoding='utf-8') as csvfile:
    reader = csv.DictReader(csvfile)
    # Single pass over the CSV file
    for row in reader:
        for CurrentIDNumber in numberlist:
            for CurrentWord in wordlist:
                # Reset the flag for every (ID, word) pair
                FoundCurrent = 0
                if ((row['IDNumber'] == CurrentIDNumber) and (row['Word'] == CurrentWord)):
                    FoundCurrent = 1
                    CurrentProportion = row['CurrentProportion']
                if FoundCurrent == 0:
                    CurrentProportion = 0
                else:
                    CurrentProportion = 1
                    print('found')
Caution: untested.
Answer 2 (score: 1)
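You are opening and reading the CSV file N times, where N = (# lines in IDs.txt) * (# lines in dictionary_WordsOfInterest.txt). If the file is not too large, you can avoid this by storing its contents in a dictionary or a list of lists.
Likewise, dictionary_WordsOfInterest.txt is reopened every time a new line is read from IDs.txt.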
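Also, it looks like you are searching for any combination of a (CurrentIDNumber, CurrentWord) pair that can come from the txt files. So, for example, you could store the IDs in one set and the words in another, and for each row of the csv file check whether both the ID and the word are in their respective sets.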
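The set-based idea could look roughly like the sketch below. It makes a single pass over the CSV and does two constant-time membership tests per row. The helper names `read_lines_as_set` and `matching_rows`, and the sample data, are assumptions for illustration; in-memory strings stand in for the three files.

```python
import csv
import io

# Hypothetical sample data standing in for dictionary_individualwords.csv
SAMPLE_CSV = """IDNumber,Word,CurrentProportion
1,apple,0.25
1,pear,0.10
2,apple,0.40
3,plum,0.05
"""

def read_lines_as_set(text):
    """Turn the contents of a one-item-per-line file into a set, skipping blanks."""
    return {line.strip() for line in text.splitlines() if line.strip()}

def matching_rows(csvfile, id_set, word_set):
    """Yield (IDNumber, Word, CurrentProportion) for rows whose ID and word are both wanted."""
    reader = csv.DictReader(csvfile)
    for row in reader:
        if row['IDNumber'] in id_set and row['Word'] in word_set:
            yield row['IDNumber'], row['Word'], row['CurrentProportion']

# With real files, pass open('IDs.txt').read() and so on instead of literals
id_set = read_lines_as_set("1\n3\n")
word_set = read_lines_as_set("apple\npear\n")

found = list(matching_rows(io.StringIO(SAMPLE_CSV), id_set, word_set))
for id_number, word, proportion in found:
    print('found', id_number, word, proportion)
```

Unlike the dictionary approach, this never holds the whole CSV in memory, so it may be the better fit when the CSV is large and the two lists are small.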