Question

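I'm trying to parse a dictionary stored in a .CSV file, using two lists kept in separate .txt files so the script knows what it is looking for. The idea is to find a row in the .CSV file that matches both a Word and an IDNumber, and then pull out a third variable when there is a match. However, the code runs very slowly. Any ideas on how to make it more efficient?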

import csv

IDNumberList_filename = 'IDs.txt'
WordsOfInterest_filename = 'dictionary_WordsOfInterest.txt'
Dictionary_filename = 'dictionary_individualwords.csv'

WordsOfInterest_ReadIn = open(WordsOfInterest_filename).read().split('\n')
#IDNumberListtoRead = open(IDNumberList_filename).read().split('\n')

for CurrentIDNumber in open(IDNumberList_filename).readlines():
    for CurrentWord in open(WordsOfInterest_filename).readlines():
        FoundCurrent = 0

        with open(Dictionary_filename, newline='', encoding='utf-8') as csvfile:
            reader = csv.DictReader(csvfile)
            for row in reader:
                if ((row['IDNumber'] == CurrentIDNumber) and (row['Word'] == CurrentWord)):
                    FoundCurrent = 1
                    CurrentProportion= row['CurrentProportion']

            if FoundCurrent == 0:
                CurrentProportion=0
            else:
                CurrentProportion=1
                print('found')

Answer 1

First, consider loading the file dictionary_individualwords.csv into memory. I'd guess a Python dictionary is the right data structure for this case.
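A minimal sketch of that idea, assuming each (IDNumber, Word) pair appears at most once in the CSV (if it repeats, the last row wins) and that a missing pair should yield 0, as in the question's code. The helper names `load_proportions` and `lookup` and the sample rows are made up for illustration:

```python
import csv
import io


def load_proportions(csvfile):
    """Read the CSV once and build {(IDNumber, Word): CurrentProportion}."""
    reader = csv.DictReader(csvfile)
    return {(row['IDNumber'], row['Word']): row['CurrentProportion']
            for row in reader}


def lookup(table, id_numbers, words):
    """Yield (id, word, proportion) for every requested pair; 0 if absent."""
    for id_number in id_numbers:
        for word in words:
            yield id_number, word, table.get((id_number, word), 0)


# Hypothetical sample data standing in for dictionary_individualwords.csv.
sample = io.StringIO(
    "IDNumber,Word,CurrentProportion\n"
    "1,apple,0.5\n"
    "2,pear,0.25\n"
)
table = load_proportions(sample)
for id_number, word, proportion in lookup(table, ['1', '2'], ['apple', 'pear']):
    print(id_number, word, proportion)
```

With the real files, `load_proportions` would be given the handle from `open(Dictionary_filename, newline='', encoding='utf-8')`. Each lookup is then a single dictionary access instead of a full scan of the file.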

Answer 2

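When you call readlines on the .txt files, you already build an in-memory list from them. You should build those lists first and then parse the csv file only once. Something like: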

import csv

IDNumberList_filename = 'IDs.txt'
WordsOfInterest_filename = 'dictionary_WordsOfInterest.txt'
Dictionary_filename = 'dictionary_individualwords.csv'

# Read both lookup lists once. splitlines() drops the trailing '\n'
# that readlines() keeps, which would otherwise break the comparisons.
with open(IDNumberList_filename) as f:
    numberlist = f.read().splitlines()
with open(WordsOfInterest_filename) as f:
    wordlist = f.read().splitlines()

# Parse the csv file only once.
with open(Dictionary_filename, newline='', encoding='utf-8') as csvfile:
    reader = csv.DictReader(csvfile)
    for row in reader:
        for CurrentIDNumber in numberlist:
            for CurrentWord in wordlist:
                if row['IDNumber'] == CurrentIDNumber and row['Word'] == CurrentWord:
                    CurrentProportion = row['CurrentProportion']
                    print('found')

Caution: untested.

Answer 3

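You are opening the CSV file N times, where N = (# lines in IDs.txt) * (# lines in dictionary_WordsOfInterest.txt). If the file is not too large, you can avoid this by saving its contents in a dictionary or a list of lists.

In the same way, dictionary_WordsOfInterest.txt is reopened every time a new line is read from IDs.txt.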

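Also, it looks like you are searching for any combination of a (CurrentIDNumber, CurrentWord) pair that can come from the txt files. So, for example, you could store the IDs in one set and the words in another, and for each row of the csv file check whether both the id and the word are in their respective sets.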
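A rough sketch of the set-based approach, assuming one entry per line in each .txt file and the column names from the question. The helper names `read_set` and `matching_rows` and the sample data are hypothetical:

```python
import csv
import io


def read_set(lines):
    """Collect stripped, non-empty lines into a set for O(1) membership tests."""
    return {line.strip() for line in lines if line.strip()}


def matching_rows(csvfile, ids, words):
    """Single pass over the CSV: yield rows whose IDNumber and Word are both wanted."""
    for row in csv.DictReader(csvfile):
        if row['IDNumber'] in ids and row['Word'] in words:
            yield row


# Hypothetical sample data standing in for the three files.
ids = read_set(io.StringIO("1\n2\n"))
words = read_set(io.StringIO("apple\npear\n\n"))
sample_csv = io.StringIO(
    "IDNumber,Word,CurrentProportion\n"
    "1,apple,0.5\n"
    "3,apple,0.9\n"
    "2,plum,0.1\n"
)
for row in matching_rows(sample_csv, ids, words):
    print('found', row['IDNumber'], row['Word'], row['CurrentProportion'])
```

With the real files, `read_set` can be passed an open file handle directly, since iterating a file yields its lines. This version only reports pairs that are present. If pairs missing from the csv also need a value of 0, one option is to record the found pairs in another set and compare it against all ID and word combinations afterwards.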

A more efficient way to go through a .csv file?

3 answers: