More efficient way to search through a .csv file?

Posted: 2015-08-14 12:58:26

Tags: python list csv python-3.x

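I'm trying to parse a dictionary stored in a .CSV file, using two lists kept in separate .txt files so that the script knows what it is looking for. The idea is to find a row in the .CSV file that matches both a Word and an IDNumber, and then pull out a third variable when there is a match. However, the code runs very slowly. Any ideas how to make it more efficient?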

import csv

IDNumberList_filename = 'IDs.txt'
WordsOfInterest_filename = 'dictionary_WordsOfInterest.txt'
Dictionary_filename = 'dictionary_individualwords.csv'

WordsOfInterest_ReadIn = open(WordsOfInterest_filename).read().split('\n')
#IDNumberListtoRead = open(IDNumberList_filename).read().split('\n')

# splitlines() (unlike readlines()) drops the trailing '\n' from each line;
# otherwise no CSV field could ever compare equal to CurrentIDNumber or CurrentWord.
for CurrentIDNumber in open(IDNumberList_filename).read().splitlines():
    for CurrentWord in open(WordsOfInterest_filename).read().splitlines():
        FoundCurrent = 0
        CurrentProportion = 0  # default when the pair is not in the CSV

        # Scan the whole CSV for this (ID, word) pair.
        with open(Dictionary_filename, newline='', encoding='utf-8') as csvfile:
            reader = csv.DictReader(csvfile)
            for row in reader:
                if row['IDNumber'] == CurrentIDNumber and row['Word'] == CurrentWord:
                    FoundCurrent = 1
                    CurrentProportion = row['CurrentProportion']

        if FoundCurrent == 1:
            print('found')

3 Answers:

Answer 0 (score: 2)

First, consider loading the file dictionary_individualwords.csv into memory. I'd guess a Python dictionary is the right data structure for this case.
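A minimal sketch of that idea, assuming the column names used in the question (IDNumber, Word, CurrentProportion). The CSV is read once into a dict keyed by the (IDNumber, Word) pair, so each lookup becomes an average-case O(1) dict access instead of a full file scan. The function names `load_proportions` and `lookup_all` are hypothetical, chosen only for illustration.

```python
import csv


def load_proportions(csv_path):
    """Read the CSV once into a dict mapping (IDNumber, Word) -> CurrentProportion.

    If the same pair appears in several rows, the last row wins, which matches
    the question's loop (it keeps overwriting CurrentProportion on every match).
    """
    proportions = {}
    with open(csv_path, newline='', encoding='utf-8') as csvfile:
        for row in csv.DictReader(csvfile):
            proportions[(row['IDNumber'], row['Word'])] = row['CurrentProportion']
    return proportions


def lookup_all(csv_path, ids, words):
    """Return {(id, word): proportion}, using 0 for pairs missing from the CSV."""
    proportions = load_proportions(csv_path)
    # One dict lookup per pair; the file is never re-opened.
    return {(i, w): proportions.get((i, w), 0) for i in ids for w in words}
```

The ID and word lists would come from the .txt files, e.g. `open('IDs.txt').read().splitlines()`. Memory use grows with the size of the CSV, so this assumes the file fits comfortably in RAM.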

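Answer 1 (score: 1)

When you call readlines on the .txt files, you are already building in-memory lists from them. You should build these lists first, so that the csv file only has to be parsed once. Something like: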

import csv

IDNumberList_filename = 'IDs.txt'
WordsOfInterest_filename = 'dictionary_WordsOfInterest.txt'
Dictionary_filename = 'dictionary_individualwords.csv'

# Build both lists once; splitlines() also drops the trailing '\n'.
with open(IDNumberList_filename) as f:
    numberlist = f.read().splitlines()
with open(WordsOfInterest_filename) as f:
    wordlist = f.read().splitlines()

# Parse the CSV only once, checking every row against every (ID, word) pair.
with open(Dictionary_filename, newline='', encoding='utf-8') as csvfile:
    reader = csv.DictReader(csvfile)
    for row in reader:
        for CurrentIDNumber in numberlist:
            for CurrentWord in wordlist:
                if row['IDNumber'] == CurrentIDNumber and row['Word'] == CurrentWord:
                    CurrentProportion = row['CurrentProportion']
                    print('found')

Caution: untested.

Answer 2 (score: 1)

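You are opening dictionary_individualwords.csv N times, where

N = (# lines in IDs.txt) * (# lines in dictionary_WordsOfInterest.txt)

If the file is not too big, you can avoid this by saving its contents in a dictionary or a list of lists.

In the same way, you open dictionary_WordsOfInterest.txt again every time you read a new line from IDs.txt.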
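A minimal sketch of the "list of lists" variant, assuming the same column names as the question. Every file is read exactly once; the CSV rows are cached in memory and the column positions are found from the header, so no particular column order is assumed. The function name `find_matches_cached` is hypothetical.

```python
import csv


def find_matches_cached(csv_path, ids_path, words_path):
    # Read each .txt file once; splitlines() also drops the trailing '\n'.
    with open(ids_path) as f:
        ids = f.read().splitlines()
    with open(words_path) as f:
        words = f.read().splitlines()

    # Cache the CSV as a list of lists and locate the columns by header name.
    with open(csv_path, newline='', encoding='utf-8') as csvfile:
        reader = csv.reader(csvfile)
        header = next(reader)
        rows = list(reader)
    id_col = header.index('IDNumber')
    word_col = header.index('Word')
    prop_col = header.index('CurrentProportion')

    results = {}
    for current_id in ids:
        for current_word in words:
            proportion = 0  # default when the pair is not found
            # Still a linear scan per pair, but over memory instead of the disk.
            for row in rows:
                if row[id_col] == current_id and row[word_col] == current_word:
                    proportion = row[prop_col]
            results[(current_id, current_word)] = proportion
    return results
```

This removes the repeated file I/O but keeps the same number of comparisons, (# IDs) * (# words) * (# CSV rows); keying a dictionary on (IDNumber, Word) would remove the inner scan as well.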

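Also, it seems you are looking for any combination of a pair (CurrentIDNumber, CurrentWord) drawn from the txt files. So you could, for example, store the IDs in one set and the words in another, and for each row in the csv file check whether both the ID and the word are in their respective sets.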
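A minimal sketch of the set-based approach, assuming the question's column names. It makes a single pass over the CSV and performs two average-case O(1) set membership tests per row; only the matching rows are kept in memory, not the whole file. The function name `scan_with_sets` is hypothetical.

```python
import csv


def scan_with_sets(csv_path, ids, words):
    """Single pass over the CSV, keeping rows whose ID and word are both of interest."""
    id_set = set(ids)
    word_set = set(words)
    matches = {}
    with open(csv_path, newline='', encoding='utf-8') as csvfile:
        for row in csv.DictReader(csvfile):
            # Both membership checks are hash lookups, not list scans.
            if row['IDNumber'] in id_set and row['Word'] in word_set:
                matches[(row['IDNumber'], row['Word'])] = row['CurrentProportion']
    return matches
```

A pair missing from the result was never found, so `matches.get((some_id, some_word), 0)` reproduces the question's default of 0.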