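I have a list of about 500 strings that I want to check against a CSV file with 25,000 rows. What I have right now seems to get stuck in a loop. Basically, if a row contains any of the strings in the list, I want to skip that row; otherwise I want to extract some other data from it.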
import csv

stringList = []  # strings look like "AAA", "AAB", "AAC", etc.
with open('BadStrings.csv', 'r') as csvfile:
    filereader = csv.reader(csvfile, delimiter=',')
    for row in filereader:
        stringToExclude = row[0]
        stringList.append(stringToExclude)

with open('OtherData.csv', 'r') as csvfile:
    filereader = csv.reader(csvfile, delimiter=',')
    next(filereader, None)  # Skip header row
    for row in filereader:
        for s in stringList:
            if s not in row:
                data1 = row[1]
Edit: It is not an infinite loop, but the loop takes far too long.
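Answer 0 (score: 0)
Following Niels, I would change the second loop so that it iterates over the row itself and checks whether each entry of the current row is in the "bad" list: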
for row in filereader:
    for s in row:
        if s not in stringList:
            data1 = row[0]
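I also don't know what you want to do with data1, but as written you rebind it every time an item is not in stringList.
You could make it a list and collect the values with data1.append(item).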
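A minimal sketch of that idea, assuming the goal is to keep row[1] from every row in which no cell equals a bad string. The helper name `collect_clean_values` and the sample rows are made up for illustration and stand in for the two CSV files:

```python
def collect_clean_values(rows, bad_strings):
    """Return row[1] for every row that contains none of bad_strings."""
    data1 = []
    for row in rows:
        # Skip the whole row as soon as one of its cells is a bad string.
        if any(cell in bad_strings for cell in row):
            continue
        data1.append(row[1])
    return data1


# Hypothetical sample data standing in for BadStrings.csv and OtherData.csv.
stringList = ["AAA", "AAB", "AAC"]
rows = [
    ["id1", "keep me", "XYZ"],
    ["id2", "drop me", "AAB"],
    ["id3", "keep me too", "QRS"],
]
print(collect_clean_values(rows, stringList))
```

Because the values are appended rather than assigned, data1 ends up holding one entry per accepted row instead of only the last one.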
Answer 1 (score: 0)
You could try something like this.
import csv

stringList = []  # strings look like "AAA", "AAB", "AAC", etc.
with open('BadStrings.csv', 'r') as csvfile:
    filereader = csv.reader(csvfile, delimiter=',')
    for row in filereader:
        stringToExclude = row[0]
        stringList.append(stringToExclude)

# Right now you are overwriting data1 every time. I don't know what you want
# to do with it, but you could, for example, add every row[1] to a list data1.
data1 = []
with open('OtherData.csv', 'r') as csvfile:
    filereader = csv.reader(csvfile, delimiter=',')
    next(filereader, None)  # Skip header row
    for row in filereader:
        found_s = False
        for s in stringList:
            if s in row:
                found_s = True
                break
        if not found_s:
            # Add row[1] to the list if no element of stringList is found in row.
            data1.append(row[1])
This probably still won't improve performance by much, but at least the loop for s in stringList: now stops as soon as s is found.
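For the speed problem itself, one option that none of the code above uses is to keep the bad strings in a set, because set membership is a hash lookup rather than a scan over roughly 500 list items. The following is only a sketch under two assumptions: a match means a whole cell equals a bad string (which is what `s in row` tests above), and in-memory text stands in for the real files. The names `bad_csv`, `other_csv` and `bad_strings` are hypothetical.

```python
import csv
import io

# Hypothetical file contents; real code would open the CSV files instead.
bad_csv = io.StringIO("AAA\nAAB\nAAC\n")
other_csv = io.StringIO(
    "id,value,code\n"
    "1,first,XYZ\n"
    "2,second,AAB\n"
    "3,third,QRS\n"
)

# Build the set once; skip blank lines, which csv.reader yields as [].
bad_strings = {row[0] for row in csv.reader(bad_csv) if row}

reader = csv.reader(other_csv)
next(reader, None)  # skip header row

# isdisjoint() is True when the row shares no cell with bad_strings.
# The length check guards against short or empty rows before reading row[1].
data1 = [
    row[1]
    for row in reader
    if len(row) > 1 and bad_strings.isdisjoint(row)
]
print(data1)
```

With this layout each row costs one pass over its own cells, regardless of how many bad strings there are. If the bad strings need to match as substrings inside a cell, a set lookup no longer applies and a different approach would be needed.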