Question

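I'm having trouble deleting rows from a text file when one of the columns contains a certain string. So far my code can read the text file and save it as a CSV file split into separate columns, but the rows are not removed.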

These are the values in that column:

Ship To or Bill To
------------------
3000000092-BILL_TO
3000000092-SHIP_TO
3000004000_SHIP_TO-INAC-EIM

There are 20+ more columns and 50,000+ rows. So basically I want to delete every row that contains the string 'INAC' or 'EIM'.

import csv

my_file_name = "NVG.txt"
cleaned_file = "cleanNVG.csv"
remove_words = ['INAC','EIM']

with open(my_file_name, 'r', newline='') as infile, \
     open(cleaned_file, 'w',newline='') as outfile:
    writer = csv.writer(outfile)
    for line in csv.reader(infile, delimiter='|'):
        if not any(remove_word in line for remove_word in remove_words):
            writer.writerow(line)

Answer 1

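The problem here is that the csv.reader object returns each line of the file as a list of individual column values, so the "in" test checks whether any single value in that list is exactly equal to remove_word.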
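
To make the difference concrete, here is a small illustration using one of the values from the question. The second field in the row is a made-up placeholder, not something from the original file.

```python
# A row as csv.reader would return it: a list of column strings.
# (The second field is a hypothetical placeholder for the other columns.)
row = ['3000004000_SHIP_TO-INAC-EIM', 'other field']

# List membership: is some element exactly equal to 'INAC'?
in_list = 'INAC' in row

# Substring test: does the first field contain 'INAC'?
in_field = 'INAC' in row[0]

print(in_list, in_field)  # False True
```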

A quick fix is to try

        if not any(remove_word in element
                      for element in line
                      for remove_word in remove_words):

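because the any(...) here is true if any field in the row contains any of the remove_words, so the row is written only when no field does.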
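
As a rough sketch of how that condition fits into the whole loop, the following wraps it in a helper and runs it on in-memory data. The function name filter_rows and the second column of the sample are assumptions made for this example. Only the first-column values come from the question.

```python
import csv
import io

def filter_rows(infile, outfile, remove_words, delimiter='|'):
    """Copy rows to outfile, skipping any row where a field contains a remove word.

    filter_rows is a hypothetical helper name used only for this sketch.
    """
    writer = csv.writer(outfile)
    for row in csv.reader(infile, delimiter=delimiter):
        # Substring test against every field of the row.
        if not any(word in field for field in row for word in remove_words):
            writer.writerow(row)

# In-memory stand-in for NVG.txt; the second column is invented.
sample = (
    "3000000092-BILL_TO|a\n"
    "3000000092-SHIP_TO|b\n"
    "3000004000_SHIP_TO-INAC-EIM|c\n"
)
out = io.StringIO()
filter_rows(io.StringIO(sample), out, ['INAC', 'EIM'])
result = out.getvalue()
print(result)
```

With real files, the two io.StringIO objects would be replaced by the open(...) calls from the question.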

Answer 2

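As the other answers point out, the code doesn't work because each line in csv.reader is actually a list of column values, so remove_word in line checks whether any column value is exactly equal to one of the remove_words, which is evidently never True.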

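If you only need to check for the words in one column, there's no reason to check all of them. The following only checks the value of one column, so it should be much faster than checking all 20 or more columns in every row of the file.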

import csv

my_file_name = "NVG.txt"
cleaned_file_name = "cleanNVG.csv"
ONE_COLUMN = 1
remove_words = ['INAC', 'EIM']

with open(my_file_name, 'r', newline='') as infile, \
     open(cleaned_file_name, 'w',newline='') as outfile:
    writer = csv.writer(outfile)
    for row in csv.reader(infile, delimiter='|'):
        column = row[ONE_COLUMN]
        if not any(remove_word in column for remove_word in remove_words):
            writer.writerow(row)
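
If the file starts with a header row, a possible variation is to select the column by its header text instead of by index. This is only a sketch under that assumption: the header "Ship To or Bill To" is taken from the question, while the "Name" column and the sample rows are invented for illustration.

```python
import csv
import io

COLUMN_NAME = "Ship To or Bill To"  # header text as shown in the question
remove_words = ['INAC', 'EIM']

# In-memory stand-in for the input file; assumes a header row exists.
# The "Name" column is hypothetical.
sample = (
    "Ship To or Bill To|Name\n"
    "3000000092-BILL_TO|a\n"
    "3000004000_SHIP_TO-INAC-EIM|b\n"
)

infile = io.StringIO(sample)
outfile = io.StringIO()

reader = csv.DictReader(infile, delimiter='|')
writer = csv.DictWriter(outfile, fieldnames=reader.fieldnames)
writer.writeheader()
for row in reader:
    value = row[COLUMN_NAME]
    # Only the one named column is tested for the unwanted substrings.
    if not any(word in value for word in remove_words):
        writer.writerow(row)

kept = outfile.getvalue().splitlines()
print(kept)
```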

Answer 3

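Each row output by the csv reader is a list of strings, not a string, so your generator expression is checking whether 'INAC' or 'EIM' is one of the members of the list, i.e.: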

'INAC' in ['3000004000_SHIP_TO-INAC-EIM', ...]

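is always false, because 'in' looks for an exact match when applied to a list. If you want to check whether a string is present anywhere in the line, you don't need the csv reader and can use plain open() instead: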

import csv

my_file_name = "NVG.txt"
cleaned_file = "cleanNVG.csv"
remove_words = ['INAC','EIM']

with open(my_file_name, 'r', newline='') as infile, open(cleaned_file, 'w',newline='') as outfile:
    writer = csv.writer(outfile)
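    for line in infile:
        # line is a plain string here, so "in" is a substring test
        if not any(remove_word in line for remove_word in remove_words):
            # writerow expects a sequence of fields, not a string, so split first
            # (a plain split does not handle quoted fields containing '|')
            writer.writerow(line.rstrip('\r\n').split('|'))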

Delete a row if it contains a string in a CSV file
