How to remove rows with empty column values from multiple .csv files

Time: 2019-12-19 09:11:38

Tags: python csv

I am trying to remove every row that contains an empty cell from multiple .csv files. For example:

Data 1, Data 2, Data 3, Data 4
Value 1, Value 2, Value 3, Value 4
<empty cell>, Value 2, Value 3, Value 4      #Trying to remove this whole row
<empty cell>, Value 2, Value 3, Value 4      #Trying to remove this whole row
Value 1, Value 2, Value 3, Value 4

This is what I have so far:

import os
import csv
import argparse

ap = argparse.ArgumentParser()
ap.add_argument("-i", "--input", required=True)
ap.add_argument("-o", "--output", required=True)
args = vars(ap.parse_args())

for file in os.listdir(args["input"]):
    if file.endswith(".csv"):
        with open(os.path.join(args["input"],file), 'r') as infile, open(os.path.join(args["output"], file), 'w') as outfile:
            csv_reader = csv.reader(infile)
            for line in csv_reader:                                                 # This is where I get stuck
                    with open(os.path.join(args["output"], file), 'a') as outfile:  

        outfile.close()

Any ideas? Thanks.

3 answers:

Answer 0 (score: 1)

You can use the Python library pandas to work with the CSV as a DataFrame.

Input file 'test_file.csv':

     A  B  C   D
0  1.0  3  6   9
1  NaN  4  7  10
2  2.0  5  8  11

Then:

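import pandas as pd

f = 'test_file.csv'
df = pd.read_csv(f, sep=";")  # this example file is semicolon-separated

# Boolean mask: True where column 'A' has a value (only column 'A' is checked)
vector_not_null = df['A'].notnull()
df_not_null = df[vector_not_null]

# index=False keeps the DataFrame row index out of the output file
df_not_null.to_csv('test_file_without_null_rows.csv', index=False, header=True, sep=';', encoding='utf-8-sig')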

Output file 'test_file_without_null_rows.csv':

     A  B  C   D
0  1.0  3  6   9
1  2.0  5  8  11

Answer 1 (score: 1)

You can drop every row that has an empty cell right as you read the file:

import pandas as pd
df = pd.read_csv(myfile, sep=',').dropna()  # myfile is the path to one .csv file
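
Since the question is about a whole folder of files, the one-liner above could be wrapped in a loop. The following is only a sketch under assumptions: the function name drop_incomplete_rows and its parameters are hypothetical, and dtype=str is my addition so that values are written back as the text they were read as, rather than being converted to floats in columns that had gaps.

```python
import glob
import os

import pandas as pd


def drop_incomplete_rows(input_dir, output_dir, sep=","):
    """Copy each .csv from input_dir to output_dir, leaving out rows with empty cells.

    Hypothetical helper for illustration; returns the list of files written.
    """
    os.makedirs(output_dir, exist_ok=True)
    written = []
    for path in sorted(glob.glob(os.path.join(input_dir, "*.csv"))):
        # dtype=str keeps every value as text; empty fields are still read as NaN
        df = pd.read_csv(path, sep=sep, dtype=str)
        out_path = os.path.join(output_dir, os.path.basename(path))
        # dropna() removes every row that has at least one missing value
        df.dropna().to_csv(out_path, index=False, sep=sep)
        written.append(out_path)
    return written
```

Note that pandas also treats some literal strings (for example "NA") as missing by default, so rows containing them would be dropped too. If that is not wanted, the na_values and keep_default_na parameters of read_csv can change the behavior.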

Answer 2 (score: 1)

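The csv reader represents empty cells as empty strings. An empty string is falsy in Python, so you can use the built-in function all to test whether a row contains any empty cells, and therefore whether it should be included in the output.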

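# Uses os, csv and the parsed args from the question's script
for file in os.listdir(args["input"]):
    if file.endswith(".csv"):
        # newline='' lets the csv module handle line endings itself
        with open(os.path.join(args["input"], file), 'r', newline='') as infile, \
                open(os.path.join(args["output"], file), 'w', newline='') as outfile:
            csv_reader = csv.reader(infile)
            csv_writer = csv.writer(outfile)
            for line in csv_reader:
                # all(line) is True only when no cell is an empty string
                if all(line):
                    csv_writer.writerow(line)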
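
For reference, the same idea can be written as a self-contained function that does not depend on the argparse setup from the question. This is a sketch rather than a definitive version: the name remove_rows_with_empty_cells is hypothetical, and two behaviors are my own assumptions about what is wanted, namely that cells holding only whitespace count as empty and that completely blank lines are dropped (all([]) is True, so a plain all(line) check would keep them).

```python
import csv
import os


def remove_rows_with_empty_cells(input_dir, output_dir):
    """Copy each .csv from input_dir to output_dir, skipping rows with empty cells.

    Hypothetical helper for illustration; input_dir and output_dir must differ.
    """
    os.makedirs(output_dir, exist_ok=True)
    for name in os.listdir(input_dir):
        if not name.endswith(".csv"):
            continue
        src = os.path.join(input_dir, name)
        dst = os.path.join(output_dir, name)
        # newline="" is what the csv module expects for both reading and writing
        with open(src, newline="") as infile, open(dst, "w", newline="") as outfile:
            writer = csv.writer(outfile)
            for row in csv.reader(infile):
                # Keep the row only if it has cells and none of them is blank
                if row and all(cell.strip() for cell in row):
                    writer.writerow(row)
```

It could be called as remove_rows_with_empty_cells(args["input"], args["output"]). The two directories must be different: opening the output file in 'w' mode truncates it, so writing into the input folder would empty each file before it is read.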