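Delete a column + keep certain rows in multiple large .csv files using Python

Time: 2016-10-20 23:36:42

Tags: python csv

Hi, I'm really new here and to the Python world.

I have a number of (~1000) .csv files, each containing ~1,800,000 rows of data. The file format is as follows: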

5302730,131841,-0.29999999999999999,NULL,2013-12-31 22:00:46.773
5303072,188420,28.199999999999999,NULL,2013-12-31 22:27:46.863
5350066,131841,0.29999999999999999,NULL,2014-01-01 00:37:21.023
5385220,-268368577,4.5,NULL,2014-01-01 03:12:14.163
5305752,-268368587,5.1900000000000004,NULL,2014-01-01 03:11:55.207

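So, for every file I want to:
(1) delete the 4th (NULL) column;
(2) keep only certain rows in each file, depending on the value of the first column (e.g. for 5302730, keep only the rows that contain that value).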
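As a small illustration of the two steps, the sketch below applies them to the five sample rows above. It reads them from an in-memory string via `io.StringIO` instead of a real file; the variable names are illustrative only, and the key `5302730` is taken from the example in the question.

```python
import csv
import io

# The five sample rows from the question, held in memory instead of a file
sample = """5302730,131841,-0.29999999999999999,NULL,2013-12-31 22:00:46.773
5303072,188420,28.199999999999999,NULL,2013-12-31 22:27:46.863
5350066,131841,0.29999999999999999,NULL,2014-01-01 00:37:21.023
5385220,-268368577,4.5,NULL,2014-01-01 03:12:14.163
5305752,-268368587,5.1900000000000004,NULL,2014-01-01 03:11:55.207
"""

kept = []
for row in csv.reader(io.StringIO(sample)):
    # Step (2): keep only rows whose first column is the wanted ID
    if row[0] != "5302730":
        continue
    # Step (1): drop the 4th column (index 3, the NULL column)
    kept.append(row[:3] + row[4:])

print(kept)
# [['5302730', '131841', '-0.29999999999999999', '2013-12-31 22:00:46.773']]
```

Filtering before slicing means the column removal only runs on rows that are actually kept, which saves a little work when most rows are discarded.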

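I don't know whether this is possible, so any answer is appreciated!

Thanks in advance.

2 answers:

Answer 0 (score: 0)

Take a look at the csv module.

You can use the csv.reader function to get an iterator over the rows, where each row is a list of its cells.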

import csv

with open("filename.csv", newline="") as f:
    for line in csv.reader(f):
        # Remove 4th column, remember python starts counting at 0
        line = line[:3] + line[4:]
        if line[0] == "thevalueforthefirstcolumn":
            print(line)  # placeholder: do something with the row here
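The loop above leaves the output step open. One way to complete it is to pair `csv.reader` with `csv.writer` and stream each kept row straight into an output file, so a file with ~1,800,000 rows never has to sit in memory at once. This is a minimal sketch; the function name `filter_csv` and its parameters are hypothetical, and it assumes the files have no header row, as in the sample data.

```python
import csv

def filter_csv(src_path, dst_path, key_value, drop_index=3):
    """Copy rows whose first column equals key_value from src_path to
    dst_path, dropping the column at drop_index. Returns rows written."""
    written = 0
    # newline="" lets the csv module handle line endings itself
    with open(src_path, newline="") as src, \
         open(dst_path, "w", newline="") as dst:
        writer = csv.writer(dst)
        for row in csv.reader(src):
            # Skip empty lines and rows with a different key
            if row and row[0] == key_value:
                writer.writerow(row[:drop_index] + row[drop_index + 1:])
                written += 1
    return written
```

For example, `filter_csv("filename.csv", "filtered.csv", "5302730")` would write only the rows for ID 5302730, without the NULL column, to `filtered.csv`.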

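Answer 1 (score: 0)

If you want to run this kind of operation on CSV files more than once, with different parameters for which column to skip, which column to use as the key, and which key value to filter on, you can use something like the following: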

import csv

def read_csv(filename, column_to_skip=None, key_column=0, key_filter=None):

    data_from_csv = []

    with open(filename, newline='') as csvfile:
        csv_reader = csv.reader(csvfile)

        for row in csv_reader:

            # Skip data in specific column
            if column_to_skip is not None:
                del row[column_to_skip]

            # Filter out rows where the key doesn't match
            if key_filter is not None:
                key = row[key_column]
                if key_filter != key:
                    continue

            data_from_csv.append(row)

    return data_from_csv

def write_csv(filename, data_to_write):

    # newline='' avoids blank lines between rows on Windows
    with open(filename, 'w', newline='') as csvfile:
        csv_writer = csv.writer(csvfile)

        for row in data_to_write:
            csv_writer.writerow(row)

data = read_csv('data.csv', column_to_skip=3, key_filter='5302730')
write_csv('data2.csv', data)
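Since the question involves roughly 1000 files, the same idea can be wrapped in a loop over a directory. Note that `read_csv` above collects every matching row in a list before writing; the sketch below instead streams each file row by row, which keeps memory use flat at ~1,800,000 rows per file. The function name `process_all`, the directory names, and the `*.csv` pattern are assumptions for illustration.

```python
import csv
from pathlib import Path

def process_all(input_dir, output_dir, key_value, drop_index=3):
    """Filter every *.csv in input_dir into a same-named file in output_dir.
    Returns a dict mapping file name -> number of rows kept."""
    out_dir = Path(output_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    counts = {}
    for src_path in sorted(Path(input_dir).glob("*.csv")):
        dst_path = out_dir / src_path.name
        kept = 0
        with src_path.open(newline="") as src, \
             dst_path.open("w", newline="") as dst:
            writer = csv.writer(dst)
            for row in csv.reader(src):
                # Keep only rows whose first column matches the key
                if row and row[0] == key_value:
                    # Drop the column at drop_index (the NULL column)
                    writer.writerow(row[:drop_index] + row[drop_index + 1:])
                    kept += 1
        counts[src_path.name] = kept
    return counts
```

Usage would look like `process_all("input_dir", "output_dir", "5302730")`. Keep the output directory separate from the input directory: if they were the same, each source file would be truncated when it is opened for writing.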