Hello, I'm really new here and to the Python world.
I have some (~1000) .csv files, each containing ~1,800,000 rows. The files have the following format:
5302730,131841,-0.29999999999999999,NULL,2013-12-31 22:00:46.773
5303072,188420,28.199999999999999,NULL,2013-12-31 22:27:46.863
5350066,131841,0.29999999999999999,NULL,2014-01-01 00:37:21.023
5385220,-268368577,4.5,NULL,2014-01-01 03:12:14.163
5305752,-268368587,5.1900000000000004,NULL,2014-01-01 03:11:55.207
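So, for all of the files, I would like to: (1) remove the 4th (NULL) column, and (2) keep only certain rows in each file, depending on the value in the first column (for example 5302730: keep only the rows that contain that value).
I don't know whether this is possible, so any answer is appreciated!
Thanks in advance.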
Answer 0 (score: 0)
You can use the csv.reader function to get an iterator over the rows, where each row is a list of cell values (as strings).
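import csv

with open("filename.csv", newline="") as f:
    for line in csv.reader(f):
        # Remove 4th column, remember python starts counting at 0
        line = line[:3] + line[4:]
        # csv.reader yields strings, so compare against a string
        if line[0] == "thevalueforthefirstcolumn":
            dosomethingwith(line)  # placeholder for your own processing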
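To save the result instead of only processing each row, the same loop can write the kept rows out as it reads them, so a file with ~1,800,000 rows never has to be held in memory. This is only a minimal sketch, assuming Python 3; the function name filter_csv and its parameters (src_path, dst_path, key, drop_index) are made up for illustration and are not part of the csv module.

```python
import csv

def filter_csv(src_path, dst_path, key, drop_index=3):
    """Copy rows whose first column equals `key`, leaving out column `drop_index`.

    Hypothetical helper for illustration; returns the number of rows written.
    """
    kept = 0
    with open(src_path, newline="") as src, open(dst_path, "w", newline="") as dst:
        writer = csv.writer(dst)
        for row in csv.reader(src):
            # Skip blank lines and rows whose first cell does not match
            if not row or row[0] != key:
                continue
            # Drop the unwanted column and write the row straight away
            writer.writerow(row[:drop_index] + row[drop_index + 1:])
            kept += 1
    return kept
```

Calling filter_csv("filename.csv", "filtered.csv", "5302730") would then write only the matching rows, without the NULL column, to filtered.csv (both file names are placeholders).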
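Answer 1 (score: 0)
If you expect to do this kind of thing to CSV files more than once, with different parameters for which column to skip, which column to use as the key, and what to filter on, you can use something like this: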
import csv


def read_csv(filename, column_to_skip=None, key_column=0, key_filter=None):
    data_from_csv = []
    # newline='' is what the csv module recommends in Python 3
    with open(filename, newline='') as csvfile:
        csv_reader = csv.reader(csvfile)
        for row in csv_reader:
            # Skip data in specific column
            if column_to_skip is not None:
                del row[column_to_skip]
            # Filter out rows where the key doesn't match.
            # Note: key_column is the index *after* the column above was removed.
            if key_filter is not None:
                key = row[key_column]
                if key_filter != key:
                    continue
            data_from_csv.append(row)
    return data_from_csv


def write_csv(filename, data_to_write):
    with open(filename, 'w', newline='') as csvfile:
        csv_writer = csv.writer(csvfile)
        for row in data_to_write:
            csv_writer.writerow(row)


# Values read from a CSV file are strings, so the filter is a string too
data = read_csv('data.csv', column_to_skip=3, key_filter='5302730')
write_csv('data2.csv', data)
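Since the question mentions about 1000 files, the same idea can be applied to a whole folder. The following is a rough sketch rather than a definitive solution, assuming Python 3 and that every row has at least five columns; filter_all, in_dir and out_dir are hypothetical names chosen for this example. The output folder should be different from the input folder, otherwise each file would be truncated while it is still being read.

```python
import csv
import glob
import os

def filter_all(in_dir, out_dir, key, drop_index=3):
    """Filter every .csv file in in_dir into out_dir.

    Hypothetical helper; returns a dict mapping file name to rows kept.
    """
    os.makedirs(out_dir, exist_ok=True)
    counts = {}
    for src_path in sorted(glob.glob(os.path.join(in_dir, "*.csv"))):
        name = os.path.basename(src_path)
        dst_path = os.path.join(out_dir, name)
        kept = 0
        with open(src_path, newline="") as src, open(dst_path, "w", newline="") as dst:
            writer = csv.writer(dst)
            for row in csv.reader(src):
                # Keep only rows whose first column matches the key
                if row and row[0] == key:
                    # Write the row without the unwanted column
                    writer.writerow(row[:drop_index] + row[drop_index + 1:])
                    kept += 1
        counts[name] = kept
    return counts
```

For example, filter_all("raw", "filtered", "5302730") would read raw/*.csv and write one filtered file per input into filtered/ (both folder names are placeholders). Rows are streamed one at a time, so memory use stays small even for very large files.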