Deleting rows and columns with the Python csv module

Time: 2019-02-15 03:37:27

Tags: python csv

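I promise I searched and read through several pages of Google results before posting this. I swear I've done my due diligence.

I'm trying to open a CSV file with Python, read it, make changes to it, and then write out a new file.

This is as far as I've gotten:

import csv

def water_data():
    with open('aquastat.csv', 'r') as csv_file:
        csv_reader = csv.reader(csv_file)
        final_file_name = "final_water.data.csv"
        final_file = open(final_file_name, 'w')
        csv_writer = csv.writer(final_file, delimiter="\t")
        for row in csv_reader:
            csv_writer.writerow(row)

But I'm struggling to get any further. I want to delete certain columns, but I can't understand how Python knows the difference between rows and columns. For example, the columns are Area, Area ID, Year, Value, and so on. I only want Area, Year, Value. I tried

for row in final_file:
    final_file.writerow(row[0] + row[2] + row[4] + row[5])

but I keep getting the following error: IndexError: list index out of range

[I would also like to replace blank cells with *, but the columns are the priority.]

Please note that I cannot use pandas.

If possible, I would really appreciate it if someone could not only give me the code but also explain it to me.

TLDR: How do I remove empty rows from a CSV file and write only certain columns to a new file?

Input:

"Area","Area Id","Variable Name","Variable Id","Year","Value","Symbol","Md"
"Afghanistan",2,"Total area of the country",4100,1977,65286.0,"E","",""
"Afghanistan",2,"Total area of the country",4100,1982,65286.0,"E","",""
"Afghanistan",2,"Total area of the country",4100,1987,65286.0,"E","",""
"Afghanistan",2,"Total area of the country",4100,1992,65286.0,"E","",""
"Afghanistan",2,"Total area of the country",4100,1997,65286.0,"E","",""
"Afghanistan",2,"Total area of the country",4100,2002,65286.0,"E","",""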

3 Answers:

Answer 0: (score: 1)

I've kept my answer as close as possible to what you have so far.

Prototype:

import csv

# newline='' lets the csv module handle line endings itself
with open('aquastat.csv', 'r', newline='') as csv_file, \
        open('final_water.data.csv', 'w', newline='') as final_file:
    csv_reader = csv.reader(csv_file)
    csv_writer = csv.writer(final_file, delimiter="\t")
    for row in csv_reader:
        # skip blank or short rows that do not reach the Value column
        if len(row) >= 6:
            # keep only Area, Year and Value
            row = [row[0], row[4], row[5]]
            csv_writer.writerow(row)

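Explanation:

  • The line csv_writer.writerow(row) writes the row to the output CSV file. Just before it, I added the line row = [row[0], row[4], row[5]], which overwrites the contents of the list row with a list of only 3 cells, taken from the Area, Year, and Value columns respectively.
  • Above that, I added the condition if len(row) >= 6: to check that the row has at least enough elements to extract the columns up to Value. Because csv.reader returns an empty list for a blank line, this check also drops empty rows.

Input: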

"Area","Area Id","Variable Name","Variable Id","Year","Value","Symbol","Md"
"Afghanistan",2,"Total area of the country",4100,1977,65286.0,"E","",""
"Afghanistan",2,"Total area of the country",4100,1982,65286.0,"E","",""
"Afghanistan",2,"Total area of the country",4100,1987,65286.0,"E","",""
"Afghanistan",2,"Total area of the country",4100,1992,65286.0,"E","",""
"Afghanistan",2,"Total area of the country",4100,1997,65286.0,"E","",""
"Afghanistan",2,"Total area of the country",4100,2002,65286.0,"E","",""

Output:

Area    Year    Value
Afghanistan     1977    65286.0
Afghanistan     1982    65286.0
Afghanistan     1987    65286.0
Afghanistan     1992    65286.0
Afghanistan     1997    65286.0
Afghanistan     2002    65286.0
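
The question also asked for blank cells to be replaced with *, which the prototype above does not cover. The following is only a minimal sketch of one way to combine both requests. The helper name select_columns, its parameters, and the in-memory sample data are assumptions made for illustration, not part of the csv module or the original answer.

```python
import csv
import io

def select_columns(src, dst, indices=(0, 4, 5), blank="*"):
    """Copy the chosen columns from src to dst, skipping short or empty rows.

    `select_columns`, `indices`, and `blank` are illustrative names,
    not part of the csv module.
    """
    reader = csv.reader(src)
    writer = csv.writer(dst, delimiter="\t", lineterminator="\n")
    needed = max(indices) + 1
    for row in reader:
        # csv.reader yields [] for a blank line, so this also drops empty rows.
        if len(row) < needed:
            continue
        # Replace cells that are empty (or whitespace only) with the placeholder.
        writer.writerow([row[i] if row[i].strip() else blank for i in indices])

# Hypothetical sample: one blank line and one row with an empty Value cell.
sample = io.StringIO(
    '"Area","Area Id","Variable Name","Variable Id","Year","Value"\n'
    '\n'
    '"Afghanistan",2,"Total area of the country",4100,1977,65286.0\n'
    '"Afghanistan",2,"Total area of the country",4100,1982,\n'
)
out = io.StringIO()
select_columns(sample, out)
result = out.getvalue()
print(result)
```

With real files, the two StringIO objects would be replaced by files opened with newline='' so the csv module controls line endings.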

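Answer 1: (score: 0)

This line will not raise IndexError; it writes the row and leaves out any values that don't exist:

csv_writer.writerow((row[i] for i in (0,2,5) if i<len(row)))

This line will not raise IndexError either; it writes the row with an asterisk in place of each missing value:

csv_writer.writerow((row[i] if i<len(row) else "*" for i in (0,2,5)))

This line also won't raise IndexError, but it doesn't write the row if the row is too short:

if len(row)>5: csv_writer.writerow((row[i] for i in (0,2,5)))

This line also won't raise IndexError, but it writes no rows at all:

pass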
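
To make the difference between the first two variants concrete, here is a small sketch that applies both policies to a row that is too short. The helper name pick and the sample row are hypothetical and exist only for this illustration.

```python
def pick(row, indices, fill=None):
    """Return the cells of `row` at `indices`.

    Missing cells are dropped when fill is None, otherwise replaced by `fill`.
    `pick` is a hypothetical helper, not a csv-module function.
    """
    if fill is None:
        return [row[i] for i in indices if i < len(row)]
    return [row[i] if i < len(row) else fill for i in indices]

# A row with only three cells, so index 5 does not exist.
short_row = ["Afghanistan", "2", "Total area of the country"]
dropped = pick(short_row, (0, 2, 5))
padded = pick(short_row, (0, 2, 5), fill="*")
print(dropped)  # ['Afghanistan', 'Total area of the country']
print(padded)   # ['Afghanistan', 'Total area of the country', '*']
```

Dropping missing cells shifts the remaining values to the left, so the padded form is usually the safer choice when the output columns must stay aligned.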

Answer 2: (score: 0)

You can use DictReader and DictWriter to select, modify, and write specific columns by their header/column names.

I'll use io.StringIO to simulate the files.

import csv
import io

s = '''"Area","Area Id","Variable Name","Variable Id","Year","Value","Symbol","Md"
"Afghanistan",2,"Total area of the country",4100,1977,65286.0,"E","",""
"Afghanistan",2,"Total area of the country",4100,1982,65286.0,"E","",""
"Afghanistan",2,"Total area of the country",4100,1987,65286.0,"E","",""
"Afghanistan",2,"Total area of the country",4100,1992,65286.0,"E","",""
"Afghanistan",2,"Total area of the country",4100,1997,65286.0,"E","",""
"Afghanistan",2,"Total area of the country",4100,2002,65286.0,"E","",""'''

f = io.StringIO(s)
g = io.StringIO()

reader = csv.DictReader(f)
writer = csv.DictWriter(g, fieldnames=["Area","Variable Id","Value"], extrasaction='ignore')

for row in reader:
    # process row values here, e.g. divide Value by 1000
    row['Value'] = float(row['Value']) / 1000
    writer.writerow(row)

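Note that because the original rows contain extra keys/fields, the DictWriter extrasaction parameter needs to be set to 'ignore'.

If the csv file has no header row, you will have to specify the field names for the DictReader yourself.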


>>> g.seek(0)
0
>>> print(g.read())
Afghanistan,4100,65.286
Afghanistan,4100,65.286
Afghanistan,4100,65.286
Afghanistan,4100,65.286
Afghanistan,4100,65.286
Afghanistan,4100,65.286
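
For the header-less case mentioned above, the column names can be passed to DictReader through its fieldnames argument. This is a minimal sketch under assumptions: the two sample rows and the columns list are made up for illustration, and writeheader() is called only to show how a header row can be added to the output.

```python
import csv
import io

# Hypothetical header-less input; the column names below are supplied by hand.
raw = io.StringIO(
    '"Afghanistan",2,"Total area of the country",4100,1977,65286.0\n'
    '"Afghanistan",2,"Total area of the country",4100,1982,65286.0\n'
)
columns = ["Area", "Area Id", "Variable Name", "Variable Id", "Year", "Value"]
out = io.StringIO()

# With fieldnames given, DictReader treats the first line as data, not as a header.
reader = csv.DictReader(raw, fieldnames=columns)
writer = csv.DictWriter(out, fieldnames=["Area", "Year", "Value"],
                        extrasaction="ignore", lineterminator="\n")
writer.writeheader()  # writes "Area,Year,Value" as the first output line
for row in reader:
    writer.writerow(row)

result = out.getvalue()
print(result)
```

Selecting by name rather than by index keeps the code readable and keeps working if the input columns are reordered, as long as the names stay the same.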