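I promise I searched and read through several pages of Google results before posting this. I swear I have done my due diligence.
I'm trying to use Python to open a CSV file, read it, make changes to it, and then write out a new file.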
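This is as far as I've got:
import csv

def water_data():
    with open('aquastat.csv', 'r') as csv_file:
        csv_reader = csv.reader(csv_file)
        final_file_name = "final_water.data.csv"
        final_file = open(final_file_name,'w')
        csv_writer = csv.writer(final_file,delimiter="\t")
        for row in csv_reader:
            csv_writer.writerow(row)
But I'm struggling to get any further. I want to delete certain columns, but I can't get my head around how Python tells rows and columns apart. For example, the columns are Area, Area ID, Year, Value, and so on. I only want Area, Year, Value. I tried
for row in final_file:
    final_file.writerow(row[0] + row[2] + row[4] + row[5])
but I keep getting the following error: IndexError: list index out of range
[I'd also like to replace blank cells with *, but the columns are the priority.]
Please note that I can't use pandas.
If possible, I'd be grateful if someone could not only show me the code but also explain it.
TLDR: How do I remove empty rows from a CSV file and write only certain columns to a new file?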
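To illustrate the row/column question: csv.reader yields one list of strings per line of the file, so a "row" is one list and a "column" is simply the same index taken from every list. The sketch below uses a small in-memory sample (not the real aquastat.csv) and also shows that a blank line comes back as an empty list, which is one common way to end up with "IndexError: list index out of range".

```python
import csv
import io

# A tiny in-memory CSV standing in for aquastat.csv (illustrative data only).
sample = io.StringIO(
    '"Area","Area Id","Year","Value"\n'
    '"Afghanistan",2,1977,65286.0\n'
    '\n'
    '"Afghanistan",2,1982,65286.0\n'
)

rows = list(csv.reader(sample))

# Each row is a list of strings; the blank line arrives as an empty list.
for row in rows:
    print(len(row), row)

# A "column" is the same index taken from every non-empty row.
area_column = [row[0] for row in rows if row]
print(area_column)

# Indexing into the empty row is what raises the IndexError.
try:
    rows[2][0]
except IndexError as exc:
    print("blank row:", exc)
```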
Answer 0 (score: 1)
I've tried to give you an answer that stays as close as possible to what you have so far.
Prototype:
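import csv

# newline='' is what the csv module docs recommend for files passed to reader/writer
with open('aquastat.csv', 'r', newline='') as csv_file:
    csv_reader = csv.reader(csv_file)
    final_file_name = "final_water.data.csv"
    final_file = open(final_file_name, 'w', newline='')
    csv_writer = csv.writer(final_file, delimiter="\t")
    for row in csv_reader:
        # Skip rows (including blank lines) too short to contain index 5
        if len(row) >= 6:
            # Keep only Area (0), Year (4) and Value (5)
            row = [row[0], row[4], row[5]]
            csv_writer.writerow(row)
final_file.close()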
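Explanation:
The line csv_writer.writerow(row) writes the row to the output CSV file. Before that line, I added row = [row[0], row[4], row[5]], which overwrites the list row with a new list containing only 3 cells, taken from the Area, Year and Value columns respectively. I also added if len(row) >= 6: to check that the row has enough elements to extract every column up to Value (index 5). Rows that are too short, including blank lines (which csv.reader returns as empty lists), are skipped instead of causing an IndexError.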
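The same idea can be generalized so the length check is derived from the indices you ask for, rather than hard-coded as 6. This is a minimal sketch; select_columns is a hypothetical helper name (not from the answer), and it runs on an in-memory sample so it can be tried without the real file.

```python
import csv
import io


def select_columns(src, dst, indices=(0, 4, 5), delimiter="\t"):
    """Copy only the given column indices from src to dst.

    Rows too short to contain every requested index (including blank
    lines, which csv.reader yields as []) are skipped.
    """
    reader = csv.reader(src)
    writer = csv.writer(dst, delimiter=delimiter)
    needed = max(indices) + 1  # a row must have at least this many cells
    for row in reader:
        if len(row) >= needed:
            writer.writerow([row[i] for i in indices])


src = io.StringIO(
    '"Area","Area Id","Variable Name","Variable Id","Year","Value","Symbol","Md"\n'
    '"Afghanistan",2,"Total area of the country",4100,1977,65286.0,"E","",""\n'
    '\n'
)
dst = io.StringIO()
select_columns(src, dst)
print(dst.getvalue())
```

With real files you would pass open file objects instead, for example with open('aquastat.csv', newline='') as src, open('final_water.data.csv', 'w', newline='') as dst: select_columns(src, dst).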
Input:
"Area","Area Id","Variable Name","Variable Id","Year","Value","Symbol","Md"
"Afghanistan",2,"Total area of the country",4100,1977,65286.0,"E","",""
"Afghanistan",2,"Total area of the country",4100,1982,65286.0,"E","",""
"Afghanistan",2,"Total area of the country",4100,1987,65286.0,"E","",""
"Afghanistan",2,"Total area of the country",4100,1992,65286.0,"E","",""
"Afghanistan",2,"Total area of the country",4100,1997,65286.0,"E","",""
"Afghanistan",2,"Total area of the country",4100,2002,65286.0,"E","",""
Output:
Area Year Value
Afghanistan 1977 65286.0
Afghanistan 1982 65286.0
Afghanistan 1987 65286.0
Afghanistan 1992 65286.0
Afghanistan 1997 65286.0
Afghanistan 2002 65286.0
Answer 1 (score: 0)
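This line won't raise an IndexError, and it writes the row while leaving out the values that don't exist:
csv_writer.writerow(row[i] for i in (0, 2, 5) if i < len(row))
This line won't raise an IndexError either; instead it writes the row with an asterisk in place of each missing value:
csv_writer.writerow(row[i] if i < len(row) else "*" for i in (0, 2, 5))
This line also won't raise an IndexError, but it skips the row entirely if the row is too short:
if len(row) > 5: csv_writer.writerow(row[i] for i in (0, 2, 5))
This line won't raise an IndexError either, but it won't write any rows at all:
pass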
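To see how the three variants above differ, here they are run on one deliberately short row. This is an illustrative sketch: short_row and the in-memory output are made up for the demo. The last writerow is an extension not in the answer; it also turns blank cells (empty strings) into "*", which the question mentions but the answer's variants do not handle, since they only replace cells that are missing altogether.

```python
import csv
import io

short_row = ["Afghanistan", "2", ""]  # only indices 0..2 exist; index 2 is blank
wanted = (0, 2, 5)

out = io.StringIO()
csv_writer = csv.writer(out)

# Variant 1: leave out indices that don't exist in this row.
csv_writer.writerow([short_row[i] for i in wanted if i < len(short_row)])

# Variant 2: write "*" for indices that don't exist.
csv_writer.writerow([short_row[i] if i < len(short_row) else "*" for i in wanted])

# Variant 3: only write rows long enough to contain every wanted index.
if len(short_row) > max(wanted):
    csv_writer.writerow([short_row[i] for i in wanted])

# Extension: treat missing AND blank cells the same way, writing "*".
csv_writer.writerow(
    [(short_row[i] if i < len(short_row) else "") or "*" for i in wanted]
)

print(out.getvalue())
```

Variant 3 writes nothing here because the row has only 3 cells.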
Answer 2 (score: 0)
You can use DictReader and DictWriter to selectively modify and write specific columns by their header/column names.
I'll use io.StringIO to simulate the files.
import csv
import io

s = '''"Area","Area Id","Variable Name","Variable Id","Year","Value","Symbol","Md"
"Afghanistan",2,"Total area of the country",4100,1977,65286.0,"E","",""
"Afghanistan",2,"Total area of the country",4100,1982,65286.0,"E","",""
"Afghanistan",2,"Total area of the country",4100,1987,65286.0,"E","",""
"Afghanistan",2,"Total area of the country",4100,1992,65286.0,"E","",""
"Afghanistan",2,"Total area of the country",4100,1997,65286.0,"E","",""
"Afghanistan",2,"Total area of the country",4100,2002,65286.0,"E","",""'''
f = io.StringIO(s)
g = io.StringIO()
reader = csv.DictReader(f)
writer = csv.DictWriter(g, fieldnames=["Area","Variable Id","Value"], extrasaction='ignore')
for row in reader:
    # process row values here if needed, e.g. convert Value to thousands
    row['Value'] = float(row['Value']) / 1000
    writer.writerow(row)
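Note that each row dict contains extra keys/fields that are not in the writer's fieldnames (Area Id, Year, Symbol and so on), so DictWriter's extrasaction parameter needs to be set to 'ignore'; with the default 'raise', writerow raises a ValueError.
If the CSV file has no header row, you will have to specify the field names for DictReader.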
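For the headerless case, a minimal sketch: the list of names below is copied from the sample header and assumed to match the file's column order. Passing fieldnames to DictReader makes it treat the first line as data. The sketch also calls writeheader(), since DictWriter never writes a header row on its own (which is why the output above has none).

```python
import csv
import io

# Headerless input: the very first line is already data.
headerless = io.StringIO(
    '"Afghanistan",2,"Total area of the country",4100,1977,65286.0,"E","",""\n'
)
names = ["Area", "Area Id", "Variable Name", "Variable Id",
         "Year", "Value", "Symbol", "Md"]

# Without a header row, DictReader must be told the field names.
reader = csv.DictReader(headerless, fieldnames=names)

out = io.StringIO()
writer = csv.DictWriter(out, fieldnames=["Area", "Year", "Value"],
                        extrasaction="ignore")
writer.writeheader()  # optional: DictWriter doesn't write a header by itself
for row in reader:
    # The surplus 9th field lands under the key None; 'ignore' drops it.
    writer.writerow(row)

print(out.getvalue())
```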
>>> g.seek(0)
0
>>> print(g.read())
Afghanistan,4100,65.286
Afghanistan,4100,65.286
Afghanistan,4100,65.286
Afghanistan,4100,65.286
Afghanistan,4100,65.286
Afghanistan,4100,65.286