I have a simple script that either removes the last n columns from a CSV file or keeps only the first n columns of a CSV file:
from sys import argv
import csv

if len(argv) == 4:
    script, inputFile, outputFile, n = argv
    # Splitting on commas turns n into a list of ints. The row[:n] slice
    # below only accepts a single integer, which is what the question is about.
    n = [int(i) for i in n.split(",")]
else:
    script, inputFile, outputFile = argv
    n = 1

with open(inputFile,"r") as fin:
    with open(outputFile,"w") as fout:
        writer=csv.writer(fout)
        for row in csv.reader(fin):
            writer.writerow(row[:n])
Example usage (removes the last two columns): removeKeepColumns.py sample.txt out.txt -2
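The -2 in this usage relies on how Python list slicing treats the stop value when it is a single integer. A minimal sketch, using a made-up row (the cell values are assumptions for illustration only):

```python
# Hypothetical row, standing in for one line parsed by csv.reader.
row = ["a", "b", "c", "d", "e"]

# A negative stop drops that many cells from the end of the row.
without_last_two = row[:-2]

# A positive stop keeps that many cells from the start of the row.
first_two = row[:2]

print(without_last_two)
print(first_two)
```

A slice can only express a contiguous run of columns, which is why an arbitrary set of indices needs a different approach.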
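How can I extend it to handle keeping or removing a specific set of columns?
I can split the comma-separated input argument into an array, but I don't know how to pass it to writerow(row[]).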
Link to the script I used to create the example:
Answer 0 (score: 4)
There is already an accepted answer, but here is my solution:
>>> import pyexcel as pe
>>> sheet = pe.get_sheet(file_name="your_file.csv")
>>> sheet.column.select([1,4,5]) # the column indices to keep
>>> sheet.save_as("your_filtered_file.csv")
>>> exit()
More details are covered under filtering in the pyexcel documentation.
Answer 1 (score: 1)
Elaborating on my comment (Picking out items from a python list which have specific indexes):
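from sys import argv
import csv

if len(argv) == 4:
    script, inputFile, outputFile, cols_str = argv
    # Turn "0,2,5" into [0, 2, 5]
    cols = [int(i) for i in cols_str.split(",")]
else:
    # Without the column list, cols would be undefined further down
    raise SystemExit("usage: %s inputFile outputFile cols" % argv[0])

with open(inputFile,"r") as fin:
    with open(outputFile,"w") as fout:
        writer=csv.writer(fout)
        for row in csv.reader(fin):
            # Pick the cells at the requested positions, in the order given
            sublist = [row[x] for x in cols]
            writer.writerow(sublist)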
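This should (untested) keep all columns given as a comma-separated list in the third argument. To remove the given columns instead,
sublist = [cell for i, cell in enumerate(row) if i not in cols]
should do it.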
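Both variants can be folded into one helper. The following is only a sketch under some assumptions: the names select_columns and filter_csv are hypothetical (they appear neither in the question nor in any library), indices are taken to be zero-based, and negative indices are interpreted as counting from the end of the row.

```python
import csv
import io


def select_columns(row, cols, keep=True):
    """Hypothetical helper: return the cells of `row` at the positions in
    `cols` (keep=True), or every cell except those positions (keep=False)."""
    # Normalise negative indices so that -1 means the last column.
    wanted = [c if c >= 0 else len(row) + c for c in cols]
    if keep:
        return [row[c] for c in wanted]
    dropped = set(wanted)
    return [cell for i, cell in enumerate(row) if i not in dropped]


def filter_csv(fin, fout, cols, keep=True):
    """Hypothetical helper: copy CSV rows from `fin` to `fout`,
    keeping or dropping the columns listed in `cols`."""
    writer = csv.writer(fout)
    for row in csv.reader(fin):
        writer.writerow(select_columns(row, cols, keep))


# Small in-memory demonstration; real files would be opened with open().
source = io.StringIO("a,b,c,d\n1,2,3,4\n")
kept = io.StringIO()
filter_csv(source, kept, [0, 2], keep=True)
print(kept.getvalue())
```

In keep mode the output follows the order given in cols, so the same call can also reorder columns. In drop mode the original column order is preserved.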