Suppose we have the following csv file:
file1.csv
#groups id owner
abc id1 owner1
abc id2 owner1
bcx id1 owner2
cpa id3 owner1
The following script reads file1.csv, filters on the first column (#groups), and adds extra characters to the matching values:
#!/bin/env python2
import re
import csv

print "enter Path to original file"
GROUPS = raw_input()
print "enter Path to modified file"
WORKING = raw_input()

def filter_lines(f):
    """this generator function uses a regular expression
    to include only lines that have a `abc` at the start
    and NO `gep` throughout the record
    """
    filter_regex = r'^abc(?!.*gep)'
    for line in f:
        line = line.strip()
        m = re.match(filter_regex, line)
        if m:
            yield line

# insert gep in any abc records that don't have gep
pat = re.compile(r'^(abc)(?!.*gep.*)')
variable1 = 0
with open(GROUPS, 'r') as f:
    with open(WORKING, 'w') as data:
        # next(f)  # Skip over header in input file.
        # filter
        filter_generator = filter_lines(f)
        csv_reader = csv.reader(filter_generator)
        count = 0
        writer = csv.writer(data)  # , quoting=csv.QUOTE_ALL
        for row in csv_reader:
            count += 1
            # modify all filtered records to include gep
            variable1 = pat.sub('\\1_gep', row[0])
            fields = [variable1]
            writer.writerow(fields)
print 'Filtered (abc at Start and NO gep) Rows Count = ' + str(count)
For example, abc becomes abc_gep, and we write it to another csv file, file2.csv.
So file2.csv now contains only:
abc_gep
abc_gep
Good.
Now I want to add the remaining columns from file1.csv that belong to the abc matches. How can I do that?
I tried the following:
fields = [variable1,row[1],row[2]]
But this hard-codes the columns instead of being dynamic. I am looking for something more like this:
fields = [variable1, row[i]]
Essentially, this is the result I am looking for in file2.csv:
abc_gep id1 owner1
abc_gep id2 owner1
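One possible way to get there, as a sketch rather than a definitive answer: a list slice such as row[1:] picks up every column after the first, however many there are, so nothing needs to be hard-coded. The example below is written for Python 3 (the script above is Python 2) and makes a few assumptions that are not in the original: the file is space-delimited as the sample suggests, the "no gep" check applies to the first column only, and the helper name rewrite_rows is hypothetical.

```python
import csv
import io
import re

# Same idea as `pat` in the script: a leading "abc" not followed by "gep".
PAT = re.compile(r'^(abc)(?!.*gep)')

def rewrite_rows(rows):
    """Yield only rows whose first field matches PAT, with `_gep` inserted.

    `rewrite_rows` is a hypothetical helper name. `row[1:]` carries over
    all remaining columns without naming them one by one.
    """
    for row in rows:
        if row and PAT.match(row[0]):
            yield [PAT.sub(r'\1_gep', row[0])] + row[1:]

# In-memory stand-in for file1.csv (assumed to be space-delimited).
sample = """#groups id owner
abc id1 owner1
abc id2 owner1
bcx id1 owner2
cpa id3 owner1
"""

reader = csv.reader(io.StringIO(sample), delimiter=' ')
out = io.StringIO()
writer = csv.writer(out, delimiter=' ', lineterminator='\n')
for fields in rewrite_rows(reader):
    writer.writerow(fields)

print(out.getvalue())
```

With real files, the two io.StringIO objects would be replaced by the opened input and output files. If the file is actually comma-delimited, the delimiter arguments can simply be dropped.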