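The code (reproduced below) reads a file, performs an operation, and writes a subset of the original file's rows to a new file. How can I tweak it slightly so that it instead writes everything from the input file to the output file, but adds a "flag" column with the value "1" for the rows that are currently being output (the subset of rows we are most interested in)? The other rows (those that currently appear only in the input file) would show blank or "0" in the new "flag" column.
This problem comes up often for me, and having a general way to handle it would save me a lot of time.
Any help is much appreciated!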
import csv

inname = "aliases.csv"
outname = "output.csv"

def first_word(value):
    return value.split(" ", 1)[0]

with open(inname, "r", encoding = "utf-8") as infile:
    with open(outname, "w", encoding = "utf-8") as outfile:
        in_csv = csv.reader(infile)
        out_csv = csv.writer(outfile)

        column_names = next(in_csv)
        out_csv.writerow(column_names)

        id_index = column_names.index("id")
        name_index = column_names.index("name")

        try:
            row_1 = next(in_csv)
            written_row = False

            for row_2 in in_csv:
                if first_word(row_1[name_index]) == first_word(row_2[name_index]) and row_1[id_index] != row_2[id_index]:
                    if not written_row:
                        out_csv.writerow(row_1)

                    out_csv.writerow(row_2)
                    written_row = True
                else:
                    written_row = False

                row_1 = row_2
        except StopIteration:
            # No data rows!
            pass
Answer 0 (score: 0)
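I always use DictReader when working with CSV files, mainly because it is a bit more explicit (which makes things easier for me :)). Below is a highly stylized version of what you could do. The changes I made include:
- Using csv.DictReader() and csv.DictWriter() instead of csv.reader and csv.writer. These represent each row as a dictionary rather than a list, so a row looks like {'column_name': 'value', 'column_name_2': 'value2'}. Every row therefore carries its column header data and can be treated like any other dictionary.
- Using sample column names, name and number, to show how the reading and writing work. When writing, a simple check decides whether a row gets the flag: each row is compared with the row after it, and the flag is set when the first word of name matches and the number values differ.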
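As a small illustration of the dictionary-style rows described above, here is a minimal sketch. The inline sample data (read through io.StringIO) is made up and simply stands in for aliases.csv.

```python
import csv
import io

# Hypothetical two-row sample standing in for aliases.csv
sample = io.StringIO("name,number\nAlice Smith,1\nBob Jones,2\n")

# With no fieldnames argument, DictReader takes its keys from the header row
rows = list(csv.DictReader(sample))

first = rows[0]
print(first["name"], first["number"])

# A row is an ordinary mapping, so a new key can simply be assigned to it
first["flag"] = "1"
print(first["flag"])
```

Because each row is keyed by column name, adding a new column is just a matter of assigning one more key before the row is written.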
With that in mind, here is an example:
import csv

input_csv = 'aliases.csv'
output_csv = 'output.csv'

def first_word(value):
    return value.split(' ', 1)[0]

# newline='' lets the csv module handle line endings itself
with open(input_csv, 'r', newline='') as infile:
    # Specify the fieldnames in your aliases CSV
    input_fields = ('name', 'number')

    # Set up the DictReader, which will read the file into an iterable
    # where each row is a {column_name: value} dictionary
    reader = csv.DictReader(infile, fieldnames=input_fields)

    # Now open the output file
    with open(output_csv, 'w', newline='') as outfile:
        # Define the new 'flag' field
        output_fields = ('name', 'number', 'flag')
        writer = csv.DictWriter(outfile, fieldnames=output_fields)

        # Write the column names
        writer.writeheader()

        # Skip the first row (the column headers, which DictReader treats as
        # data because fieldnames was given) and then store the first data
        # row; None means the file has no data rows
        next(reader, None)
        first_row = next(reader, None)

        # Now begin your iteration through the input, writing all fields as they
        # appear, but using some logic to write the 'flag' field
        # This is where the dictionary comes into play - 'row' is actually a
        # dictionary, so you can use dictionary syntax to assign to it
        for next_row in reader:
            # Set up the variables for your comparison
            first_name = first_word(first_row['name'])
            next_name = first_word(next_row['name'])
            first_id = first_row['number']
            next_id = next_row['number']

            # Compare the current row to the previous row
            if first_name == next_name and first_id != next_id:
                # Add a 'flag' element to both rows of the matching pair;
                # rows without a 'flag' key get an empty flag column
                first_row['flag'] = 'Y'
                next_row['flag'] = 'Y'

            # Now we write the entire first_row dictionary to the row
            writer.writerow(first_row)

            # Change the reference, just like you did
            first_row = next_row

        # The loop never writes the final row, so write it here
        if first_row is not None:
            writer.writerow(first_row)
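
To get closer to what the question asks for (every input row written out, with "1" in the flag column for the rows the original script would have kept and "0" otherwise), one possible sketch follows. It assumes the question's id and name columns and reads the whole file into memory. The helper name flag_adjacent_aliases and the sample data are hypothetical and do not come from the original posts.

```python
import csv
import io


def first_word(value):
    return value.split(" ", 1)[0]


def flag_adjacent_aliases(infile, outfile):
    """Copy every row from infile to outfile, adding a 'flag' column."""
    reader = csv.DictReader(infile)  # header row supplies the field names
    rows = list(reader)              # assumption: the file fits in memory

    # Mark both rows of each adjacent pair that shares a first word in
    # 'name' but has different 'id' values
    flagged = set()
    for i in range(len(rows) - 1):
        a, b = rows[i], rows[i + 1]
        if first_word(a["name"]) == first_word(b["name"]) and a["id"] != b["id"]:
            flagged.update((i, i + 1))

    # Keep the input columns in order and append the new one
    fieldnames = list(reader.fieldnames or []) + ["flag"]
    writer = csv.DictWriter(outfile, fieldnames=fieldnames)
    writer.writeheader()
    for i, row in enumerate(rows):
        row["flag"] = "1" if i in flagged else "0"
        writer.writerow(row)


# Hypothetical sample data standing in for aliases.csv
source = io.StringIO("id,name\n1,Acme Corp\n2,Acme Inc\n3,Bolt Ltd\n")
result = io.StringIO()
flag_adjacent_aliases(source, result)
print(result.getvalue())
```

With real files, open both with newline='' as the csv module documentation recommends, and pass the file objects in place of the StringIO buffers. Collecting the flagged row indexes first and writing afterwards keeps the selection logic separate from the output step, so the same pattern should carry over to other "keep everything, mark the subset" tasks.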