I have already looked through a number of related posts about the numpy module and similar tools. I need to use the csv module, which should be well suited to this. There are plenty of posts about using the csv module, but I couldn't find the answer I'm looking for. Many thanks in advance.
Basically, I have the following function/pseudocode:
import csv

def copy(inname, outname):
    infile = open(inname, "r")
    outfile = open(outname, "w")
    copying = False  # not copying yet
    # if the first string up to the first whitespace in the "name" column of a row
    # equals the first string up to the first whitespace in the "name" column of
    # the row directly below it AND the value in the "ID" column of the first row
    # does NOT equal the value in the "ID" column of the second row, copy these two
    # rows in full to a new table.
For example, if inname looks like this:
ID,NAME,YEAR, SPORTS_ALMANAC,NOTES
(first thousand rows)
1001,New York Mets,1900,ESPN
1002,New York Yankees,1920,Guiness
1003,Boston Red Sox,1918,ESPN
1004,Washington Nationals,2010
(the very last row)
1231231231235,Detroit Tigers,1990,ESPN
Then I would like my output to look like this:
ID,NAME,YEAR,SPORTS_ALMANAC,NOTES
1001,New York Mets,1900,ESPN
1002,New York Yankees,1920,Guiness
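This is because the string "New" is the same in both rows up to the first space in the "NAME" column, and the IDs differ. To be clear, I need the code to be as general as possible: a regex on "New" is not what I need, because the shared first string could be anything. What comes after the first whitespace does not matter (i.e. "Washington Nationals" and "Washington DC" should still give me a hit, just like the New York example above...).
I'm confused because in R there is a way to write inname$name, which makes it easy to search for values in specific rows. I first tried writing my script in R, but it got confusing, so I'd like to stick with Python.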
Answer 0 (score: 2)
Does this do what you want (Python 3)?
import csv

def first_word(value):
    # Everything up to (not including) the first space.
    return value.split(" ", 1)[0]

# inname and outname are the input and output paths from the question.
with open(inname, "r") as infile:
    with open(outname, "w", newline="") as outfile:
        in_csv = csv.reader(infile)
        out_csv = csv.writer(outfile)

        # Copy the header row and look up the columns by name.
        column_names = next(in_csv)
        out_csv.writerow(column_names)

        id_index = column_names.index("ID")
        name_index = column_names.index("NAME")

        try:
            row_1 = next(in_csv)
            # True if row_1 has already been written as part of a match.
            written_row = False

            for row_2 in in_csv:
                if first_word(row_1[name_index]) == first_word(row_2[name_index]) and row_1[id_index] != row_2[id_index]:
                    if not written_row:
                        out_csv.writerow(row_1)

                    out_csv.writerow(row_2)
                    written_row = True
                else:
                    written_row = False

                row_1 = row_2
        except StopIteration:
            # No data rows!
            pass
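To check the logic against the sample from the question, here is a minimal sketch that wraps the same loop in a function taking open file objects, so it can run on in-memory text. The name `copy_matching_rows` and the use of `io.StringIO` are my own choices for illustration, and the sample is just the question's visible rows:

```python
import csv
import io


def first_word(value):
    # Everything up to (not including) the first space.
    return value.split(" ", 1)[0]


def copy_matching_rows(infile, outfile):
    # Hypothetical wrapper: same comparison logic, but on open file objects.
    in_csv = csv.reader(infile)
    out_csv = csv.writer(outfile, lineterminator="\n")

    column_names = next(in_csv, None)
    if column_names is None:
        return  # completely empty input
    out_csv.writerow(column_names)
    id_index = column_names.index("ID")
    name_index = column_names.index("NAME")

    row_1 = next(in_csv, None)
    if row_1 is None:
        return  # header only, no data rows
    written_row = False
    for row_2 in in_csv:
        if (first_word(row_1[name_index]) == first_word(row_2[name_index])
                and row_1[id_index] != row_2[id_index]):
            if not written_row:
                out_csv.writerow(row_1)
            out_csv.writerow(row_2)
            written_row = True
        else:
            written_row = False
        row_1 = row_2


sample = """ID,NAME,YEAR,SPORTS_ALMANAC,NOTES
1001,New York Mets,1900,ESPN
1002,New York Yankees,1920,Guiness
1003,Boston Red Sox,1918,ESPN
1004,Washington Nationals,2010
"""

result = io.StringIO()
copy_matching_rows(io.StringIO(sample), result)
print(result.getvalue())
```

With this sample, only the header and the two "New York" rows end up in `result`.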
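For Python 2, open the output file and create the reader and writer like this instead (in the Python 3 listing at the top of this answer):
    with open(outname, "w") as outfile:
        in_csv = csv.reader(infile)
        out_csv = csv.writer(outfile, lineterminator="\n")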
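On the R-style `inname$name` access mentioned in the question: the closest thing in the standard csv module is probably `csv.DictReader`, which yields each row as a dict keyed by the header names. Below is a rough sketch of the same adjacent-row comparison using it. `matching_pairs` is a hypothetical helper name, and the three sample rows are made up for illustration from the team names in the question:

```python
import csv
import io


def matching_pairs(infile):
    # Hypothetical helper: yield adjacent rows (as dicts) whose NAME starts
    # with the same first word and whose ID differs.
    reader = csv.DictReader(infile)
    previous = next(reader, None)
    for current in reader:
        same_first_word = (previous["NAME"].split(" ", 1)[0]
                           == current["NAME"].split(" ", 1)[0])
        if same_first_word and previous["ID"] != current["ID"]:
            yield previous, current
        previous = current


# Made-up sample data, not taken from a real file.
sample = io.StringIO(
    "ID,NAME,YEAR\n"
    "1,Washington Nationals,2010\n"
    "2,Washington DC,1990\n"
    "3,Boston Red Sox,1918\n"
)
pairs = [(a["ID"], b["ID"]) for a, b in matching_pairs(sample)]
print(pairs)
```

Note that this version yields pairs of rows rather than writing de-duplicated rows, so a run of three matching rows would produce two overlapping pairs.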