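I have two CSV files that I generate with Python. Their records are shown below (a.csv and b.csv). b.csv has 2 columns, and the values in the second column can repeat. I want to get a result like final.csv. How can I do this?
I tried the code below, but it is not right: I am not doing the comparison correctly. Any help would be great.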
a.csv
"all","1","1Gi","4","8Gi"
"als","0","0","100m","128Mi"
"awx","6","9Gi","20","32Gi"
"cho-1","9","9728Mi","15","20Gi"
"cho-2","12250m","15395Mi","20","24Gi"
b.csv
"all","ABC"
"als","ABC"
"awx","DPL"
"cho-1","ABC"
"cho-2","ABC"
"cho-3","ABC"
I want to create a file like the following:
final.csv
"all","1","1Gi","4","8Gi","ABC"
"als","0","0","100m","128Mi","ABC"
"awx","6","9Gi","20","32Gi","DPL"
"cho-1","9","9728Mi","15","20Gi","ABC"
"cho-2","12250m","15395Mi","20","24Gi","ABC"
My code:
csv1 = csv.reader(open("reports/a.csv", "r"))
csv2 = csv.reader(open("reports/b.csv", "r"))
s=[]
while True:
    try:
        line1 = csv1.next()
        line2 = csv2.next()
        if (line1[0] == line2[0]):
            s.append([line1[1], line2[0], line2[1], line2[2], line2[3], line2[4]])
        else:
            s.append(["NA", line2[0], line2[1], line2[2], line2[3], line2[4]])
    except StopIteration:
        break
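Answer 0 (score: 1)
I used pandas for this.
import csv
import pandas as pd

# The files have no header row, so the column names below are placeholders chosen for the merge.
df0 = pd.read_csv("a.csv", header=None, dtype=str, names=["Name", "c1", "c2", "c3", "c4"])
df1 = pd.read_csv("b.csv", header=None, dtype=str, names=["Name", "Group"])
# how="left" keeps every row of a.csv and drops names that appear only in b.csv (e.g. "cho-3").
merged = df0.merge(df1, on="Name", how="left")
merged.to_csv("final.csv", index=False, header=False, quoting=csv.QUOTE_ALL)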
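Answer 1 (score: 0)
Judging from your expected output, I think you should use a set. Since the line1 and line2 variables hold the comma-separated values of each row, you can build lists from those values. Like,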
line1 = ["all","1","1Gi","4","8Gi"]
line2 = ["all","ABC"]
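Then you can merge these two lists into a single list and create a set from it. So the set would look like,
set1 = set(line1 + line2)  # list.extend() returns None, so concatenate with + instead
Making a set removes the duplicates. Hope this helps.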
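One caveat worth checking before relying on a set here: a set is unordered and also collapses repeated values inside a single row. The short sketch below uses the "als" row from the question to compare the set with a plain concatenation that skips the repeated key; the variable names as_set and merged are only illustrative.

```python
line1 = ["als", "0", "0", "100m", "128Mi"]
line2 = ["als", "ABC"]

# A set removes the duplicate key, but it also merges the two "0" fields
# and does not keep the column order.
as_set = set(line1 + line2)

# Skipping line2's first field drops only the repeated key and keeps
# every field of line1 in its original position.
merged = line1 + line2[1:]

print(len(as_set))
print(merged)
```

For rows that are meant to be written back to a CSV file, the concatenated list is probably the safer of the two, since column positions matter there.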
Answer 2 (score: 0)
You are not far from the solution; you only need to append the data from line2 to line1 and use that:
...
# Python 3: open the output in text mode with newline="" (Python 2 used "wb")
csvout = csv.writer(open("final.csv", "w", newline=""), quoting=csv.QUOTE_ALL)
while True:
    try:
        line1 = next(csv1)
        line2 = next(csv2)
        if line1[0] != line2[0]:    # check that both rows share the same first field
            raise Exception("Desynch", line1[0], '#', line2[0])
        line1.append(line2[1])      # append field from b.csv
        csvout.writerow(line1)      # and write it to final.csv
    except StopIteration:
        break
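The loop above relies on both files listing the same names in the same order. If that cannot be guaranteed (b.csv in the question has an extra "cho-3" row), one possible variation is to load b.csv into a dict first and look up each name from a.csv. This is a minimal sketch rather than part of the answer above: the function name merge_csv and the "NA" placeholder for names missing from b.csv are assumptions (the placeholder is borrowed from the question's own attempt).

```python
import csv


def merge_csv(a_path, b_path, out_path, missing="NA"):
    """Append the second column of b_path to each row of a_path, matched on column 0."""
    # Build a name -> value lookup from b.csv; blank lines are skipped.
    with open(b_path, newline="") as fb:
        lookup = {row[0]: row[1] for row in csv.reader(fb) if row}

    with open(a_path, newline="") as fa, open(out_path, "w", newline="") as fout:
        writer = csv.writer(fout, quoting=csv.QUOTE_ALL)
        for row in csv.reader(fa):
            if not row:
                continue
            # Names that appear only in b.csv are never visited, so they are dropped.
            writer.writerow(row + [lookup.get(row[0], missing)])
```

Calling merge_csv("a.csv", "b.csv", "final.csv") with the files from the question should produce the five rows of final.csv, since the lookup does not depend on row order or on b.csv having the same number of rows.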