I have two files.
The first, file1, has the columns: date,name,age
The second, file2, has the same columns: date,name,age
Here is an example:
file1.csv:
2015/1/2,Jina,17
2015/1/3,JJ,25
2015/1/4,Carole,8
file2.csv:
2015/1/1,Rouba,14
2015/1/2,GG,78
2015/1/3,James,7
2015/1/4,Elie,15
I need to join these two files on matching dates. For this example, the output should be:
filex.txt:
2015/1/1,Rouba,14
2015/1/2,GG,78,Jina,17
2015/1/3,James,7,JJ,25
2015/1/4,Elie,15,Carole,8
Any help?
Answer 0: (score: 1)
file1.csv:
2015/1/2,Jina,17
2015/1/3,JJ,25
2015/1/4,Carole,8
file2.csv:
2015/1/1,Rouba,14
2015/1/2,GG,78
2015/1/3,James,7
2015/1/4,Elie,15
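Your solution:
import pandas as pd
# The files have no header row, so name all three columns and use the date as the index.
# dtype=str keeps the ages as text, so missing values do not turn them into floats (17 -> 17.0).
df1 = pd.read_csv('file1.csv', names=["Date", "Name", "Age"], index_col=0,
                  header=None, dtype=str)
df2 = pd.read_csv('file2.csv', names=["Date", "Name", "Age"], index_col=0,
                  header=None, dtype=str)
# Align both frames on the date index; the file2 columns come first.
df = pd.concat([df2, df1], axis=1)
df.to_csv('filex.csv', header=False)
filex.csv:
2015/1/1,Rouba,14,,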
2015/1/2,GG,78,Jina,17
2015/1/3,James,7,JJ,25
2015/1/4,Elie,15,Carole,8
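If you want to remove the repeated commas in filex.csv:
import re
with open('filex.csv', 'r') as src:
    # Collapse each run of commas into one, then drop a comma left at the end of a line.
    filex = re.sub(r',+', ',', src.read())
    filex = re.sub(r',$', '', filex, flags=re.MULTILINE)
with open('filex.txt', 'w') as dst:
    dst.write(filex)
filex.txt:
2015/1/1,Rouba,14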
2015/1/2,GG,78,Jina,17
2015/1/3,James,7,JJ,25
2015/1/4,Elie,15,Carole,8
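As an alternative to editing the raw text with a regular expression, the rows can be parsed and their empty fields dropped. The following is only a minimal sketch using the standard library. The function name `drop_empty_fields` is made up for illustration, and it assumes that an empty field never carries meaning in this data.

```python
import csv
import io


def drop_empty_fields(csv_text):
    """Return csv_text with every empty field removed from each row."""
    reader = csv.reader(io.StringIO(csv_text))
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for row in reader:
        # Keep only fields that contain something; this removes doubled
        # commas in the middle of a row and any trailing comma as well.
        writer.writerow([field for field in row if field != ""])
    return out.getvalue()


if __name__ == "__main__":
    sample = "2015/1/1,Rouba,14,,\n2015/1/2,GG,78,Jina,17\n"
    print(drop_empty_fields(sample), end="")
```

Because the text is parsed as CSV rather than matched as characters, a quoted field that happens to contain commas would be left intact.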
Answer 1: (score: 0)
If you are on Linux, here is a short one-liner using awk:
awk -F, 'NR==FNR{ a[$1]=$2 FS $3; next }{ if($1 in a) $0=$0 OFS a[$1] }1' file1 OFS=',' file2
Output:
2015/1/1,Rouba,14
2015/1/2,GG,78,Jina,17
2015/1/3,James,7,JJ,25
2015/1/4,Elie,15,Carole,8
Answer 2: (score: 0)
Try:
awk -F, 'NR==FNR{a[$1]=$2 FS $3;next}{printf("%s%s\n",$0,a[$1]?","a[$1]:"");}' file1 file2 > filex
Edit: adding a non-one-liner form of the solution, with explanation.
awk -F, 'FNR==NR{   ### -F, sets the field separator to a comma. FNR==NR is true only while the first input file (file1 here) is being read.
    a[$1]=$2 FS $3;   ### Store fields 2 and 3 of file1 in array a, keyed by the date in $1; FS is the field separator (a comma here).
    next              ### next is a built-in awk keyword: skip the remaining statements and move on to the next input line.
}
{
    printf("%s%s\n",$0,a[$1]?","a[$1]:"")   ### Print $0 (the current line of file2); if array a holds a non-empty value for the date in $1,
}                                           ### append a comma and that value, otherwise append an empty string.
' file1 file2 > filex   ### The input files: file1 is read first, then file2; the output goes to filex.
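For readers who do not use awk, the two passes explained above can be sketched in Python. This is a rough equivalent, not a translation checked against every awk edge case. The names `join_on_date` and `read_rows` are hypothetical, and the sketch assumes no blank lines and at most one row per date in file1.

```python
import csv


def join_on_date(file1_rows, file2_rows):
    """Append the fields of file1 to each file2 row that has the same date."""
    # First pass (the FNR==NR block in awk): date -> remaining fields of file1.
    lookup = {row[0]: row[1:] for row in file1_rows}
    # Second pass: every file2 row is emitted, extended when its date is known.
    return [row + lookup.get(row[0], []) for row in file2_rows]


def read_rows(path):
    # Helper for real files; newline="" is what the csv module recommends.
    with open(path, newline="") as handle:
        return list(csv.reader(handle))


if __name__ == "__main__":
    file1 = [["2015/1/2", "Jina", "17"], ["2015/1/3", "JJ", "25"],
             ["2015/1/4", "Carole", "8"]]
    file2 = [["2015/1/1", "Rouba", "14"], ["2015/1/2", "GG", "78"],
             ["2015/1/3", "James", "7"], ["2015/1/4", "Elie", "15"]]
    for row in join_on_date(file1, file2):
        print(",".join(row))
```

As with the awk versions, a date that occurs only in file1 does not appear in the result, because the output is driven by the rows of file2.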
Answer 3: (score: 0)
A longer solution that does not use pandas:
import csv

# Read both files into lists of rows.
with open('file1', 'r', newline='') as file1:
    file1_list = list(csv.reader(file1))
with open('file2', 'r', newline='') as file2:
    file2_list = list(csv.reader(file2))

result = []
# Dates present in both files: file2 fields first, then file1 fields,
# which is the field order asked for in the question.
for item_2 in file2_list:
    for item_1 in file1_list:
        if item_1[0] == item_2[0]:
            result.append(item_2 + item_1[1:])

# Rows whose date occurs in only one of the files are kept unchanged.
for item in file2_list + file1_list:
    if not any(item[0] == merged[0] for merged in result):
        result.append(item)

# Matched rows come first, so the result is not sorted by date.
print(result)
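The nested loops above compare every row of one file with every row of the other. A dictionary keyed by date avoids that and makes it easy to sort the result. The following is a sketch only. The function name `full_join` is hypothetical, and it assumes that each date occurs at most once per file and that dates are written as year/month/day.

```python
from datetime import datetime


def full_join(file1_rows, file2_rows):
    """Merge rows by date, keeping dates that occur in only one input."""
    merged = {}
    # file2 goes first so its fields precede those of file1, as in the question.
    for row in file2_rows + file1_rows:
        merged.setdefault(row[0], []).extend(row[1:])
    # Sort by the parsed date; plain string order would put 2015/1/10
    # before 2015/1/2.
    dates = sorted(merged, key=lambda d: datetime.strptime(d, "%Y/%m/%d"))
    return [[d] + merged[d] for d in dates]


if __name__ == "__main__":
    file1 = [["2015/1/2", "Jina", "17"], ["2015/1/3", "JJ", "25"],
             ["2015/1/4", "Carole", "8"]]
    file2 = [["2015/1/1", "Rouba", "14"], ["2015/1/2", "GG", "78"],
             ["2015/1/3", "James", "7"], ["2015/1/4", "Elie", "15"]]
    for row in full_join(file1, file2):
        print(",".join(row))
```

Unlike the awk answers, this keeps a date that appears only in file1, so it behaves like a full outer join.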