I need to merge multiple CSV files located in a folder into a single file.
My raw data looks like this:
y_1980.csv:
country y_1980
0 afg 196
1 ago 125
2 alb 23
3 . .
. . .
y_1981.csv:
country y_1981
0 afg 192
1 ago 120
2 alb 0
3 . .
. . .
y_20xx.csv:
country y_20xx
0 afg 176
1 ago 170
2 alb 76
3 . .
. . .
What I expect to get is something like this:
country y_1980 y_1981 ... y_20xx
0 afg 196 192 ... 176
1 ago 125 120 ... 170
2 alb 23 0 ... 76
3 . . . ... .
. . . . ... .
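So far my code is as follows, but the result I get is that each data frame is appended below the previous one instead of being merged side by side: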
import glob

interesting_files = glob.glob("/Users/Desktop/Data/*.csv")
header_saved = False
with open('/Users/Desktop/Data/table.csv', 'w') as fout:
    for filename in interesting_files:
        with open(filename) as fin:
            header = next(fin)
            # write the header only once, from the first file
            if not header_saved:
                fout.write(header)
                header_saved = True
            # every remaining row is appended below the previous file's rows
            for line in fin:
                fout.write(line)
Answer 0 (score: 1)
pandas makes this easy. With a loop and merge, you can simply do:
Code:
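import pandas as pd

files = ['file1', 'file2']
dfs = None
for filename in files:
    # the sample files are whitespace-delimited
    df = pd.read_csv(filename, sep=r'\s+')
    if dfs is None:
        dfs = df
    else:
        # with no 'on' given, merge joins on the columns both frames share ('country')
        dfs = dfs.merge(df, how='outer')
    print(df)
print(dfs)
dfs.to_csv('file3', sep=' ')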
Results:
country y_1980
0 afg 196
1 ago 125
2 alb 23
country y_1981
0 afg 192
1 ago 120
2 alb 0
country y_1980 y_1981
0 afg 196 192
1 ago 125 120
2 alb 23 0
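The same loop-and-merge idea can be extended to a whole folder. The following is only a sketch: it writes the question's sample rows into a temporary directory (a stand-in for the real data folder, and an assumption about the file layout), collects the files with glob, and folds them together with functools.reduce.

```python
import glob
import os
import tempfile
from functools import reduce

import pandas as pd

# Sample contents based on the question; the temporary directory merely
# stands in for the real data folder.
samples = {
    "y_1980.csv": "country y_1980\nafg 196\nago 125\nalb 23\n",
    "y_1981.csv": "country y_1981\nafg 192\nago 120\nalb 0\n",
}

with tempfile.TemporaryDirectory() as folder:
    for name, text in samples.items():
        with open(os.path.join(folder, name), "w") as fh:
            fh.write(text)

    # sorted() keeps the year columns in order; glob alone guarantees no order.
    paths = sorted(glob.glob(os.path.join(folder, "y_*.csv")))
    frames = [pd.read_csv(path, sep=r"\s+") for path in paths]

# Fold the list into one frame, joining on the shared 'country' column.
merged = reduce(
    lambda left, right: left.merge(right, on="country", how="outer"),
    frames,
)
print(merged)
```

Passing on="country" explicitly avoids surprises if two files ever share another column name by accident.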
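Answer 1 (score: 0)
Your code appears to concatenate all the data into one file, one file after another.
It sounds like you really want to join on the "country" column instead: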
import glob
import pandas as pd

csvs = glob.glob("*.csv")
dfs = []
for csv in csvs:
    dfs.append(pd.read_csv(csv))
merged_df = dfs[0]
for df in dfs[1:]:
    merged_df = pd.merge(merged_df, df, on=['country'])
merged_df.to_csv('out.csv', index=False)
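One thing to keep in mind with this approach: pd.merge defaults to how='inner', so a country missing from any one file disappears from the result. A small sketch with made-up frames (the missing 'alb' row is an assumption for illustration, not part of the question's data) shows the difference:

```python
import pandas as pd

left = pd.DataFrame({"country": ["afg", "ago", "alb"], "y_1980": [196, 125, 23]})
# Hypothetical second year in which 'alb' has no row.
right = pd.DataFrame({"country": ["afg", "ago"], "y_1981": [192, 120]})

inner = pd.merge(left, right, on="country")               # default how='inner'
outer = pd.merge(left, right, on="country", how="outer")  # keeps 'alb', fills NaN

print(inner)
print(outer)
```

If every file is known to list the same countries, the default is fine; otherwise how='outer' keeps all rows.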
Answer 2 (score: 0)
It is much easier if you use pandas, because it gets rid of the for loop and keeps the memory footprint low.
import pandas as pd
# read the files first
y_1980 = pd.read_csv('y_1980.csv', sep='\t')
y_1981 = pd.read_csv('y_1981.csv', sep='\t')
You can change the sep option if the values are separated by ' ' (space) or ',' (comma).
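As a quick illustration of the sep option, the sketch below reads the same row from three in-memory strings (hypothetical stand-ins for real files) with a tab, whitespace, and comma delimiter:

```python
import io

import pandas as pd

# Hypothetical file contents; only the delimiter differs.
tab_text = "country\ty_1980\nafg\t196\n"
space_text = "country y_1980\nafg 196\n"
comma_text = "country,y_1980\nafg,196\n"

by_tab = pd.read_csv(io.StringIO(tab_text), sep="\t")
by_space = pd.read_csv(io.StringIO(space_text), sep=r"\s+")  # any run of whitespace
by_comma = pd.read_csv(io.StringIO(comma_text))              # sep=',' is the default

print(by_tab.equals(by_space) and by_space.equals(by_comma))
```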
# set 'country' as the index to use this value to merge.
y_1980 = y_1980.set_index('country', append=True)
y_1981 = y_1981.set_index('country', append=True)
print(y_1980)
print(y_1981)
y_1980
country
0 afg 196
1 ago 125
2 alb 23
y_1981
country
0 afg 192
1 ago 120
2 alb 0
# set the frames to merge. You can add as many dataframe as you want.
frames =[y_1980, y_1981]
# now merge the dataframe
merged_df = pd.concat(frames, axis=1).reset_index(level=['country'])
print(merged_df)
country y_1980 y_1981
0 afg 196 192
1 ago 125 120
2 alb 23 0
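Additional note: if you only want to keep the keys that appear in every frame, pass join='inner' to pd.concat (the counterpart of how='inner' in merge) and drop missing values with dropna(). If you want to keep all available data from all frames, use join='outer' (how='outer' in merge).
For details, see this link: http://pandas.pydata.org/pandas-docs/stable/merging.html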
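To make the inner/outer note concrete, here is a minimal sketch using pd.concat with 'country' as the index. The frames are built in memory and the missing 'alb' row in the second one is an assumption for illustration:

```python
import pandas as pd

y_1980 = pd.DataFrame(
    {"country": ["afg", "ago", "alb"], "y_1980": [196, 125, 23]}
).set_index("country")
# Hypothetical second year in which 'alb' has no row.
y_1981 = pd.DataFrame(
    {"country": ["afg", "ago"], "y_1981": [192, 120]}
).set_index("country")

frames = [y_1980, y_1981]

# join='inner' keeps only countries present in every frame.
inner = pd.concat(frames, axis=1, join="inner").rename_axis("country").reset_index()
# join='outer' keeps every country and fills the gaps with NaN.
outer = pd.concat(frames, axis=1, join="outer").rename_axis("country").reset_index()
# Dropping incomplete rows afterwards leaves only fully populated countries.
complete = outer.dropna()

print(inner)
print(outer)
```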