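Merging CSV files (from a folder) into one, using Python

Time: 2017-03-15 00:03:32

Tags: python csv pandas merge

I need to merge multiple CSV files located in a folder into a single file.

My raw data looks like this: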

y_1980.csv:

     country   y_1980
0        afg    196
1        ago    125
2        alb     23
3          .      .
.          .      .

y_1981.csv:

     country   y_1981
0        afg    192
1        ago    120
2        alb     0
3          .      .
.          .      .

y_20xx.csv:

     country   y_20xx
0        afg    176
1        ago    170
2        alb     76
3          .      .
.          .      .

What I expect to get is something like this:

     country   y_1980   y_1981   ...   y_20xx    
0        afg      196      192   ...      176
1        ago      125      120   ...      170
2        alb       23        0   ...       76
3          .        .        .   ...        .
.          .        .        .   ...        .

Here is my current code so far, but the result I get is that each data frame is appended after the previous one:

interesting_files = glob.glob("/Users/Desktop/Data/*.csv") 

header_saved = True

with open('/Users/Desktop/Data/table.csv','wb') as fout:
    for filename in interesting_files:

        with open(filename) as fin:
            header = next(fin)
            if not header_saved:
                fout.write(header)
                header_saved = True
            for line in fin:
                fout.write(line)

3 Answers:

Answer 0 (score: 1)

pandas makes this easy. By looping and merging, you can simply do:

Code:

import pandas as pd

files = ['file1', 'file2']
dfs = None
for filename in files:
    df = pd.read_csv(filename, sep=r'\s+')  # raw string: the files are whitespace-separated
    if dfs is None:
        dfs = df
    else:
        dfs = dfs.merge(df, how='outer')
    print(df)
print(dfs)
dfs.to_csv('file3', sep=' ')

Results:

  country  y_1980
0     afg     196
1     ago     125
2     alb      23

  country  y_1981
0     afg     192
1     ago     120
2     alb       0

  country  y_1980  y_1981
0     afg     196     192
1     ago     125     120
2     alb      23       0
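
To illustrate what how='outer' does when the files do not list exactly the same countries, here is a minimal sketch. The in-memory CSV text, the comma separator, and the extra country "dza" are made up for the example; they are not from the question's data.

```python
import io

import pandas as pd

# Hypothetical in-memory stand-ins for two of the yearly files.
# "alb" is missing from the second file and "dza" from the first.
csv_1980 = io.StringIO("country,y_1980\nafg,196\nago,125\nalb,23\n")
csv_1981 = io.StringIO("country,y_1981\nafg,192\nago,120\ndza,77\n")

df_1980 = pd.read_csv(csv_1980)
df_1981 = pd.read_csv(csv_1981)

# With no `on` argument, merge joins on the columns both frames share,
# which here is only "country". how="outer" keeps countries that appear
# in either file and fills the missing year with NaN.
merged = df_1980.merge(df_1981, how="outer")
print(merged)
```

With how='inner' instead, only "afg" and "ago" would remain, since those are the only countries present in both frames.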

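Answer 1 (score: 0)

The order of operations in your code appears to be:

  • Open file #1
  • Write the header if it has not been saved
  • Write the data rows
  • Open file #2
  • ...and so on

That concatenates all the data into one file, row after row. It sounds like you actually want to join on the column "country" instead: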

import glob
import pandas as pd
csvs = glob.glob("*.csv")
dfs = []

for csv in csvs:
  dfs.append(pd.read_csv(csv))

merged_df = dfs[0]

for df in dfs[1:]:
  merged_df = pd.merge(merged_df,df,on=['country'])


merged_df.to_csv('out.csv',index=False)
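
The same fold can be written with functools.reduce. The sketch below is one possible variant, not a required change: it builds a temporary folder with made-up sample files (y_1982.csv is a hypothetical third year) so that it runs on its own, sorts the paths so the year columns come out in order, and uses a glob pattern that does not match the output file.

```python
import glob
import os
import tempfile
from functools import reduce

import pandas as pd

# Build a throwaway folder with three small CSV files (made-up sample data).
folder = tempfile.mkdtemp()
samples = {
    "y_1980.csv": "country,y_1980\nafg,196\nago,125\nalb,23\n",
    "y_1981.csv": "country,y_1981\nafg,192\nago,120\nalb,0\n",
    "y_1982.csv": "country,y_1982\nafg,176\nago,170\nalb,76\n",
}
for name, text in samples.items():
    with open(os.path.join(folder, name), "w") as f:
        f.write(text)

# glob returns paths in arbitrary order; sorting keeps the years in sequence.
paths = sorted(glob.glob(os.path.join(folder, "y_*.csv")))
frames = [pd.read_csv(p) for p in paths]

# Fold the list into one frame, joining on "country" at each step.
merged_df = reduce(lambda left, right: pd.merge(left, right, on="country"), frames)

# "merged.csv" does not match "y_*.csv", so a rerun will not read it back in.
merged_df.to_csv(os.path.join(folder, "merged.csv"), index=False)
print(merged_df)
```

Writing the output into the same folder under a name that matches "*.csv" would cause the next run to treat the merged table as one more input file, which is why the pattern here is narrower.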

Answer 2 (score: 0)

It is much easier if you use pandas, because it gets rid of the for-loop problem and keeps the memory footprint low.

import pandas as pd

# read the files first

y_1980 = pd.read_csv('y_1980.csv', sep='\t')
y_1981 = pd.read_csv('y_1981.csv', sep='\t')

If the values are separated by spaces or commas instead, you can change the sep option to ' ' or ','.

# set 'country' as the index to use this value to merge.
y_1980 = y_1980.set_index('country', append=True)
y_1981 = y_1981.set_index('country', append=True)

print(y_1980)
print(y_1981)

            y_1980
    country        
  0 afg         196
  1 ago         125
  2 alb          23


             y_1981
    country        
  0 afg         192
  1 ago         120
  2 alb           0

# set the frames to merge. You can add as many dataframe as you want.
frames =[y_1980, y_1981]

# now merge the dataframe
merged_df = pd.concat(frames, axis=1).reset_index(level=['country'])
print(merged_df)

  country  y_1980  y_1981
0     afg     196     192
1     ago     125     120
2     alb      23       0

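Additional note: if you only want to keep the keys that are present in every frame, you can add the option join='inner' to pd.concat (the equivalent for pd.merge is how='inner'), or call dropna() on the result. If you want to keep all possible data from all frames, use join='outer', which is the default for pd.concat.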
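
As a rough illustration of the difference between the inner and outer options, the sketch below concatenates two made-up frames in which "dza" appears only in the second one. Unlike the code above, it indexes on "country" alone (no append=True), so rows are aligned by country name only; that is a simplification chosen for the example.

```python
import pandas as pd

# Made-up frames indexed by country; "alb" and "dza" each appear in only one.
y_1980 = pd.DataFrame(
    {"country": ["afg", "ago", "alb"], "y_1980": [196, 125, 23]}
).set_index("country")
y_1981 = pd.DataFrame(
    {"country": ["afg", "ago", "dza"], "y_1981": [192, 120, 77]}
).set_index("country")

# join="outer" (the default) keeps every country and fills gaps with NaN.
outer = pd.concat([y_1980, y_1981], axis=1, join="outer").reset_index()

# join="inner" keeps only the countries present in every frame.
inner = pd.concat([y_1980, y_1981], axis=1, join="inner").reset_index()

print(outer)
print(inner)
```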

For more details, see this link: http://pandas.pydata.org/pandas-docs/stable/merging.html