Merge multiple CSVs into one DataFrame, with file names as column names

Time: 2017-11-17 00:14:53

Tags: python pandas

I have many txt files in a directory that I want to merge. Here are three example files, named df_A, df_B, and df_C:

df_A
       0  1    2
0  James  1  yes
1   Jake  3   No
2   Jane  2  Yes

df_B
       0  1    2
0   Jane  2   No
1    Job  6   No
2  James  1  Yes

df_C
       0  1    2
0   Jack  4   No
1  Jenny  7  Yes
2  James  1   No
3   John  9  Yes

I would like the final DataFrame to look like this:

ID  Name    df_A    df_B    df_C

1   James   Yes     Yes     No
3   Jake    No      NA      NA
2   Jane    Yes     No      NA
6   Job     NA      Yes     NA
4   Jack    NA      NA      No
7   Jenny   NA      NA      Yes
9   John    NA      NA      Yes

Here is my code so far...

new_df = pd.DataFrame(columns = ['Name', 'ID'])

for filename in os.listdir('/path'):
    if filename.endswith('.txt'):
        course = os.path.splitext(filename)[0]

        new_df = pd.concat([combined_df,pd.DataFrame(columns=[course])])
        data = pd.read_csv(filename, sep="\t", header=None)

        for i in data[data.columns[1]]:
            if i not in new_df['ID']:
                new_df['ID'].append(i)

2 answers:

Answer 0 (score: 3)

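For these three DataFrames, just assign the column names. The first two columns (Name and ID) are shared, and the last column name should be unique to each frame. Then call pd.concat + groupby to get your output.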

dfA.columns = ['Name', 'ID', 'df_A']
dfB.columns = ['Name', 'ID', 'df_B']
dfC.columns = ['Name', 'ID', 'df_C']

pd.concat([dfA, dfB, dfC])\
      .groupby('Name', as_index=False, sort=False).first()\
      .set_index('ID').fillna('')

     Name df_A df_B df_C
ID                      
1   James  yes  Yes   No
3    Jake   No          
2    Jane  Yes   No     
6     Job        No     
4    Jack             No
7   Jenny            Yes
9    John            Yes
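To make the steps above easy to try, here is a minimal self-contained sketch. It rebuilds the three sample frames from the question by hand (the answer assumes they were already loaded) and applies the same concat + groupby pattern; `stacked` and `result` are illustrative names that do not appear in the original answer.

```python
import pandas as pd

# Sample frames rebuilt from the question, with the column names already assigned.
dfA = pd.DataFrame({"Name": ["James", "Jake", "Jane"],
                    "ID": [1, 3, 2],
                    "df_A": ["yes", "No", "Yes"]})
dfB = pd.DataFrame({"Name": ["Jane", "Job", "James"],
                    "ID": [2, 6, 1],
                    "df_B": ["No", "No", "Yes"]})
dfC = pd.DataFrame({"Name": ["Jack", "Jenny", "James", "John"],
                    "ID": [4, 7, 1, 9],
                    "df_C": ["No", "Yes", "No", "Yes"]})

# Stacking the frames leaves NaN wherever a frame lacks another frame's column.
stacked = pd.concat([dfA, dfB, dfC], ignore_index=True)

# first() keeps the first non-null entry per column within each Name group,
# which collapses the stacked rows into a single row per person.
result = (stacked.groupby("Name", as_index=False, sort=False)
                 .first()
                 .set_index("ID"))
print(result)
```

Grouping on Name alone assumes each name always maps to the same ID, as it does in the sample data. If that might not hold, grouping on both Name and ID is probably the safer choice.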

In the general case, suppose you have a list of frames, df_list. You can then assign the column names in a loop.

df_list = [dfA, dfB, dfC, ...]
for i, df in enumerate(df_list):
    df.columns = ['Name', 'ID', 'df_{}'.format(chr(ord('A') + i))]

pd.concat(df_list).groupby('Name', 
        as_index=False, sort=False).first().set_index('ID')
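Since the question starts from files on disk, the same idea can be sketched with the file name supplying the column name. This is only one possible arrangement: `merge_txt_files` is a hypothetical helper, and it assumes every file is tab-separated, has no header row, and holds exactly three columns (name, ID, value), as in the question's own `read_csv` call.

```python
from pathlib import Path

import pandas as pd


def merge_txt_files(directory):
    """Combine every tab-separated .txt file in `directory` into one frame.

    Hypothetical helper: assumes headerless files with three columns
    (name, ID, value). The file stem becomes the value column's name.
    """
    frames = []
    for path in sorted(Path(directory).glob("*.txt")):
        df = pd.read_csv(path, sep="\t", header=None)
        df.columns = ["Name", "ID", path.stem]
        frames.append(df)

    if not frames:
        raise FileNotFoundError(f"no .txt files found in {directory}")

    # Stack all files, then keep the first non-null value per person.
    combined = pd.concat(frames, ignore_index=True)
    return (combined.groupby("Name", as_index=False, sort=False)
                    .first()
                    .set_index("ID"))
```

Calling `merge_txt_files('/path')` on a directory containing df_A.txt, df_B.txt, and df_C.txt should give one column per file, with NaN where a person is missing from that file.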

Answer 1 (score: 3)

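// Assumes `s3` (an AWS SDK client) and `bucketName` are defined elsewhere.
let result = [{ fileName: "dog.jpg" }, { fileName: "cat.jpg" }];

// Wrap the callback-style s3.getSignedUrl in a Promise so it can be awaited.
function getSignedUrl(key) {
  const params = { Bucket: bucketName, Key: key };
  return new Promise((resolve, reject) => {
    s3.getSignedUrl('getObject', params, (err, url) => {
      if (err) return reject(err);
      resolve(url);
    });
  });
}

// Attach a signed URL to each item, one at a time.
async function process(items) {
  for (const item of items) {
    item.url = await getSignedUrl(item.fileName);
  }
  return items;
}

process(result)
  .then(res => console.log(res))
  .catch(err => console.error(err));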