I have many txt files in a directory that I want to combine. Here is an example with three files named df_A, df_B and df_C:
df_A
0 1 2
0 James 1 yes
1 Jake 3 No
2 Jane 2 Yes
df_B
0 1 2
0 Jane 2 No
1 Job 6 No
2 James 1 Yes
df_C
0 1 2
0 Jack 4 No
1 Jenny 7 Yes
2 James 1 No
3 John 9 Yes
I want the final dataframe to look like this:
ID Name df_A df_B df_C
1 James Yes Yes No
3 Jake No NA NA
2 Jane Yes No NA
6 Job NA Yes NA
4 Jack NA NA No
7 Jenny NA NA Yes
9 John NA NA Yes
Here is my code so far:
import os
import pandas as pd

new_df = pd.DataFrame(columns=['Name', 'ID'])
for filename in os.listdir('/path'):
    if filename.endswith('.txt'):
        course = os.path.splitext(filename)[0]
        new_df = pd.concat([new_df, pd.DataFrame(columns=[course])])
        data = pd.read_csv(os.path.join('/path', filename), sep="\t", header=None)
        for i in data[data.columns[1]]:
            if i not in new_df['ID']:
                new_df['ID'].append(i)
Answer 0 (score: 3)
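For the three dataframes, just assign column names: the first two columns (Name, ID) are shared, and the last column must have a unique name in each frame. Then call pd.concat followed by groupby to get your output.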
dfA.columns = ['Name', 'ID', 'df_A']
dfB.columns = ['Name', 'ID', 'df_B']
dfC.columns = ['Name', 'ID', 'df_C']
pd.concat([dfA, dfB, dfC])\
    .groupby('Name', as_index=False, sort=False).first()\
    .set_index('ID').fillna('')
     Name df_A df_B df_C
ID
1   James  yes  Yes   No
3    Jake   No
2    Jane  Yes   No
6     Job        No
4    Jack             No
7   Jenny            Yes
9    John            Yes
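As a self-contained check, the sketch below rebuilds the three example frames in memory (data copied from the question; the names dfA, dfB, dfC follow the answer) and runs the same concat + groupby chain without the final fillna. The key step is groupby(...).first(), which takes the first non-null value of each column within every Name group, so the NaN cells introduced by concat are filled from whichever frame actually has a value. sort=False keeps the names in order of first appearance.

```python
import pandas as pd

# Example data copied from the question (column 0 = name, 1 = ID, 2 = answer)
dfA = pd.DataFrame([["James", 1, "yes"], ["Jake", 3, "No"], ["Jane", 2, "Yes"]])
dfB = pd.DataFrame([["Jane", 2, "No"], ["Job", 6, "No"], ["James", 1, "Yes"]])
dfC = pd.DataFrame([["Jack", 4, "No"], ["Jenny", 7, "Yes"],
                    ["James", 1, "No"], ["John", 9, "Yes"]])

# Shared key columns plus one uniquely named value column per frame
dfA.columns = ["Name", "ID", "df_A"]
dfB.columns = ["Name", "ID", "df_B"]
dfC.columns = ["Name", "ID", "df_C"]

# Stack the frames; columns a frame lacks are filled with NaN
stacked = pd.concat([dfA, dfB, dfC])

# One row per Name: first non-null value in every column,
# names kept in the order they first appear
result = (stacked.groupby("Name", as_index=False, sort=False)
                 .first()
                 .set_index("ID"))
print(result)
```

Missing answers stay as NaN here; append .fillna('') as in the answer if blanks are preferred for display.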
In the general case, suppose you have a list of dataframes, df_list. You can then assign the column names in a loop.
df_list = [dfA, dfB, dfC, ...]
for i, df in enumerate(df_list):
    df.columns = ['Name', 'ID', 'df_{}'.format(chr(ord('A') + i))]
pd.concat(df_list).groupby('Name',
                           as_index=False, sort=False).first().set_index('ID')
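For the directory of .txt files in the question, the value column can be named after each file instead of a letter counter. This is a minimal sketch, assuming tab-separated files with no header row and exactly three columns (name, ID, answer), matching the question's read_csv call; the helper names combine_frames and load_directory are hypothetical. combine_frames works on in-memory frames, so it can be tried without any files.

```python
import os
import pandas as pd


def combine_frames(frames):
    """Hypothetical helper: {label: DataFrame} -> one row per Name.

    Each frame is assumed to have three columns: name, ID, answer.
    """
    renamed = []
    for label, df in frames.items():
        df = df.copy()  # avoid mutating the caller's frames
        # Shared key columns plus one value column named after the source
        df.columns = ["Name", "ID", label]
        renamed.append(df)
    # Same concat + groupby + first chain as the answer above
    return (pd.concat(renamed)
              .groupby("Name", as_index=False, sort=False)
              .first()
              .set_index("ID"))


def load_directory(path):
    """Hypothetical helper: read every .txt file in `path`, keyed by file stem."""
    frames = {}
    for filename in sorted(os.listdir(path)):
        if filename.endswith(".txt"):
            label = os.path.splitext(filename)[0]
            # Join with the directory so the read does not depend on the cwd
            frames[label] = pd.read_csv(os.path.join(path, filename),
                                        sep="\t", header=None)
    return frames


# Usage with the directory from the question:
# result = combine_frames(load_directory("/path"))
```

Sorting the file names makes the column order (df_A, df_B, df_C, ...) predictable, since os.listdir returns entries in arbitrary order.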
Answer 1 (score: 3)
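// Assumes the AWS SDK for JavaScript v2; bucketName must be defined elsewhere
const AWS = require('aws-sdk');
const s3 = new AWS.S3();

let result = [{ fileName: "dog.jpg" }, { fileName: "cat.jpg" }];

// Wrap the callback-style s3.getSignedUrl in a Promise so it can be awaited
function getSignedUrl(key) {
  const params = { Bucket: bucketName, Key: key };
  return new Promise((resolve, reject) => {
    s3.getSignedUrl('getObject', params, (err, url) => {
      if (err) reject(err);
      else resolve(url);
    });
  });
}

// Attach a signed URL to each item, one request at a time
async function processItems(items) {
  for (const item of items) {
    item.url = await getSignedUrl(item.fileName);
  }
  return items;
}

processItems(result).then(res => {
  console.log(res);
});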