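I want to build a DataFrame whose last columns are filled with data from previous DataFrames. For each ID in the first column, I want to check whether it exists in the other DataFrames and, if it does, find which columns it appears in and add that information to the last columns of the main DataFrame.
Example
My MAIN DF looks like this: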
names col1 col2 col3 total
bbb   V    V    X    2
ccc   V    X    X    1
zzz   X    V    V    2
qqq   X    V    X    1
rrr   X    X    V    1
For example, I also have two more DataFrames (in general there are more than two, so I want to process all of them in a loop). DF1:
names col1 col4 col5 total
bbb   V    V    X    2
ccc   V    X    X    1
yyy   V    V    X    2
and DF2:
names col6 col2 col7 total
bbb   V    V    X    2
ccc   X    X    V    1
zzz   X    V    V    2
So I want to update MAIN DF as follows:
names col1 col2 col3 total total_col1 total_col2
bbb   V    V    X    2     DF1        DF1
                           DF2        DF2
ccc   V    X    X    1     DF1
zzz   X    V    V    2                DF2
qqq   X    V    X    1
rrr   X    X    V    1
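I hope pandas can do this and that the example is clear.
Edit, note on columns: DF1 and DF2 contain additional columns that are not in the original MAIN DF, so I only added columns for those that are also in the original MAIN DF.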
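Answer 0: (score: 1)
You can use another answer that is more general.
First create a list of DataFrames, dfs, and process it in a list comprehension. Then concat them together and finally join the result to the main DataFrame (df below):
import numpy as np
import pandas as pd

df_names = ['DF1', 'DF2']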
cols = ['col1','col2','col3']
dfs = [DF1, DF2]
dfs = [x.set_index('names')[cols]
.replace({'V':df_names[i], 'X':np.nan})
.add_prefix('total_') for i, x in enumerate(dfs)]
DF_ALL = pd.concat(dfs)
print (DF_ALL)
      total_col1 total_col2 total_col3
names
bbb          DF1        DF1        NaN
ccc          DF1        NaN        NaN
yyy          DF1        DF1        NaN
bbb          DF2        DF2        NaN
ccc          NaN        NaN        DF2
zzz          NaN        DF2        DF2
df = df.join(DF_ALL, on='names')
print (df)
  names col1 col2 col3  total total_col1 total_col2 total_col3
0   bbb    V    V    X      2        DF1        DF1        NaN
0   bbb    V    V    X      2        DF2        DF2        NaN
1   ccc    V    X    X      1        DF1        NaN        NaN
1   ccc    V    X    X      1        NaN        NaN        DF2
2   zzz    X    V    V      2        NaN        DF2        DF2
3   qqq    X    V    X      1        NaN        NaN        NaN
4   rrr    X    X    V      1        NaN        NaN        NaN
EDIT:
Solution without a names column (the names are in the index):
df_names = ['DF1', 'DF2']
cols = ['col1','col2','col3']
dfs = [DF1, DF2]
dfs = [x[cols].replace({'V':df_names[i], 'X':np.nan})
.add_prefix('total_') for i, x in enumerate(dfs)]
DF_ALL = pd.concat(dfs).groupby(level=0).agg(lambda x: ', '.join(x.dropna().tolist()))
print (DF_ALL)
    total_col1 total_col2 total_col3
bbb   DF1, DF2   DF1, DF2
ccc        DF1                   DF2
yyy        DF1        DF1
zzz                   DF2        DF2
df = pd.merge(df, DF_ALL, left_index=True, right_index=True, how='left')
df[DF_ALL.columns] = df[DF_ALL.columns].fillna('')
print (df)
    col1 col2 col3  total total_col1 total_col2 total_col3
bbb    V    V    X      2   DF1, DF2   DF1, DF2
ccc    V    X    X      1        DF1                   DF2
zzz    X    V    V      2                   DF2        DF2
qqq    X    V    X      1
rrr    X    X    V      1
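For readers who want to run the approach above end to end, here is a minimal self-contained sketch. The sample frames are rebuilt by hand from the tables in this thread and are indexed by name; the variable names frames and result are illustrative and not part of the original answer.

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the thread, indexed by name (no 'names' column).
names = ["bbb", "ccc", "zzz", "qqq", "rrr"]
df = pd.DataFrame(
    {"col1": list("VVXXX"), "col2": list("VXVVX"),
     "col3": list("XXVXV"), "total": [2, 1, 2, 1, 1]},
    index=names,
)
DF1 = pd.DataFrame(
    {"col1": list("VVV"), "col2": list("VXV"), "col3": list("XXX")},
    index=["bbb", "ccc", "yyy"],
)
DF2 = pd.DataFrame(
    {"col1": list("VXX"), "col2": list("VXV"), "col3": list("XVV")},
    index=["bbb", "ccc", "zzz"],
)

df_names = ["DF1", "DF2"]
cols = ["col1", "col2", "col3"]

# Replace every 'V' by the name of the frame it came from, every 'X' by NaN.
frames = [
    x[cols].replace({"V": name, "X": np.nan}).add_prefix("total_")
    for name, x in zip(df_names, [DF1, DF2])
]

# Stack the frames, then collapse rows sharing a name into one joined string.
DF_ALL = (
    pd.concat(frames)
    .groupby(level=0)
    .agg(lambda s: ", ".join(s.dropna().tolist()))
)

# Left merge keeps only the names of the main frame, in their original order.
result = df.merge(DF_ALL, left_index=True, right_index=True, how="left")
result[DF_ALL.columns] = result[DF_ALL.columns].fillna("")
print(result)
```

Because the merge is a left merge on the index, yyy (present only in DF1) is dropped, while qqq and rrr keep empty strings in the new columns.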
EDIT1:
Solution with excluded columns: use drop with a list of columns and errors='ignore', so that no error is raised if a column is missing:
dfs = [DF1, DF2]
df_names = ['DF1', 'DF2']
exclude_cols = ['total','col_aaa']
dfs = [x.drop(exclude_cols, axis=1, errors='ignore')
        .replace({'V':df_names[i], 'X':np.nan})
        .add_prefix('total_') for i, x in enumerate(dfs)]
DF_ALL = pd.concat(dfs).groupby(level=0).agg(lambda x: ', '.join(x.dropna().tolist()))
print (DF_ALL)
    total_col1 total_col2 total_col3
bbb   DF1, DF2   DF1, DF2
ccc        DF1                   DF2
yyy        DF1        DF1
zzz                   DF2        DF2
df = pd.merge(df, DF_ALL, left_index=True, right_index=True, how='left')
df[DF_ALL.columns] = df[DF_ALL.columns].fillna('')
print (df)
    col1 col2 col3  total total_col1 total_col2 total_col3
bbb    V    V    X      2   DF1, DF2   DF1, DF2
ccc    V    X    X      1        DF1                   DF2
zzz    X    V    V      2                   DF2        DF2
qqq    X    V    X      1
rrr    X    X    V      1
EDIT2: Added filtering of the columns by intersection.
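The intersection-based filtering mentioned under EDIT2 might look like the sketch below. This is an assumption about what was meant, not the answerer's code: each frame is restricted to the columns it shares with the main frame via Index.intersection, using the edited sample data from the question (DF1 and DF2 carry extra columns such as col4 and col7). The names frames and result are illustrative.

```python
import numpy as np
import pandas as pd

# Edited sample data from the question, indexed by name.
df = pd.DataFrame(
    {"col1": list("VVXXX"), "col2": list("VXVVX"),
     "col3": list("XXVXV"), "total": [2, 1, 2, 1, 1]},
    index=["bbb", "ccc", "zzz", "qqq", "rrr"],
)
DF1 = pd.DataFrame(
    {"col1": list("VVV"), "col4": list("VXV"),
     "col5": list("XXX"), "total": [2, 1, 2]},
    index=["bbb", "ccc", "yyy"],
)
DF2 = pd.DataFrame(
    {"col6": list("VXX"), "col2": list("VXV"),
     "col7": list("XVV"), "total": [2, 1, 2]},
    index=["bbb", "ccc", "zzz"],
)

df_names = ["DF1", "DF2"]
# Columns of the main frame that may be looked up in the other frames.
cols = df.columns.drop("total")

# Keep only the columns each frame shares with the main frame, then mark them.
frames = [
    x[x.columns.intersection(cols)]
    .replace({"V": name, "X": np.nan})
    .add_prefix("total_")
    for name, x in zip(df_names, [DF1, DF2])
]

DF_ALL = (
    pd.concat(frames)
    .groupby(level=0)
    .agg(lambda s: ", ".join(s.dropna().tolist()))
)

result = df.merge(DF_ALL, left_index=True, right_index=True, how="left")
result[DF_ALL.columns] = result[DF_ALL.columns].fillna("")
print(result)
```

With this data only col1 (from DF1) and col2 (from DF2) are shared with the main frame, so the result gains total_col1 and total_col2 and no total_col3.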
Answer 1: (score: 0)
This may work for you:
import pandas as pd
import numpy as np
MAIN_DF = [["bbb","V","V","X",2],
["ccc","V","X","X",1],
["zzz","X","V","V",2],
["qqq","X","V","X",1],
["rrr","X","X","V",1]]
MAIN_DF = pd.DataFrame(MAIN_DF, columns=["names", "col1","col2","col3","total"])
DF1 = [["bbb","V","V","X"],
["ccc","V","X","X"],
["yyy","V","V","X"]]
DF1 = pd.DataFrame(DF1, columns=["names", "col1","col2","col3"])
DF2 = [["bbb","V","V","X"],
["ccc","X","X","V"],
["zzz","X","V","V"]]
DF2 = pd.DataFrame(DF2, columns=["names", "col1","col2","col3"])
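# One result column per column of MAIN_DF except 'total'; object dtype so cells can hold strings
total_col = pd.DataFrame(data = np.zeros((MAIN_DF.shape[0],MAIN_DF.shape[1]-1), dtype=object), columns=["names", "col1","col2","col3"])
total_col["names"]=MAIN_DF["names"]
# Mark every "V" found in DF1 (range replaces Python 2's xrange)
for i in range(total_col.shape[0]):
    name = total_col["names"][i]
    for j in range(DF1.shape[0]):
        if DF1["names"][j] == name:
            for col in DF1.columns[1:]:
                if DF1[col][j] == "V":
                    # .loc avoids chained assignment, which may not write through
                    total_col.loc[i, col] = "DF1"
# Mark every "V" found in DF2, appending to an existing DF1 mark
for i in range(total_col.shape[0]):
    name = total_col["names"][i]
    for j in range(DF2.shape[0]):
        if DF2["names"][j] == name:
            for col in DF2.columns[1:]:
                if DF2[col][j] == "V":
                    if total_col.loc[i, col] == "DF1":
                        total_col.loc[i, col] = "DF1 DF2"
                    else:
                        total_col.loc[i, col] = "DF2"
print(total_col)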
  names col1     col2     col3
0 bbb   DF1 DF2  DF1 DF2  0
1 ccc   DF1      0        DF2
2 zzz   0        DF2      DF2
3 qqq   0        0        0
4 rrr   0        0        0
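Since the question asks for a loop over more than two DataFrames, the two hand-written passes above could be folded into one function. The following is only a sketch under stated assumptions: mark_sources and its parameters are hypothetical names, names are assumed unique within each frame, and empty cells are filled with an empty string rather than 0.

```python
import pandas as pd

def mark_sources(main_df, others, key="names", flag="V", skip=("total",)):
    """Per row and column of main_df, list the labels of the frames that carry `flag`.

    `others` maps a label such as "DF1" to a DataFrame; names must be unique per frame.
    """
    value_cols = [c for c in main_df.columns if c != key and c not in skip]
    out = pd.DataFrame("", index=main_df.index, columns=value_cols)
    out.insert(0, key, main_df[key])
    for label, other in others.items():
        # Only look at columns that exist in both frames.
        shared = [c for c in value_cols if c in other.columns]
        hits = other.set_index(key)[shared].eq(flag)
        for i, name in main_df[key].items():
            if name not in hits.index:
                continue
            for col in shared:
                if hits.at[name, col]:
                    # Append the label, separated by a space from earlier ones.
                    out.at[i, col] = (out.at[i, col] + " " + label).strip()
    return out

MAIN_DF = pd.DataFrame(
    [["bbb", "V", "V", "X", 2], ["ccc", "V", "X", "X", 1], ["zzz", "X", "V", "V", 2],
     ["qqq", "X", "V", "X", 1], ["rrr", "X", "X", "V", 1]],
    columns=["names", "col1", "col2", "col3", "total"],
)
DF1 = pd.DataFrame(
    [["bbb", "V", "V", "X"], ["ccc", "V", "X", "X"], ["yyy", "V", "V", "X"]],
    columns=["names", "col1", "col2", "col3"],
)
DF2 = pd.DataFrame(
    [["bbb", "V", "V", "X"], ["ccc", "X", "X", "V"], ["zzz", "X", "V", "V"]],
    columns=["names", "col1", "col2", "col3"],
)

total_col = mark_sources(MAIN_DF, {"DF1": DF1, "DF2": DF2})
print(total_col)
```

Adding a third frame then only means adding another entry to the dictionary; the labels are appended in the dictionary's insertion order.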