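I want to add values across three or more DataFrames by passing them as a list, rather than adding them one at a time.
First, I'll use merge as an example.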
The following lines merge the DataFrames (data0, data1, data2) one at a time:
final_data = data0.merge(data1, on=['player_id', 'player_name'])
final_data = final_data.merge(data2, on=['player_id', 'player_name'])
Instead, I can merge the DataFrames from a list, which is very useful when there are more DataFrames, for example:
from functools import reduce
import pandas as pd
data_list = [data0, data1, data2]
final_data = reduce(lambda left, right: pd.merge(left, right, on=['player_id', 'player_name']), data_list)
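To see what reduce does with that list, here is a small self-contained sketch using three made-up two-row frames (the values are hypothetical, not the question's data). reduce applies the lambda left to right, so the result is the same as the explicit two-step merge above.

```python
from functools import reduce

import pandas as pd

keys = ['player_id', 'player_name']

# Hypothetical tiny frames that share the two key columns
data0 = pd.DataFrame({'player_id': [1, 2], 'player_name': ['A', 'B'], 'ab': [0, 1]})
data1 = pd.DataFrame({'player_id': [1, 2], 'player_name': ['A', 'B'], 'run': [3, 4]})
data2 = pd.DataFrame({'player_id': [1, 2], 'player_name': ['A', 'B'], 'hit': [5, 6]})

data_list = [data0, data1, data2]

# reduce folds left to right: merge(merge(data0, data1), data2)
final_data = reduce(lambda left, right: pd.merge(left, right, on=keys), data_list)

# The explicit one-at-a-time version gives the same frame
step_by_step = data0.merge(data1, on=keys).merge(data2, on=keys)
assert final_data.equals(step_by_step)
print(final_data)
```

Keep in mind that merge defaults to an inner join (how='inner'), so a key missing from any one frame is dropped from the result; that is one reason merge does not fit the summing task below, where data1 and data2 each contain only some of the players.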
Now I have the following three DataFrames, and I want to add their values together.
data0:
player_id player_name ab run hit
0 28920 S. Smith 0 0 0
1 33351 T. Mancini 0 0 0
2 30267 C. Gentry 0 0 0
3 28513 A. Jones 0 0 0
4 31097 M. Machado 0 0 0
5 29170 C. Davis 0 0 0
6 29322 M. Trumbo 0 0 0
7 29564 W. Castillo 0 0 0
8 34885 H. Kim 0 0 0
9 32952 J. Rickard 0 0 0
10 31988 J. Schoop 0 0 0
11 5908 J.J. Hardy 0 0 0
Next, data1:
player_id player_name ab run hit
0 28920 S. Smith 1 4 6
1 33351 T. Mancini 0 0 2
2 28513 A. Jones 2 1 0
3 31097 M. Machado 1 8 0
4 34885 H. Kim 1 1 2
5 32952 J. Rickard 0 2 0
6 31988 J. Schoop 5 3 4
7 5908 J.J. Hardy 4 2 10
Next, data2:
player_id player_name ab run hit
0 28920 S. Smith 1 9 2
1 31097 M. Machado 3 3 3
2 29170 C. Davis 9 6 4
3 29322 M. Trumbo 3 5 7
4 32952 J. Rickard 1 3 4
5 5908 J.J. Hardy 0 0 5
The final DataFrame I want, final_data, looks like this:
player_id player_name ab run hit
0 28920 S. Smith 2 13 8
1 33351 T. Mancini 0 0 2
2 30267 C. Gentry 0 0 0
3 28513 A. Jones 2 1 0
4 31097 M. Machado 4 11 3
5 29170 C. Davis 9 6 4
6 29322 M. Trumbo 3 5 7
7 29564 W. Castillo 0 0 0
8 34885 H. Kim 1 1 2
9 32952 J. Rickard 1 5 4
10 31988 J. Schoop 5 3 4
11 5908 J.J. Hardy 4 2 15
I can get this result with the following code, but it adds the DataFrames one at a time:
import pandas as pd
data0 = pd.read_csv('initial_df.csv')
data1 = pd.read_csv('add_vals1.csv')
data2 = pd.read_csv('add_vals2.csv')
data0 = data0.set_index(['player_id', 'player_name'])
data1 = data1.set_index(['player_id', 'player_name'])
data2 = data2.set_index(['player_id', 'player_name'])
final_data = data0.add(data1, fill_value=0).astype(int).reset_index()
final_data = final_data.set_index(['player_id', 'player_name'])
final_data = final_data.add(data2, fill_value=0).astype(int).reset_index()
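Since add() returns a DataFrame that keeps the (player_id, player_name) MultiIndex, the reset_index()/set_index() round trip between the two additions should not be needed; the calls can be chained. A minimal sketch, assuming the three tables above are typed in as in-memory DataFrames instead of being read from the CSV files:

```python
import pandas as pd

keys = ['player_id', 'player_name']
cols = keys + ['ab', 'run', 'hit']

# In-memory copies of the three tables, standing in for the CSV files
data0 = pd.DataFrame([
    (28920, 'S. Smith', 0, 0, 0), (33351, 'T. Mancini', 0, 0, 0),
    (30267, 'C. Gentry', 0, 0, 0), (28513, 'A. Jones', 0, 0, 0),
    (31097, 'M. Machado', 0, 0, 0), (29170, 'C. Davis', 0, 0, 0),
    (29322, 'M. Trumbo', 0, 0, 0), (29564, 'W. Castillo', 0, 0, 0),
    (34885, 'H. Kim', 0, 0, 0), (32952, 'J. Rickard', 0, 0, 0),
    (31988, 'J. Schoop', 0, 0, 0), (5908, 'J.J. Hardy', 0, 0, 0),
], columns=cols).set_index(keys)
data1 = pd.DataFrame([
    (28920, 'S. Smith', 1, 4, 6), (33351, 'T. Mancini', 0, 0, 2),
    (28513, 'A. Jones', 2, 1, 0), (31097, 'M. Machado', 1, 8, 0),
    (34885, 'H. Kim', 1, 1, 2), (32952, 'J. Rickard', 0, 2, 0),
    (31988, 'J. Schoop', 5, 3, 4), (5908, 'J.J. Hardy', 4, 2, 10),
], columns=cols).set_index(keys)
data2 = pd.DataFrame([
    (28920, 'S. Smith', 1, 9, 2), (31097, 'M. Machado', 3, 3, 3),
    (29170, 'C. Davis', 9, 6, 4), (29322, 'M. Trumbo', 3, 5, 7),
    (32952, 'J. Rickard', 1, 3, 4), (5908, 'J.J. Hardy', 0, 0, 5),
], columns=cols).set_index(keys)

# Each add() keeps the MultiIndex, so the second add can follow directly;
# fill_value=0 treats a player absent from one side as 0 for that column
final_data = (
    data0.add(data1, fill_value=0)
         .add(data2, fill_value=0)
         .astype(int)
         .reset_index()
)
print(final_data)
```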
Could anyone help me get the final result from a list, the way I used merge at the top? Thank you very much!
Answer 0 (score: 1)
I think you need the index_col parameter in read_csv to build a MultiIndex, and then reduce with add:
from functools import reduce
import pandas as pd
data0 = pd.read_csv('initial_df.csv', index_col=['player_id', 'player_name'])
data1 = pd.read_csv('add_vals1.csv', index_col=['player_id', 'player_name'])
data2 = pd.read_csv('add_vals2.csv', index_col=['player_id', 'player_name'])
data_list = [data0, data1, data2]
final_data = reduce(lambda x, y: x.add(y, fill_value=0), data_list).reset_index()
print(final_data)
player_id player_name ab run hit
0 5908 J.J. Hardy 4.0 2.0 15.0
1 28513 A. Jones 2.0 1.0 0.0
2 28920 S. Smith 2.0 13.0 8.0
3 29170 C. Davis 9.0 6.0 4.0
4 29322 M. Trumbo 3.0 5.0 7.0
5 29564 W. Castillo 0.0 0.0 0.0
6 30267 C. Gentry 0.0 0.0 0.0
7 31097 M. Machado 4.0 11.0 3.0
8 31988 J. Schoop 5.0 3.0 4.0
9 32952 J. Rickard 1.0 5.0 4.0
10 33351 T. Mancini 0.0 0.0 2.0
11 34885 H. Kim 1.0 1.0 2.0
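In the output above the sums come back as floats (presumably because aligning frames with different rows introduces NaN before fill_value is applied), and the rows are sorted by player_id instead of following data0. A sketch that keeps the answer's reduce/add fold, then reindexes on data0's index and casts back to integers; in-memory CSV text stands in for the three files and is read with the same index_col argument. It also computes pd.concat followed by groupby(...).sum(), which is a different technique from the answer's, as a cross-check:

```python
from functools import reduce
from io import StringIO

import pandas as pd

keys = ['player_id', 'player_name']

# In-memory CSV text standing in for initial_df.csv, add_vals1.csv and add_vals2.csv
csv0 = """player_id,player_name,ab,run,hit
28920,S. Smith,0,0,0
33351,T. Mancini,0,0,0
30267,C. Gentry,0,0,0
28513,A. Jones,0,0,0
31097,M. Machado,0,0,0
29170,C. Davis,0,0,0
29322,M. Trumbo,0,0,0
29564,W. Castillo,0,0,0
34885,H. Kim,0,0,0
32952,J. Rickard,0,0,0
31988,J. Schoop,0,0,0
5908,J.J. Hardy,0,0,0
"""
csv1 = """player_id,player_name,ab,run,hit
28920,S. Smith,1,4,6
33351,T. Mancini,0,0,2
28513,A. Jones,2,1,0
31097,M. Machado,1,8,0
34885,H. Kim,1,1,2
32952,J. Rickard,0,2,0
31988,J. Schoop,5,3,4
5908,J.J. Hardy,4,2,10
"""
csv2 = """player_id,player_name,ab,run,hit
28920,S. Smith,1,9,2
31097,M. Machado,3,3,3
29170,C. Davis,9,6,4
29322,M. Trumbo,3,5,7
32952,J. Rickard,1,3,4
5908,J.J. Hardy,0,0,5
"""

# A list passed to index_col builds a (player_id, player_name) MultiIndex
data_list = [pd.read_csv(StringIO(text), index_col=keys) for text in (csv0, csv1, csv2)]

# The answer's fold: (data0 + data1) + data2, a player missing from a frame counts as 0
summed = reduce(lambda x, y: x.add(y, fill_value=0), data_list)

# Put the rows back in data0's order and cast the float sums back to integers
final_data = summed.reindex(data_list[0].index).astype('int64').reset_index()

# Alternative technique (not the answer's): stack every frame, then sum per key;
# sort=False keeps groups in first-seen order, which here is data0's order
alt = pd.concat(data_list).groupby(level=keys, sort=False).sum().reset_index()

print(final_data)
```

The reindex step relies on data0 already listing every player; if a later file could introduce new players, reindexing on data0's index would drop them, whereas the concat/groupby version would keep them.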