pandas - Adding the values of two or more different DataFrames via a list

Time: 2018-05-16 05:35:11

Tags: python pandas dataframe merge add

I would like to add values across three or more DataFrames by way of a list, instead of adding them one at a time.

First, I will use merge as an example.

The following lines merge the DataFrames (data0, data1, data2) one by one:

final_data = data0.merge(data1, on=['player_id', 'player_name'])
final_data = final_data.merge(data2, on=['player_id', 'player_name'])

Instead, however, I can merge the DataFrames through a list, which is very useful when handling more DataFrames, for example:

from functools import reduce

data_list = [data0, data1, data2]
final_data = reduce(lambda left, right: pd.merge(left, right, on=['player_id', 'player_name']), data_list)
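To make the equivalence concrete, here is a minimal sketch showing that folding a list with reduce gives the same frame as chaining merge calls by hand. The frames a, b and c and their player IDs are hypothetical stand-ins, each with a different value column so that merge does not need to add suffixes.

```python
from functools import reduce

import pandas as pd

keys = ["player_id", "player_name"]

# Hypothetical frames: same key columns, one distinct value column each.
a = pd.DataFrame({"player_id": [1, 2], "player_name": ["A", "B"], "ab": [3, 4]})
b = pd.DataFrame({"player_id": [1, 2], "player_name": ["A", "B"], "run": [5, 6]})
c = pd.DataFrame({"player_id": [1, 2], "player_name": ["A", "B"], "hit": [7, 8]})

# Merging one by one ...
chained = a.merge(b, on=keys).merge(c, on=keys)

# ... is what reduce does: it feeds the running result back in as `left`.
folded = reduce(lambda left, right: pd.merge(left, right, on=keys), [a, b, c])

print(folded)
print(chained.equals(folded))
```

The same folding pattern applies to any binary DataFrame operation, which is why it also fits the addition asked about here.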

Now, I have the following three DataFrames, and I want to add their values together.

data0

    player_id  player_name  ab  run  hit
0       28920     S. Smith   0    0    0
1       33351   T. Mancini   0    0    0
2       30267    C. Gentry   0    0    0
3       28513     A. Jones   0    0    0
4       31097   M. Machado   0    0    0
5       29170     C. Davis   0    0    0
6       29322    M. Trumbo   0    0    0
7       29564  W. Castillo   0    0    0
8       34885       H. Kim   0    0    0
9       32952   J. Rickard   0    0    0
10      31988    J. Schoop   0    0    0
11       5908   J.J. Hardy   0    0    0

Next,

data1

   player_id player_name  ab  run  hit
0      28920    S. Smith   1    4    6
1      33351  T. Mancini   0    0    2
2      28513    A. Jones   2    1    0
3      31097  M. Machado   1    8    0
4      34885      H. Kim   1    1    2
5      32952  J. Rickard   0    2    0
6      31988   J. Schoop   5    3    4
7       5908  J.J. Hardy   4    2   10

Next,

data2

   player_id player_name  ab  run  hit
0      28920    S. Smith   1    9    2
1      31097  M. Machado   3    3    3
2      29170    C. Davis   9    6    4
3      29322   M. Trumbo   3    5    7
4      32952  J. Rickard   1    3    4
5       5908  J.J. Hardy   0    0    5

The final DataFrame I want to obtain looks like this:

final_data

    player_id  player_name  ab  run  hit
0       28920     S. Smith   2   13    8
1       33351   T. Mancini   0    0    2
2       30267    C. Gentry   0    0    0
3       28513     A. Jones   2    1    0
4       31097   M. Machado   4   11    3
5       29170     C. Davis   9    6    4
6       29322    M. Trumbo   3    5    7
7       29564  W. Castillo   0    0    0
8       34885       H. Kim   1    1    2
9       32952   J. Rickard   1    5    4
10      31988    J. Schoop   5    3    4
11       5908   J.J. Hardy   4    2   15

I can get the result with the following code, but it adds the DataFrames one at a time.

data0 = pd.read_csv('initial_df.csv')
data1 = pd.read_csv('add_vals1.csv')
data2 = pd.read_csv('add_vals2.csv')


data0 = data0.set_index(['player_id', 'player_name'])
data1 = data1.set_index(['player_id', 'player_name'])
data2 = data2.set_index(['player_id', 'player_name'])

final_data = data0.add(data1, fill_value=0).astype(int).reset_index()
final_data = final_data.set_index(['player_id', 'player_name'])
final_data = final_data.add(data2, fill_value=0).astype(int).reset_index()
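The role of fill_value=0 in the code above may be easier to see on a tiny case. The sketch below uses two rows taken from the question (hit column only) and compares add with and without fill_value. A player missing from one frame yields NaN in the plain sum, while fill_value=0 treats the missing row as zero.

```python
import pandas as pd

keys = ["player_id", "player_name"]

# Two players on the left, only one of them on the right.
left = pd.DataFrame(
    {"player_id": [28920, 30267],
     "player_name": ["S. Smith", "C. Gentry"],
     "hit": [0, 0]}
).set_index(keys)
right = pd.DataFrame(
    {"player_id": [28920], "player_name": ["S. Smith"], "hit": [6]}
).set_index(keys)

# Rows are aligned on the (player_id, player_name) index before adding.
plain = left.add(right)                  # C. Gentry has no partner row: NaN
filled = left.add(right, fill_value=0)   # the missing row counts as 0

print(plain)
print(filled.astype(int))
```

Because alignment introduces missing values internally, the result comes back as floats, which is why the code above calls astype(int) afterwards.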

Can anyone help me get the final result through a list, the way I did with the merge function at the top? Thank you very much!

1 answer:

Answer 0 (score: 1)

I think you need the index_col parameter of read_csv to build a MultiIndex, and then use add with reduce:

from functools import reduce

data0 = pd.read_csv('initial_df.csv', index_col=['player_id', 'player_name'])
data1 = pd.read_csv('add_vals1.csv', index_col=['player_id', 'player_name'])
data2 = pd.read_csv('add_vals2.csv', index_col=['player_id', 'player_name'])

Then fold the list with reduce and add (concat followed by a sum over both index levels is another solution):

data_list = [data0, data1, data2]
final_data = reduce(lambda x, y: x.add(y, fill_value=0), data_list).reset_index()
print(final_data)
    player_id  player_name   ab   run   hit
0        5908   J.J. Hardy  4.0   2.0  15.0
1       28513     A. Jones  2.0   1.0   0.0
2       28920     S. Smith  2.0  13.0   8.0
3       29170     C. Davis  9.0   6.0   4.0
4       29322    M. Trumbo  3.0   5.0   7.0
5       29564  W. Castillo  0.0   0.0   0.0
6       30267    C. Gentry  0.0   0.0   0.0
7       31097   M. Machado  4.0  11.0   3.0
8       31988    J. Schoop  5.0   3.0   4.0
9       32952   J. Rickard  1.0   5.0   4.0
10      33351   T. Mancini  0.0   0.0   2.0
11      34885       H. Kim  1.0   1.0   2.0
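The answer also mentions concat followed by a sum over both index levels, without showing that code here. The following is a minimal sketch of that alternative, not necessarily the answerer's exact code. It assumes the frames are already indexed by player_id and player_name, and it builds three-row stand-ins from rows in the question instead of reading the CSV files. The grouping uses groupby(level=...) because recent pandas versions no longer accept a level argument on sum.

```python
import pandas as pd

keys = ["player_id", "player_name"]

# Stand-ins for the CSV files, using three players from the question.
data0 = pd.DataFrame(
    {"player_id": [28920, 33351, 5908],
     "player_name": ["S. Smith", "T. Mancini", "J.J. Hardy"],
     "ab": [0, 0, 0], "run": [0, 0, 0], "hit": [0, 0, 0]}
).set_index(keys)
data1 = pd.DataFrame(
    {"player_id": [28920, 33351, 5908],
     "player_name": ["S. Smith", "T. Mancini", "J.J. Hardy"],
     "ab": [1, 0, 4], "run": [4, 0, 2], "hit": [6, 2, 10]}
).set_index(keys)
data2 = pd.DataFrame(
    {"player_id": [28920, 5908],
     "player_name": ["S. Smith", "J.J. Hardy"],
     "ab": [1, 0], "run": [9, 0], "hit": [2, 5]}
).set_index(keys)

data_list = [data0, data1, data2]

# Stack all rows, then sum the rows sharing the same (player_id, player_name).
# sort=False keeps groups in order of first appearance, i.e. the order of data0.
final_data = (
    pd.concat(data_list)
    .groupby(level=keys, sort=False)
    .sum()
    .reset_index()
)
print(final_data)
```

Since no alignment step introduces missing values, the integer columns stay integers here, and with sort=False the rows keep the order of data0, which is closer to the layout requested in the question than the sorted float output above.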