Getting a DataFrame into the specific format given in the question

Time: 2019-06-12 12:49:36

Tags: pandas dataframe

I need to convert a DataFrame from the first format below into the second. The Stats column holds dictionaries:

   Player   Stats
0  Sachin   {'Runs': 18000, 'Hundreds': 49, 'Avg': 45}
1  Ganguly  {'Runs': 11000, 'Hundreds': 25, 'Avg': 40}
2  Kohli    {'Runs': 11000, 'Hundreds': 41, 'Avg': 50, 'Fifties': 50}

Player   Events    Values
Sachin   Runs      18000
Sachin   Hundreds  49
Sachin   Avg       45
Ganguly  Runs      11000
Ganguly  Hundreds  25
Ganguly  Avg       40
Kohli    Runs      11000
Kohli    Hundreds  41
Kohli    Avg       50
Kohli    Fifties   50

1 Answer:

Answer 0 (score: 2)

Build a list of tuples with a list comprehension and pass it to the DataFrame constructor:

# One (Player, Events, Values) tuple per entry of each Stats dictionary
L = [(x, a, b) for x, y in zip(df['Player'], df['Stats']) for a, b in y.items()]
df = pd.DataFrame(L, columns=['Player','Events','Values'])
print(df)
    Player    Events  Values
0   Sachin      Runs   18000
1   Sachin  Hundreds      49
2   Sachin       Avg      45
3  Ganguly      Runs   11000
4  Ganguly  Hundreds      25
5  Ganguly       Avg      40
6    Kohli      Runs   11000
7    Kohli  Hundreds      41
8    Kohli       Avg      50
9    Kohli   Fifties      50
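
For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same list-comprehension approach. The input frame is rebuilt by hand from the question's sample data, and the names `rows` and `long_df` are only illustrative; they are not from the original answer.

```python
import pandas as pd

# Sample input rebuilt from the question; the Stats column holds dicts.
df = pd.DataFrame({
    "Player": ["Sachin", "Ganguly", "Kohli"],
    "Stats": [
        {"Runs": 18000, "Hundreds": 49, "Avg": 45},
        {"Runs": 11000, "Hundreds": 25, "Avg": 40},
        {"Runs": 11000, "Hundreds": 41, "Avg": 50, "Fifties": 50},
    ],
})

# One (player, event, value) tuple per dictionary entry.
rows = [
    (player, event, value)
    for player, stats in zip(df["Player"], df["Stats"])
    for event, value in stats.items()
]

# Assemble the long-format frame; the original df is left untouched.
long_df = pd.DataFrame(rows, columns=["Player", "Events", "Values"])
print(long_df)
```

Since no intermediate wide table is created, players with fewer keys simply contribute fewer rows, and Values should keep its integer dtype.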

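Another solution. Because not every player has every key, the intermediate wide frame contains NaN, so Values comes out as floats: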

# Expand each Stats dict into columns indexed by Player, then stack the columns into rows
df = pd.DataFrame(df.pop('Stats').values.tolist(), index=df['Player']).stack().reset_index()
df.columns = ['Player','Events','Values']
print(df)
    Player    Events   Values
0   Sachin       Avg     45.0
1   Sachin  Hundreds     49.0
2   Sachin      Runs  18000.0
3  Ganguly       Avg     40.0
4  Ganguly  Hundreds     25.0
5  Ganguly      Runs  11000.0
6    Kohli       Avg     50.0
7    Kohli   Fifties     50.0
8    Kohli  Hundreds     41.0
9    Kohli      Runs  11000.0
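
If integer values are wanted from the stack-based approach, a variant along these lines might help. It is a sketch under some assumptions: the input is rebuilt from the question's sample data, `wide` and `long_df` are illustrative names, and the explicit `dropna()` is there because how `stack()` treats missing entries has differed between pandas versions. Row order within each player may also differ from the output above depending on the pandas version.

```python
import pandas as pd

# Sample input rebuilt from the question; the Stats column holds dicts.
df = pd.DataFrame({
    "Player": ["Sachin", "Ganguly", "Kohli"],
    "Stats": [
        {"Runs": 18000, "Hundreds": 49, "Avg": 45},
        {"Runs": 11000, "Hundreds": 25, "Avg": 40},
        {"Runs": 11000, "Hundreds": 41, "Avg": 50, "Fifties": 50},
    ],
})

# Expand the dicts into columns; players lacking a key get NaN there.
wide = pd.DataFrame(df["Stats"].tolist(), index=df["Player"])

long_df = (
    wide.stack()                         # move column labels into an inner index level
        .dropna()                        # drop missing entries explicitly
        .astype(int)                     # undo the float upcast caused by NaN
        .rename_axis(["Player", "Events"])
        .reset_index(name="Values")
)
print(long_df)
```

The cast back to `int` is only safe here because every remaining value is a whole number; with fractional averages it would truncate.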