I have a dataframe similar to the one below. For some reason, each team is listed twice, with one row for each of the two columns.
import pandas as pd
import numpy as np
d = {'Team': ['1', '2', '3', '1', '2', '3'], 'Points for': [5, 10, 15, np.nan,np.nan,np.nan], 'Points against' : [np.nan,np.nan,np.nan, 3, 6, 9]}
df = pd.DataFrame(data=d)
  Team  Points for  Points against
0    1         5.0             NaN
1    2        10.0             NaN
2    3        15.0             NaN
3    1         NaN             3.0
4    2         NaN             6.0
5    3         NaN             9.0
How can I merge the rows that share a team name so that there are no missing values? This is what I want:
  Team  Points for  Points against
0    1           5               3
1    2          10               6
2    3          15               9
I have been trying to figure this out with pandas, but I can't seem to get it. Thanks!
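Answer 0 (score: 1)
I made a change to your code, replacing the string 'Nan' with numpy's nan.
One solution is to melt the data, drop the empty entries, and pivot from long back to wide: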
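df = (df
      .melt('Team')                       # long format: one row per (Team, variable)
      .dropna()                           # discard the missing entries
      .pivot(index='Team', columns='variable', values='value')  # keyword arguments are required in pandas 2.x
      .reset_index()
      .rename_axis(None, axis='columns')  # remove the leftover 'variable' axis label
      .astype(int)
     )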
df
   Team  Points against  Points for
0     1               3           5
1     2               6          10
2     3               9          15
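To see what each step in the chain does, here is a minimal sketch that runs them one at a time on the question's data. The names long_df and wide_df are just illustrative, not part of the original code.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Team': ['1', '2', '3', '1', '2', '3'],
    'Points for': [5, 10, 15, np.nan, np.nan, np.nan],
    'Points against': [np.nan, np.nan, np.nan, 3, 6, 9],
})

# Long format: one row per (Team, variable) pair, so 6 rows x 2 columns = 12 rows.
long_df = df.melt('Team')

# Dropping the NaN rows leaves exactly one value per (Team, variable) pair.
long_df = long_df.dropna()

# Pivot back to wide format; this only works because each pair is now unique.
wide_df = long_df.pivot(index='Team', columns='variable', values='value')
print(wide_df)
```

Note that pivot raises an error if a (Team, variable) pair still occurs more than once after dropna, so this approach assumes every team has exactly one non-missing value per column.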
Answer 1 (score: 0)
One way to do it, using groupby:
df = df.replace("Nan", np.nan)
new_df = df.groupby("Team").first()
print(new_df)
Output:
      Points for  Points against
Team                            
1            5.0             3.0
2           10.0             6.0
3           15.0             9.0
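This works because first() returns the first non-null value in each column for every group. If you want the exact layout from the question (Team as a regular column, integer points), a small variation might look like the sketch below; the as_index=False and astype steps are my additions, and the integer cast assumes no team is left with a missing value.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Team': ['1', '2', '3', '1', '2', '3'],
    'Points for': [5, 10, 15, np.nan, np.nan, np.nan],
    'Points against': [np.nan, np.nan, np.nan, 3, 6, 9],
})

# first() takes the first non-null value per column within each group;
# as_index=False keeps Team as an ordinary column instead of the index.
new_df = df.groupby('Team', as_index=False).first()

# Cast the point columns back to int (only safe when no NaN remains).
new_df = new_df.astype({'Points for': int, 'Points against': int})
print(new_df)
```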
Answer 2 (score: 0)
You need to groupby the unique identifier. If there is also a game ID, a date, or something similar, you may need to group by that as well.
df.groupby('Team').agg({'Points for': 'max', 'Points against': 'max'})
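As a rough illustration of the second point, suppose the data also had a Game column (hypothetical, it is not in the question's dataframe). Grouping by both keys keeps each game separate instead of collapsing all of a team's games into one row:

```python
import numpy as np
import pandas as pd

# Hypothetical data: two teams, two games each, still split across two rows.
games = pd.DataFrame({
    'Team': ['1', '2', '1', '2', '1', '2', '1', '2'],
    'Game': [1, 1, 2, 2, 1, 1, 2, 2],
    'Points for': [5, 10, 7, 12, np.nan, np.nan, np.nan, np.nan],
    'Points against': [np.nan, np.nan, np.nan, np.nan, 3, 6, 4, 8],
})

# 'max' ignores NaN, so each (Team, Game) group yields its single real value.
merged = (games
          .groupby(['Team', 'Game'], as_index=False)
          .agg({'Points for': 'max', 'Points against': 'max'}))
print(merged)
```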
Answer 3 (score: 0)
pd.pivot_table(df, values=['Points for', 'Points against'], index=['Team'], aggfunc='sum')[['Points for', 'Points against']]
      Points for  Points against
Team                            
1            5.0             3.0
2           10.0             6.0
3           15.0             9.0
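One caveat with summing: it only gives the right answer when each team has a single non-missing value per column. The sketch below uses made-up data (the dup frame is hypothetical) where a row has been entered twice, to show how 'sum' and first() then differ:

```python
import numpy as np
import pandas as pd

# Hypothetical data: team '1' has its 'Points for' row duplicated.
dup = pd.DataFrame({
    'Team': ['1', '1', '1'],
    'Points for': [5, 5, np.nan],
    'Points against': [np.nan, np.nan, 3],
})

# Summing adds the duplicated value twice.
summed = pd.pivot_table(dup, values=['Points for', 'Points against'],
                        index='Team', aggfunc='sum')

# first() just takes the first non-null value in each column.
firsts = dup.groupby('Team').first()

print(summed.loc['1', 'Points for'])   # duplicated value counted twice
print(firsts.loc['1', 'Points for'])   # single value
```

So if duplicates are possible in your real data, 'first' or 'max' is probably the safer choice.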