I have an object in Python with many rows:
INPUT:
Team1 Player1 idTrip13 133
Team2 Player333 idTrip10 18373
Team3 Player22 idTrip12 17338899
Team2 Player293 idTrip02 17656
Team3 Player20 idTrip11 1883
Team1 Player1 idTrip19 19393
I need to aggregate this data (like a pivot table).
The OUTPUT I am trying to get:
Team1 Player1 : 2 trips : sum(133+19393)
Team2 Player333 : 1 trip : 18373; Player293 : 1 trip : 17656
Team3 Player22 : 1 trip : 17338899; Player20 : 1 trip : 1883
Can anyone suggest an appropriate object in Python, so that I can produce the output with something like the following?
print team, player, trips, time
Answer 0 (score: 8)
Put your data into a list of lists, where each inner list is one row of the pandas DataFrame.
In[1]:
import pandas as pd

# Each inner list is one row: team, player, trip id, time
mydata = [['Team1', 'Player1', 'idTrip13', 133], ['Team2', 'Player333', 'idTrip10', 18373],
          ['Team3', 'Player22', 'idTrip12', 17338899], ['Team2', 'Player293', 'idTrip02', 17656],
          ['Team3', 'Player20', 'idTrip11', 1883], ['Team1', 'Player1', 'idTrip19', 19393]]
df = pd.DataFrame(mydata, columns=['team', 'player', 'trips', 'time'])
df
Out[1]:
    team     player     trips      time
0  Team1    Player1  idTrip13       133
1  Team2  Player333  idTrip10     18373
2  Team3   Player22  idTrip12  17338899
3  Team2  Player293  idTrip02     17656
4  Team3   Player20  idTrip11      1883
5  Team1    Player1  idTrip19     19393
Call groupby(), passing the column or columns you want to use as the grouper, and then apply a function to the resulting groups.
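To make the grouping step concrete, here is a minimal sketch of what groupby() does conceptually: it splits the rows by the grouper, applies a function to each piece, and combines the results. The names manual and built_in are illustrative only, and the loop shows the idea rather than how pandas implements it internally.

```python
import pandas as pd

# Same data as above, reduced to the two columns needed here
df = pd.DataFrame({
    'team': ['Team1', 'Team2', 'Team3', 'Team2', 'Team3', 'Team1'],
    'time': [133, 18373, 17338899, 17656, 1883, 19393],
})

# Split: iterating over a groupby object yields (key, sub-DataFrame) pairs
manual = {}
for team, group in df.groupby('team'):
    # Apply: sum the 'time' column within this group
    manual[team] = int(group['time'].sum())

# Combine: groupby does the same thing in a single call
built_in = df.groupby('team')['time'].sum()

print(manual)
print(built_in)
```

Both approaches give the same per-team totals. The single-call form is the one to prefer in practice.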
Examples
Example 1: Find the number of trips made by each team. team is the grouper, and we apply the count() function to the ['trips'] column.
In[2]:
trip_count = df.groupby(by = ['team'])['trips'].count()
trip_count
Out[2]:
team
Team1 2
Team2 2
Team3 2
Name: trips, dtype: int64
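Example 2 (multiple columns): Find the total time spent by each player on a team. We use the two columns ['team', 'player'] as the grouper and apply the sum() function to the ['time'] column.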
In[3]:
trip_time = df.groupby(by = ['team', 'player'])['time'].sum()
trip_time
Out[3]:
team   player
Team1  Player1         19526
Team2  Player293       17656
       Player333       18373
Team3  Player20         1883
       Player22     17338899
Name: time, dtype: int64
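Example 3 (multiple functions): For each player on a team, find the total number of trips and the total trip time. Pass agg() a dictionary that maps each column to the function to apply to it.
In[4]:
player_total = df.groupby(by = ['team', 'player']).agg({'trips' : 'count', 'time' : 'sum'})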
player_total
Out[4]:
                  trips      time
team  player
Team1 Player1         2     19526
Team2 Player293       1     17656
      Player333       1     18373
Team3 Player20        1      1883
      Player22        1  17338899
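The question asks for a way to print team, player, trips, and time for each row. One possible way to get there from the aggregated result is sketched below. It assumes the same data and column names as above; the variable lines is introduced only for illustration. reset_index() turns the (team, player) index back into ordinary columns, so each row can be read as a named tuple.

```python
import pandas as pd

mydata = [['Team1', 'Player1', 'idTrip13', 133], ['Team2', 'Player333', 'idTrip10', 18373],
          ['Team3', 'Player22', 'idTrip12', 17338899], ['Team2', 'Player293', 'idTrip02', 17656],
          ['Team3', 'Player20', 'idTrip11', 1883], ['Team1', 'Player1', 'idTrip19', 19393]]
df = pd.DataFrame(mydata, columns=['team', 'player', 'trips', 'time'])

# Count trips and sum time per (team, player) pair
player_total = df.groupby(['team', 'player']).agg({'trips': 'count', 'time': 'sum'})

# Flatten the MultiIndex into columns, then walk the rows as named tuples
lines = []
for rec in player_total.reset_index().itertuples(index=False):
    lines.append(f"{rec.team} {rec.player} {rec.trips} {rec.time}")

for line in lines:
    print(line)
```

The rows come out sorted by team and then by player, because groupby() sorts the group keys by default.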