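I want to store some multidimensional data in a pandas DataFrame or Panel so that I can return, for example, the times for a given runner, race, or year.
The sample data looks like this; note that not every runner has data for every year or every race.
Can anyone suggest a good way to do this, with pandas or otherwise?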
Name | Gender | Age
Runner A | Male | 35
Race A
Year | Time
2015 | 2:35:09
2014 | 2:47:34
2013 | 2:50:12
Race B
Year | Time
2013 | 1:32:07
Runner B | Male | 29
Race A
Year | Time
2015 | 3:05:56
Runner C | Female | 32
Race B
Year | Time
1998 | 1:29:43
Answer 0 (score: 4)
I think you can use a MultiIndex and then select the data with slicers:
import pandas as pd
df = pd.DataFrame({'Time': {
    ('Runner A', 'Male', 35, 'Race A', 2014): '2:47:34',
    ('Runner C', 'Female', 32, 'Race B', 1998): '1:29:43',
    ('Runner B', 'Male', 29, 'Race A', 2015): '3:05:56',
    ('Runner A', 'Male', 35, 'Race A', 2013): '2:50:12',
    ('Runner A', 'Male', 35, 'Race B', 2013): '1:32:07',
    ('Runner A', 'Male', 35, 'Race A', 2015): '2:35:09',
}})
print(df)
                                   Time
Runner A Male   35 Race A 2013  2:50:12
                          2014  2:47:34
                          2015  2:35:09
                   Race B 2013  1:32:07
Runner B Male   29 Race A 2015  3:05:56
Runner C Female 32 Race B 1998  1:29:43
# The index has to be fully lexsorted before slicers can be used
df.sort_index(inplace=True)
print(df)
                                   Time
Runner A Male   35 Race A 2013  2:50:12
                          2014  2:47:34
                          2015  2:35:09
                   Race B 2013  1:32:07
Runner B Male   29 Race A 2015  3:05:56
Runner C Female 32 Race B 1998  1:29:43
idx = pd.IndexSlice
print(df.loc[idx['Runner A', :, :, 'Race A', :], :])
                                 Time
Runner A Male 35 Race A 2013  2:50:12
                        2014  2:47:34
                        2015  2:35:09
print(df.loc[idx[:, :, :, 'Race A', 2015], :])
                                 Time
Runner A Male 35 Race A 2015  2:35:09
Runner B Male 29 Race A 2015  3:05:56
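With five index levels, positional slicers get hard to read. As a rough sketch of one alternative, the levels can be given names and then selected with `DataFrame.xs`. The level names (`Name`, `Gender`, `Age`, `Race`, `Year`) and the variable names below are assumptions for illustration; they are not part of the answer above.

```python
import pandas as pd

# Same records as in the question, in long form.
# The column names are assumed for illustration.
records = [
    ("Runner A", "Male", 35, "Race A", 2015, "2:35:09"),
    ("Runner A", "Male", 35, "Race A", 2014, "2:47:34"),
    ("Runner A", "Male", 35, "Race A", 2013, "2:50:12"),
    ("Runner A", "Male", 35, "Race B", 2013, "1:32:07"),
    ("Runner B", "Male", 29, "Race A", 2015, "3:05:56"),
    ("Runner C", "Female", 32, "Race B", 1998, "1:29:43"),
]
columns = ["Name", "Gender", "Age", "Race", "Year", "Time"]

# Build a MultiIndex with named levels and sort it.
named = (
    pd.DataFrame(records, columns=columns)
    .set_index(["Name", "Gender", "Age", "Race", "Year"])
    .sort_index()
)

# Select by level name instead of by position in the slicer.
# By default xs drops the levels that were used for selection.
runner_a_race_a = named.xs(("Runner A", "Race A"), level=["Name", "Race"])
race_a_2015 = named.xs(("Race A", 2015), level=["Race", "Year"])

print(runner_a_race_a)
print(race_a_2015)
```

Naming the levels keeps each query readable if levels are later added or reordered, at the cost of one extra `set_index` step.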
Answer 1 (score: 1)
A simple approach:
import pandas as pd

# One row per runner
runners = pd.DataFrame(
    [
        ['Bob', 'Male', 1980],
        ['Tom', 'Male', 1986],
        ['Amy', 'Female', 1966],
    ],
    columns=['Name', 'Gender', 'BirthYear']
)
# One row per result (race, year, runner)
races = pd.DataFrame(
    [
        ['A', 2015, 'Bob', '2:35:09'],
        ['A', 2014, 'Bob', '2:47:34'],
        ['A', 2013, 'Bob', '2:50:12'],
        ['B', 2013, 'Bob', '1:32:07'],
        ['A', 2015, 'Tom', '3:05:56'],
        ['B', 1998, 'Amy', '1:29:43'],
    ],
    columns=['Race', 'Year', 'Name', 'Time']
)
print(races.loc[(races.Name == 'Bob') & (races.Race == 'A')][['Time']])
print()
print(races.loc[(races.Year == 2015) & (races.Race == 'A')][['Name', 'Time']])
      Time
0  2:35:09
1  2:47:34
2  2:50:12

  Name     Time
0  Bob  2:35:09
4  Tom  3:05:56
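The same two filters can also be written with `DataFrame.query`, which some people find easier to read than chained boolean masks. This is only a sketch of an equivalent form; the variable names `bob_in_a` and `a_in_2015` are made up for illustration.

```python
import pandas as pd

races = pd.DataFrame(
    [
        ["A", 2015, "Bob", "2:35:09"],
        ["A", 2014, "Bob", "2:47:34"],
        ["A", 2013, "Bob", "2:50:12"],
        ["B", 2013, "Bob", "1:32:07"],
        ["A", 2015, "Tom", "3:05:56"],
        ["B", 1998, "Amy", "1:29:43"],
    ],
    columns=["Race", "Year", "Name", "Time"],
)

# All of Bob's times in race A.
bob_in_a = races.query("Name == 'Bob' and Race == 'A'")[["Time"]]

# Every runner's time in race A in 2015.
a_in_2015 = races.query("Year == 2015 and Race == 'A'")[["Name", "Time"]]

print(bob_in_a)
print()
print(a_in_2015)
```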
Get all the data back:
df = races.merge(runners)
Get each runner's age at race time:
df['Age'] = df.Year - df.BirthYear
print(df)
  Race  Year Name     Time  Gender  BirthYear  Age
0    A  2015  Bob  2:35:09    Male       1980   35
1    A  2014  Bob  2:47:34    Male       1980   34
2    A  2013  Bob  2:50:12    Male       1980   33
3    B  2013  Bob  1:32:07    Male       1980   33
4    A  2015  Tom  3:05:56    Male       1986   29
5    B  1998  Amy  1:29:43  Female       1966   32
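One caveat with this layout: `Time` is stored as text, so sorting or taking a minimum compares strings rather than durations. A minimal sketch, assuming the times are always in `H:MM:SS` form, is to parse them with `pd.to_timedelta` first. The `Duration` column and the `best` variable are hypothetical names added for illustration.

```python
import pandas as pd

races = pd.DataFrame(
    [
        ["A", 2015, "Bob", "2:35:09"],
        ["A", 2014, "Bob", "2:47:34"],
        ["A", 2013, "Bob", "2:50:12"],
        ["B", 2013, "Bob", "1:32:07"],
        ["A", 2015, "Tom", "3:05:56"],
        ["B", 1998, "Amy", "1:29:43"],
    ],
    columns=["Race", "Year", "Name", "Time"],
)

# Parse the "H:MM:SS" strings into Timedelta values so they compare as durations.
races["Duration"] = pd.to_timedelta(races["Time"])

# Fastest time per runner and race (a hypothetical follow-up query).
best = races.groupby(["Name", "Race"])["Duration"].min()
print(best)
```

With real durations, aggregations such as `min`, `mean`, or year-over-year differences behave as expected, which plain strings do not guarantee.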