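I want to take the rows of each data frame in a group and build new data frames from them, so that one new data frame contains only the first row of each group, another contains the second row, another the third row, and so on. For example, my data frame is: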
import pandas as pd
raw_data = {'regiment': ['Nighthawks', 'Nighthawks', 'Nighthawks', 'Nighthawks', 'Dragoons', 'Dragoons', 'Dragoons', 'Dragoons', 'Scouts', 'Scouts', 'Scouts', 'Scouts'],
'name': ['Miller', 'Jacobson', 'Ali', 'Milner', 'Cooze', 'Jacon', 'Ryaner', 'Sone', 'Sloan', 'Piger', 'Riani', 'Ali'],
'preTestScore': [4, 24, 31, 2, 3, 4, 24, 31, 2, 3, 2, 3],
'postTestScore': [25, 94, 57, 62, 70, 25, 94, 57, 62, 70, 62, 70]}
df = pd.DataFrame(raw_data, columns = ['regiment', 'name', 'preTestScore', 'postTestScore'])
df
regiment name preTestScore postTestScore
0 Nighthawks Miller 4 25
1 Nighthawks Jacobson 24 94
2 Nighthawks Ali 31 57
3 Nighthawks Milner 2 62
4 Dragoons Cooze 3 70
5 Dragoons Jacon 4 25
6 Dragoons Ryaner 24 94
7 Dragoons Sone 31 57
8 Scouts Sloan 2 62
9 Scouts Piger 3 70
10 Scouts Riani 2 62
11 Scouts Ali 3 70
I grouped it like this:
gb = df.groupby("regiment")
regiment name preTestScore postTestScore
8 Scouts Sloan 2 62
9 Scouts Piger 3 70
10 Scouts Riani 2 62
11 Scouts Ali 3 70
------------------
regiment name preTestScore postTestScore
0 Nighthawks Miller 4 25
1 Nighthawks Jacobson 24 94
2 Nighthawks Ali 31 57
3 Nighthawks Milner 2 62
------------------
regiment name preTestScore postTestScore
4 Dragoons Cooze 3 70
5 Dragoons Jacon 4 25
6 Dragoons Ryaner 24 94
7 Dragoons Sone 31 57
------------------
I want to create data frames like these:
Data frame of the first rows:
regiment name preTestScore postTestScore
8 Scouts Sloan 2 62
0 Nighthawks Miller 4 25
4 Dragoons Cooze 3 70
Data frame of the second rows:
regiment name preTestScore postTestScore
9 Scouts Piger 3 70
1 Nighthawks Jacobson 24 94
5 Dragoons Jacon 4 25
And so on.
I was thinking of using Group.apply(), but I'm not quite sure how.
Thanks a lot!
Answer 0 (score: 1)
Dictionaries are, of course, unordered. Given that the sample data has only four rows per regiment, here is a ranking of the top four, which uses nth on the groupby. The result is built with a dictionary comprehension that iterates over range(4), i.e. 0, 1, 2, 3, takes the nth row of the groupby for each such value, and converts the value back to its ordinal (e.g. 0 maps to 'first').
d = {n: ordinal for n, ordinal in zip(
range(5), ['first', 'second', 'third', 'fourth', 'fifth'])}
top_n = 4
>>> {d[n]: df.groupby(['regiment']).nth(n) for n in range(top_n)}
{'first': name postTestScore preTestScore
regiment
Dragoons Cooze 70 3
Nighthawks Miller 25 4
Scouts Sloan 62 2,
'fourth': name postTestScore preTestScore
regiment
Dragoons Sone 57 31
Nighthawks Milner 62 2
Scouts Ali 70 3,
'second': name postTestScore preTestScore
regiment
Dragoons Jacon 25 4
Nighthawks Jacobson 94 24
Scouts Piger 70 3,
'third': name postTestScore preTestScore
regiment
Dragoons Ryaner 94 24
Nighthawks Ali 57 31
Scouts Riani 62 2}
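The output above comes from an older pandas release, in which GroupBy.nth indexed its result by the group key. Since pandas 2.0, nth behaves as a filter and keeps the original row index and the regiment column, so the same call prints differently. Below is a minimal sketch that builds the same ordinal-keyed dict without depending on that behavior; it assumes no regiment has more than five rows, and the names ordinals, position and by_ordinal are illustrative, not part of the answer.

```python
import pandas as pd

raw_data = {'regiment': ['Nighthawks'] * 4 + ['Dragoons'] * 4 + ['Scouts'] * 4,
            'name': ['Miller', 'Jacobson', 'Ali', 'Milner', 'Cooze', 'Jacon',
                     'Ryaner', 'Sone', 'Sloan', 'Piger', 'Riani', 'Ali'],
            'preTestScore': [4, 24, 31, 2, 3, 4, 24, 31, 2, 3, 2, 3],
            'postTestScore': [25, 94, 57, 62, 70, 25, 94, 57, 62, 70, 62, 70]}
df = pd.DataFrame(raw_data, columns=['regiment', 'name', 'preTestScore', 'postTestScore'])

# Illustrative label list; assumes no regiment has more than five rows.
ordinals = ['first', 'second', 'third', 'fourth', 'fifth']

# 0-based position of each row inside its regiment (0 = first row, 1 = second, ...).
position = df.groupby('regiment').cumcount()

# Keep the rows at position n, index them by regiment and sort by regiment,
# as the older nth output did, regardless of the installed pandas version.
by_ordinal = {ordinals[n]: df[position == n].set_index('regiment').sort_index()
              for n in range(int(position.max()) + 1)}

print(by_ordinal['first'])
```

If the regiments have different sizes, the later frames here simply contain fewer rows; chaining .reindex(sorted(df.regiment.unique())) as in the answer would restore NaN placeholder rows.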
For groups of different lengths:
df = df.iloc[1:-1, :]  # Drop first and last row.
>>> {d[n]: df.groupby(['regiment']).nth(n).reindex(sorted(df.regiment.unique()))
for n in range(top_n)}
{'first': name postTestScore preTestScore
regiment
Dragoons Cooze 70 3
Nighthawks Jacobson 94 24
Scouts Sloan 62 2,
'fourth': name postTestScore preTestScore
regiment
Dragoons Sone 57 31
Nighthawks NaN NaN NaN
Scouts NaN NaN NaN,
'second': name postTestScore preTestScore
regiment
Dragoons Jacon 25 4
Nighthawks Ali 57 31
Scouts Piger 70 3,
'third': name postTestScore preTestScore
regiment
Dragoons Ryaner 94 24
Nighthawks Milner 62 2
Scouts Riani 62 2}
Answer 1 (score: 1)
Use groupby to store the groups in a dict or a list:
In [67]: {x:g for x,g in df.sort_values(by='regiment',ascending=False).groupby(df.index%4)}
Out[67]:
{0: regiment name preTestScore postTestScore
8 Scouts Sloan 2 62
0 Nighthawks Miller 4 25
4 Dragoons Cooze 3 70,
1: regiment name preTestScore postTestScore
9 Scouts Piger 3 70
1 Nighthawks Jacobson 24 94
5 Dragoons Jacon 4 25,
2: regiment name preTestScore postTestScore
10 Scouts Riani 2 62
2 Nighthawks Ali 31 57
6 Dragoons Ryaner 24 94,
3: regiment name preTestScore postTestScore
11 Scouts Ali 3 70
3 Nighthawks Milner 2 62
7 Dragoons Sone 31 57}
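The list variant can be sketched the same way. This is a minimal sketch under the same assumption the dict version makes: every regiment has exactly four rows with consecutive index labels (0-3, 4-7, 8-11). It groups by the sorted frame's own labels (ordered.index % 4), which gives the same keys as df.index % 4 for this data; ordered and frames are illustrative names, and kind='mergesort' is added only to make the sort stable.

```python
import pandas as pd

raw_data = {'regiment': ['Nighthawks'] * 4 + ['Dragoons'] * 4 + ['Scouts'] * 4,
            'name': ['Miller', 'Jacobson', 'Ali', 'Milner', 'Cooze', 'Jacon',
                     'Ryaner', 'Sone', 'Sloan', 'Piger', 'Riani', 'Ali'],
            'preTestScore': [4, 24, 31, 2, 3, 4, 24, 31, 2, 3, 2, 3],
            'postTestScore': [25, 94, 57, 62, 70, 25, 94, 57, 62, 70, 62, 70]}
df = pd.DataFrame(raw_data, columns=['regiment', 'name', 'preTestScore', 'postTestScore'])

# Order the regiments as in the question (Scouts, Nighthawks, Dragoons);
# a stable sort keeps each regiment's rows in their original order.
ordered = df.sort_values(by='regiment', ascending=False, kind='mergesort')

# Each regiment occupies four consecutive labels, so label % 4 is the
# row's position inside its regiment. This breaks for unequal group sizes.
frames = [g for _, g in ordered.groupby(ordered.index % 4)]

for frame in frames:
    print(frame, end='\n\n')
```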
Answer 2 (score: 1)
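You can do this with a nested groupby using cumcount. For example, this groups together the first occurrence of each regiment, the second occurrence of each regiment, and so on: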
In []:
[g for _, g in df.groupby(df.groupby('regiment').cumcount())]
Out[]:
[ regiment name preTestScore postTestScore
0 Nighthawks Miller 4 25
4 Dragoons Cooze 3 70
8 Scouts Sloan 2 62,
regiment name preTestScore postTestScore
1 Nighthawks Jacobson 24 94
5 Dragoons Jacon 4 25
9 Scouts Piger 3 70,
regiment name preTestScore postTestScore
2 Nighthawks Ali 31 57
6 Dragoons Ryaner 24 94
10 Scouts Riani 2 62,
regiment name preTestScore postTestScore
3 Nighthawks Milner 2 62
7 Dragoons Sone 31 57
11 Scouts Ali 3 70]
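Because cumcount numbers the rows within each regiment, this approach does not require equal group sizes, unlike grouping by df.index % 4. A minimal sketch, reusing the drop-first-and-last-row trick from the first answer to make the sizes unequal; uneven, position and frames are illustrative names:

```python
import pandas as pd

raw_data = {'regiment': ['Nighthawks'] * 4 + ['Dragoons'] * 4 + ['Scouts'] * 4,
            'name': ['Miller', 'Jacobson', 'Ali', 'Milner', 'Cooze', 'Jacon',
                     'Ryaner', 'Sone', 'Sloan', 'Piger', 'Riani', 'Ali'],
            'preTestScore': [4, 24, 31, 2, 3, 4, 24, 31, 2, 3, 2, 3],
            'postTestScore': [25, 94, 57, 62, 70, 25, 94, 57, 62, 70, 62, 70]}
df = pd.DataFrame(raw_data, columns=['regiment', 'name', 'preTestScore', 'postTestScore'])

# Drop the first and last rows: Nighthawks, Dragoons and Scouts now have 3, 4 and 3 rows.
uneven = df.iloc[1:-1]

# cumcount numbers the rows within each regiment: 0, 1, 2, ...
position = uneven.groupby('regiment').cumcount()

# One frame per position; a regiment without an nth row is simply absent from frame n.
frames = [g for _, g in uneven.groupby(position)]

for n, frame in enumerate(frames):
    print(n, list(frame['name']))
```

The last frame holds only Dragoons' fourth row, since the other regiments have just three rows left.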