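Creating new DataFrames from the rows of groups

Time: 2017-08-25 04:03:55

Tags: python pandas

I want to extract the rows of each group and build new DataFrames from them, so that one new DataFrame contains only the first row of every group, another contains the second rows, another the third rows, and so on. For example, my DataFrame is: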

import pandas as pd

raw_data = {'regiment': ['Nighthawks', 'Nighthawks', 'Nighthawks', 'Nighthawks', 'Dragoons', 'Dragoons', 'Dragoons', 'Dragoons', 'Scouts', 'Scouts', 'Scouts', 'Scouts'],
    'name': ['Miller', 'Jacobson', 'Ali', 'Milner', 'Cooze', 'Jacon', 'Ryaner', 'Sone', 'Sloan', 'Piger', 'Riani', 'Ali'],
    'preTestScore': [4, 24, 31, 2, 3, 4, 24, 31, 2, 3, 2, 3],
    'postTestScore': [25, 94, 57, 62, 70, 25, 94, 57, 62, 70, 62, 70]}
df = pd.DataFrame(raw_data, columns = ['regiment', 'name', 'preTestScore', 'postTestScore'])
df

      regiment      name  preTestScore  postTestScore
0   Nighthawks    Miller             4             25
1   Nighthawks  Jacobson            24             94
2   Nighthawks       Ali            31             57
3   Nighthawks    Milner             2             62
4     Dragoons     Cooze             3             70
5     Dragoons     Jacon             4             25
6     Dragoons    Ryaner            24             94
7     Dragoons      Sone            31             57
8       Scouts     Sloan             2             62
9       Scouts     Piger             3             70
10      Scouts     Riani             2             62
11      Scouts       Ali             3             70

I grouped it by regiment:

gb = df.groupby("regiment")

   regiment   name  preTestScore  postTestScore
8    Scouts  Sloan             2             62
9    Scouts  Piger             3             70
10   Scouts  Riani             2             62
11   Scouts    Ali             3             70
------------------
     regiment      name  preTestScore  postTestScore
0  Nighthawks    Miller             4             25
1  Nighthawks  Jacobson            24             94
2  Nighthawks       Ali            31             57
3  Nighthawks    Milner             2             62
------------------
   regiment    name  preTestScore  postTestScore
4  Dragoons   Cooze             3             70
5  Dragoons   Jacon             4             25
6  Dragoons  Ryaner            24             94
7  Dragoons    Sone            31             57
------------------
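The per-group printout above is shown without the code that produced it. A minimal sketch of one way to get similar output, assuming a plain loop over the GroupBy object. The frame here is a trimmed, two-column copy of the question's data, and because pandas sorts group keys by default, the groups come out in alphabetical order rather than in the order shown above.

```python
import pandas as pd

# Trimmed copy of the question's data (two columns only, for brevity).
df = pd.DataFrame({
    "regiment": ["Nighthawks"] * 4 + ["Dragoons"] * 4 + ["Scouts"] * 4,
    "name": ["Miller", "Jacobson", "Ali", "Milner", "Cooze", "Jacon",
             "Ryaner", "Sone", "Sloan", "Piger", "Riani", "Ali"],
})

gb = df.groupby("regiment")

# Iterating a GroupBy yields (key, sub-frame) pairs, one per regiment.
for key, group in gb:
    print(group)
    print("------------------")
```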

I would like to create DataFrames such as:

DataFrame of the first rows:

    regiment        name         preTestScore  postTestScore
8    Scouts        Sloan              2             62
0    Nighthawks    Miller             4             25
4    Dragoons      Cooze              3             70

DataFrame of the second rows:

   regiment          name        preTestScore  postTestScore
9    Scouts         Piger             3             70
1    Nighthawks    Jacobson           24            94
5    Dragoons       Jacon             4             25

And so on.

I was thinking of using Group.apply(), but I am not quite sure.

Many thanks!

3 answers:

Answer 0 (score: 1)

Dictionaries are of course unordered here, so the keys below do not appear in rank order. Given that the sample data has only four rows per regiment, here is a ranking of the top four that uses nth on a groupby. The result is built with a dictionary comprehension that iterates over a range of four (0, 1, 2, 3), takes the groupby row for each value, and converts the value back to its ordinal (e.g. 0 becomes 'first').

d = {n: ordinal for n, ordinal in zip(
             range(5), ['first', 'second', 'third', 'fourth', 'fifth'])}

top_n = 4
>>> {d[n]: df.groupby(['regiment']).nth(n) for n in range(top_n)}
{'first':               name  postTestScore  preTestScore
 regiment
 Dragoons     Cooze             70             3
 Nighthawks  Miller             25             4
 Scouts       Sloan             62             2,
 'fourth':               name  postTestScore  preTestScore
 regiment
 Dragoons      Sone             57            31
 Nighthawks  Milner             62             2
 Scouts         Ali             70             3,
 'second':                 name  postTestScore  preTestScore
 regiment
 Dragoons       Jacon             25             4
 Nighthawks  Jacobson             94            24
 Scouts         Piger             70             3,
 'third':               name  postTestScore  preTestScore
 regiment
 Dragoons    Ryaner             94            24
 Nighthawks     Ali             57            31
 Scouts       Riani             62             2}

For groups with different numbers of rows:

df = df.iloc[1:-1, :]  # Drop first and last row.
>>> {d[n]: df.groupby(['regiment']).nth(n).reindex(sorted(df.regiment.unique())) for n in range(top_n)}
{'first':                 name  postTestScore  preTestScore
 regiment
 Dragoons       Cooze             70             3
 Nighthawks  Jacobson             94            24
 Scouts         Sloan             62             2,
 'fourth':             name  postTestScore  preTestScore
 regiment
 Dragoons    Sone             57            31
 Nighthawks   NaN            NaN           NaN
 Scouts       NaN            NaN           NaN,
 'second':              name  postTestScore  preTestScore
 regiment
 Dragoons    Jacon             25             4
 Nighthawks    Ali             57            31
 Scouts      Piger             70             3,
 'third':               name  postTestScore  preTestScore
 regiment
 Dragoons    Ryaner             94            24
 Nighthawks  Milner             62             2
 Scouts       Riani             62             2}

Answer 1 (score: 1)

Using groupby with a custom index, store the groups in a dict:

In [67]: {x:g for x,g in df.sort_values(by='regiment',ascending=False).groupby(df.index%4)}
Out[67]:
{0:      regiment    name  preTestScore  postTestScore
 8      Scouts   Sloan             2             62
 0  Nighthawks  Miller             4             25
 4    Dragoons   Cooze             3             70,
 1:      regiment      name  preTestScore  postTestScore
 9      Scouts     Piger             3             70
 1  Nighthawks  Jacobson            24             94
 5    Dragoons     Jacon             4             25,
 2:       regiment    name  preTestScore  postTestScore
 10      Scouts   Riani             2             62
 2   Nighthawks     Ali            31             57
 6     Dragoons  Ryaner            24             94,
 3:       regiment    name  preTestScore  postTestScore
 11      Scouts     Ali             3             70
 3   Nighthawks  Milner             2             62
 7     Dragoons    Sone            31             57}

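Grouping by df.index % 4 only lines up with "n-th row of each regiment" when every regiment occupies exactly four consecutive rows of a default integer index. A minimal sketch of a check for that precondition, assuming a trimmed two-column copy of the question's data; the helper name modulo_matches_position is hypothetical, and the comparison against cumcount is an addition here, not part of the answer above.

```python
import pandas as pd

# Trimmed copy of the question's data (two columns only, for brevity).
df = pd.DataFrame({
    "regiment": ["Nighthawks"] * 4 + ["Dragoons"] * 4 + ["Scouts"] * 4,
    "name": ["Miller", "Jacobson", "Ali", "Milner", "Cooze", "Jacon",
             "Ryaner", "Sone", "Sloan", "Piger", "Riani", "Ali"],
})


def modulo_matches_position(frame, group_size):
    """Hypothetical helper: True if index % group_size equals each row's
    position inside its regiment (0 for the first row, 1 for the second, ...)."""
    position = frame.groupby("regiment").cumcount().to_numpy()
    modulo = (frame.index % group_size).to_numpy()
    return bool((position == modulo).all())


# Holds for the original frame: four consecutive rows per regiment.
ok_original = modulo_matches_position(df, 4)

# Fails once a row is missing, because the index no longer starts each
# regiment at a multiple of four.
ok_trimmed = modulo_matches_position(df.iloc[1:], 4)

# Only rely on the modulo shortcut when the check passes.
if ok_original:
    rows_by_position = {k: g for k, g in df.groupby(df.index % 4)}
```

When the check fails, grouping by the within-group position itself is the safer choice, since it does not depend on the index values or on equal group sizes.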

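Answer 2 (score: 1)

You can do this with a nested groupby using cumcount. For example, this groups together all the first occurrences of each regiment, all the second occurrences, and so on: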

In []:
[g for _, g in df.groupby(df.groupby('regiment').cumcount())]

Out[]:
[     regiment    name  preTestScore  postTestScore
 0  Nighthawks  Miller             4             25
 4    Dragoons   Cooze             3             70
 8      Scouts   Sloan             2             62,
      regiment      name  preTestScore  postTestScore
 1  Nighthawks  Jacobson            24             94
 5    Dragoons     Jacon             4             25
 9      Scouts     Piger             3             70,
       regiment    name  preTestScore  postTestScore
 2   Nighthawks     Ali            31             57
 6     Dragoons  Ryaner            24             94
 10      Scouts   Riani             2             62,
       regiment    name  preTestScore  postTestScore
 3   Nighthawks  Milner             2             62
 7     Dragoons    Sone            31             57
 11      Scouts     Ali             3             70]