How to split a DataFrame by rows, or reorder a DataFrame, in pandas

Time: 2018-03-07 08:55:54

Tags: python-2.7 pandas dataframe split

I just want to clean up a DataFrame and analyze it, but I have run into trouble. I created a simple DataFrame to illustrate the problem:

import pandas as pd
d = {'Resutls': ['IIL', 'pass','pass','IIH','pass','IIL','pass'], 'part':['None',1,2,'None',5,'None',4] }
df = pd.DataFrame(d)

The result is:

    Resutls  part
0     IIL    None
1    pass      1
2    pass      2
3     IIH    None
4    pass      5
5     IIL    None
6    pass      4

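The DataFrame contains some repeated blocks. I want to reorder the DataFrame by rows so that rows of the same block sit together, and drop the repeated block headers, like this: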

    Resutls  part
0     IIL    None
1    pass      1
2    pass      2
6    pass      4 
3     IIH    None
4    pass      5

Or simply split the DataFrame into several sub-DataFrames:

    Resutls  part
0     IIL    None
1    pass      1
2    pass      2
3    pass      4 

    Resutls  part
0     IIH    None
1    pass      5

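This is just a simple example of what I want to do. My real DataFrame has 4000 rows. I tried to do this with reindex or df.iloc, but that seems a bit complicated to me. Is there a better way? Please advise.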

1 Answer:

Answer 0: (score: 1)

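I think you need to replace pass with NaN and forward fill, so that every row carries the header of its block, then get the sort order with argsort and reorder the rows with iloc: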

df = df.iloc[df['Resutls'].mask(df['Resutls'].eq('pass')).ffill().argsort()]
print (df)
  Resutls  part
3     IIH  None
4    pass     5
0     IIL  None
1    pass     1
2    pass     2
5     IIL  None
6    pass     4
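
To make the one-liner easier to follow, here is a step-by-step sketch of the same idea. The variable names (headers, key, order, reordered) are only illustrative. One assumption worth flagging: argsort uses quicksort by default, which is not guaranteed to be stable, so on a larger frame (such as the 4000 rows mentioned in the question) the rows inside a block might not keep their original order. Passing kind='mergesort' asks for a stable sort instead.

```python
import pandas as pd

d = {'Resutls': ['IIL', 'pass', 'pass', 'IIH', 'pass', 'IIL', 'pass'],
     'part': ['None', 1, 2, 'None', 5, 'None', 4]}
df = pd.DataFrame(d)

# Step 1: blank out the 'pass' rows so only the block headers remain.
headers = df['Resutls'].mask(df['Resutls'].eq('pass'))

# Step 2: forward fill, so each row carries the header of its block.
key = headers.ffill()

# Step 3: positions that would sort the key. A stable sort keeps the
# original row order inside each block.
order = key.argsort(kind='mergesort')

# Step 4: reorder the rows by position.
reordered = df.iloc[order.to_numpy()]
print(reordered)
```

On the sample data this should give the same row order as the one-liner above.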

Finally, remove the duplicated header rows with boolean indexing:

df = df[~df['Resutls'].duplicated() | (df['Resutls'] == 'pass')]
print (df)
  Resutls  part
3     IIH  None
4    pass     5
0     IIL  None
1    pass     1
2    pass     2
6    pass     4
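
The filter combines two conditions, and it may help to look at them separately. The sketch below rebuilds only the Resutls column in its sorted order (the index values are copied from the output above) and uses hypothetical names is_repeat, is_pass and keep. It assumes, as in the sample data, that every non-header row is exactly 'pass'; any other repeated value would be dropped as a duplicate.

```python
import pandas as pd

# The 'Resutls' column after sorting, with the original index labels.
s = pd.Series(['IIH', 'pass', 'IIL', 'pass', 'pass', 'IIL', 'pass'],
              index=[3, 4, 0, 1, 2, 5, 6], name='Resutls')

# True for every value that has already appeared earlier in the Series.
is_repeat = s.duplicated()

# True for the data rows, which must always be kept.
is_pass = s == 'pass'

# Keep a row if it is a first occurrence OR if it is a 'pass' row.
keep = ~is_repeat | is_pass

print(pd.DataFrame({'Resutls': s, 'is_repeat': is_repeat,
                    'is_pass': is_pass, 'keep': keep}))
print(s[keep])
```

Only the second IIL header (index 5) fails both conditions, so it is the only row removed.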

If you want each block as a separate DataFrame:

df['g'] = df['Resutls'].mask(df['Resutls'].eq('pass')).ffill()
df = df[~df['Resutls'].duplicated() | (df['Resutls'] == 'pass')]
print (df)
  Resutls  part    g
0     IIL  None  IIL
1    pass     1  IIL
2    pass     2  IIL
3     IIH  None  IIH
4    pass     5  IIH
6    pass     4  IIL

dfs = {k:v.drop('g', axis=1) for k, v in df.groupby('g')}
#print (dfs)

print (dfs['IIH'])
  Resutls  part
3     IIH  None
4    pass     5

print (dfs['IIL'])
  Resutls  part
0     IIL  None
1    pass     1
2    pass     2
6    pass     4
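
The sub-DataFrames in the question are numbered from 0, while the ones above keep their original index labels. If the renumbered form is what you need, a possible variation is to reset the index of each group. This is only a sketch: sort=False is my addition to keep the groups in order of first appearance, and it is not part of the answer above.

```python
import pandas as pd

d = {'Resutls': ['IIL', 'pass', 'pass', 'IIH', 'pass', 'IIL', 'pass'],
     'part': ['None', 1, 2, 'None', 5, 'None', 4]}
df = pd.DataFrame(d)

is_pass = df['Resutls'].eq('pass')

# Helper column: the header of the block each row belongs to.
df['g'] = df['Resutls'].mask(is_pass).ffill()

# Drop the repeated header rows, keep every 'pass' row.
df = df[~df['Resutls'].duplicated() | is_pass]

# One DataFrame per block, without the helper column and renumbered from 0.
dfs = {k: v.drop(columns='g').reset_index(drop=True)
       for k, v in df.groupby('g', sort=False)}

print(dfs['IIL'])
print(dfs['IIH'])
```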