I just want to clean up and analyze a DataFrame, but I've run into trouble. Here is a simple DataFrame that illustrates the problem:
import pandas as pd
d = {'Resutls': ['IIL', 'pass','pass','IIH','pass','IIL','pass'], 'part':['None',1,2,'None',5,'None',4] }
df = pd.DataFrame(d)
The result looks like this:
Resutls part
0 IIL None
1 pass 1
2 pass 2
3 IIH None
4 pass 5
5 IIL None
6 pass 4
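The DataFrame contains repeating blocks: each header row (IIL or IIH) is followed by its pass rows, and the same header can appear more than once. I want to reorder the rows so that each header's rows are grouped together and the repeated header rows are removed, like this: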
Resutls part
0 IIL None
1 pass 1
2 pass 2
6 pass 4
3 IIH None
4 pass 5
Or simply split the DataFrame into several sub-DataFrames:
Resutls part
0 IIL None
1 pass 1
2 pass 2
3 pass 4
Resutls part
0 IIH None
1 pass 5
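This is just a simplified example of what I'm trying to do. My real DataFrame has 4000 rows, and I tried to do this with reindex or df.iloc. It sounds straightforward, but it turned out to be rather complicated for me. Is there a good way to do this? Any advice is appreciated.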
Answer 0 (score: 1)
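I think you need to replace pass with NaN and forward fill, then get the sort order with argsort and reorder with iloc. Pass kind='mergesort' so the sort is stable; the default quicksort does not guarantee that rows with the same label keep their original relative order on a larger DataFrame:
df = df.iloc[df['Resutls'].mask(df['Resutls'].eq('pass')).ffill().argsort(kind='mergesort')]
print (df)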
Resutls part
3 IIH None
4 pass 5
0 IIL None
1 pass 1
2 pass 2
5 IIL None
6 pass 4
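To make the intermediate step visible, here is a small sketch (the names key, order and out are only illustrative) that prints the forward-filled grouping key and the resulting row order. It assumes the first row is a header rather than pass; otherwise ffill would leave a leading NaN.

```python
import pandas as pd

d = {'Resutls': ['IIL', 'pass', 'pass', 'IIH', 'pass', 'IIL', 'pass'],
     'part': ['None', 1, 2, 'None', 5, 'None', 4]}
df = pd.DataFrame(d)

# Header rows keep their label; 'pass' rows become NaN and then inherit
# the label of the nearest header above them via forward fill.
key = df['Resutls'].mask(df['Resutls'].eq('pass')).ffill()
print(key.tolist())
# -> ['IIL', 'IIL', 'IIL', 'IIH', 'IIH', 'IIL', 'IIL']

# A stable sort keeps rows with equal keys in their original order,
# so each header stays directly above its own 'pass' rows.
order = key.argsort(kind='mergesort').to_numpy()
out = df.iloc[order]
print(out.index.tolist())
# -> [3, 4, 0, 1, 2, 5, 6]
```

Because the labels are sorted as strings, IIH ('H' < 'L') ends up before IIL.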
Finally, use boolean indexing to drop the repeated header rows while keeping every pass row:
df = df[~df['Resutls'].duplicated() | (df['Resutls'] == 'pass')]
print (df)
Resutls part
3 IIH None
4 pass 5
0 IIL None
1 pass 1
2 pass 2
6 pass 4
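The result above lists IIH first because argsort orders the labels alphabetically, whereas the question's expected output keeps IIL first. A possible variant (not part of the original answer) numbers the labels in order of first appearance with pd.factorize and sorts on those codes instead; it is a sketch assuming the same column names as the example.

```python
import numpy as np
import pandas as pd

d = {'Resutls': ['IIL', 'pass', 'pass', 'IIH', 'pass', 'IIL', 'pass'],
     'part': ['None', 1, 2, 'None', 5, 'None', 4]}
df = pd.DataFrame(d)

key = df['Resutls'].mask(df['Resutls'].eq('pass')).ffill()

# pd.factorize numbers labels in order of first appearance (IIL -> 0, IIH -> 1),
# so a stable sort on these codes keeps the groups in their original order.
codes, _ = pd.factorize(key)
out = df.iloc[np.argsort(codes, kind='stable')]

# Keep every 'pass' row but only the first occurrence of each header.
out = out[~out['Resutls'].duplicated() | out['Resutls'].eq('pass')]
print(out.index.tolist())
# -> [0, 1, 2, 6, 3, 4]
```

This reproduces the row order of the first desired output in the question.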
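If you want each group as a separate DataFrame, start again from the original df (not the sorted one) and add a helper column g holding the group label: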
df['g'] = df['Resutls'].mask(df['Resutls'].eq('pass')).ffill()
df = df[~df['Resutls'].duplicated() | (df['Resutls'] == 'pass')]
print (df)
Resutls part g
0 IIL None IIL
1 pass 1 IIL
2 pass 2 IIL
3 IIH None IIH
4 pass 5 IIH
6 pass 4 IIL
dfs = {k:v.drop('g', axis=1) for k, v in df.groupby('g')}
#print (dfs)
print (dfs['IIH'])
Resutls part
3 IIH None
4 pass 5
print (dfs['IIL'])
Resutls part
0 IIL None
1 pass 1
2 pass 2
6 pass 4
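The dictionary above keeps the original row labels and sorts the keys alphabetically. If a plain list of sub-DataFrames, each renumbered from 0 as in the question's second desired output, is acceptable, a sketch along these lines could work (filtered, g and parts are illustrative names, not from the answer):

```python
import pandas as pd

d = {'Resutls': ['IIL', 'pass', 'pass', 'IIH', 'pass', 'IIL', 'pass'],
     'part': ['None', 1, 2, 'None', 5, 'None', 4]}
df = pd.DataFrame(d)

# Group label for every row, computed on the original order.
g = df['Resutls'].mask(df['Resutls'].eq('pass')).ffill()

# Drop repeated header rows, keeping all 'pass' rows.
filtered = df[~df['Resutls'].duplicated() | df['Resutls'].eq('pass')]

# Align the key to the remaining rows; sort=False keeps groups in order of
# first appearance, and reset_index(drop=True) renumbers each part from 0.
parts = [v.reset_index(drop=True)
         for _, v in filtered.groupby(g.loc[filtered.index], sort=False)]
for p in parts:
    print(p)
```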