Given the following Pandas DataFrame df, how can I find the rows that contain both the values 6 and 10?
0 1 2 3 4 5 6
0 11 1 3 4 6 8 10
1 11 1 3 4 6 8 11
2 11 1 3 4 6 8 0
3 11 1 3 4 6 9 10
4 11 1 3 4 6 9 11
5 11 1 3 4 6 9 0
6 11 1 3 4 6 10 10
7 11 1 3 4 6 10 11
8 11 1 3 4 6 10 0
9 11 1 3 4 7 8 10
I can use a set-based solution to get these rows:
>>> df.iloc[[i for i, s in enumerate(df.itertuples(index=False)) if {6, 10} <= set(s)]]
0 1 2 3 4 5 6
0 11 1 3 4 6 8 10
3 11 1 3 4 6 9 10
6 11 1 3 4 6 10 10
7 11 1 3 4 6 10 11
8 11 1 3 4 6 10 0
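My question is: does Pandas offer a better way to get True for the rows in which these given values are present? For example, something like: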
df.where({6, 10} <= df)
Sample data:
pandas.DataFrame.from_dict({0: {0: 11, 1: 11, 2: 11, 3: 11, 4: 11, 5: 11, 6: 11, 7: 11, 8: 11, 9: 11},
1: {0: 1, 1: 1, 2: 1, 3: 1, 4: 1, 5: 1, 6: 1, 7: 1, 8: 1, 9: 1},
2: {0: 3, 1: 3, 2: 3, 3: 3, 4: 3, 5: 3, 6: 3, 7: 3, 8: 3, 9: 3},
3: {0: 4, 1: 4, 2: 4, 3: 4, 4: 4, 5: 4, 6: 4, 7: 4, 8: 4, 9: 4},
4: {0: 6, 1: 6, 2: 6, 3: 6, 4: 6, 5: 6, 6: 6, 7: 6, 8: 6, 9: 7},
5: {0: 8, 1: 8, 2: 8, 3: 9, 4: 9, 5: 9, 6: 10, 7: 10, 8: 10, 9: 8},
6: {0: 10, 1: 11, 2: 0, 3: 10, 4: 11, 5: 0, 6: 10, 7: 11, 8: 0, 9: 10}})
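This DataFrame is only a small part of my real data. Each integer between 0 and 11 can appear 0 to 2 times in a row. For example, in the rows below the values 4, 8 and 11 appear twice.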
0 1 2 3 4 5 6
100 11 1 4 4 8 8 11
343 11 2 4 4 8 8 11
505 11 3 3 4 8 8 11
586 11 3 4 4 8 8 11
1558 1 1 4 4 8 8 11
Answer 0 (score: 1)
You can use isin to test for membership, then call dropna with thresh=2 to keep only the rows that have at least 2 non-NaN values:
In [20]:
df[df.isin([6,10])].dropna(thresh=2)
Out[20]:
0 1 2 3 4 5 6
0 NaN NaN NaN NaN 6 NaN 10
3 NaN NaN NaN NaN 6 NaN 10
6 NaN NaN NaN NaN 6 10 10
7 NaN NaN NaN NaN 6 10 NaN
8 NaN NaN NaN NaN 6 10 NaN
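One caveat with the thresh count is worth sketching, given that the question says a value may occur twice in a row: a row holding 10 twice and no 6 still has two non-NaN cells, so it is kept. The small frame `demo` below is hypothetical and made up only for illustration; it is not part of the asker's data.

```python
import pandas as pd

# Hypothetical frame: row 0 holds both 6 and 10, row 1 holds 10 twice but no 6.
demo = pd.DataFrame([[11, 1, 6, 10], [11, 10, 10, 3]])

masked = demo[demo.isin([6, 10])]      # cells other than 6 or 10 become NaN
kept = masked.dropna(thresh=2).index   # rows with at least 2 non-NaN cells
print(list(kept))                      # row 1 is kept as well, despite having no 6

# The surviving index can be used to pull the unmasked rows from the original frame.
print(demo.loc[kept])
```

The last line also shows one way to get back the original values rather than the NaN-masked view.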
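I think it is actually better to test for each value separately and apply any, because the thresh=2 count is also satisfied by a row that holds one of the values twice and the other not at all: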
In [41]:
df.apply(lambda x: (x == 6).any() & (x == 10).any(), axis=1)
Out[41]:
0 True
1 False
2 False
3 True
4 False
5 False
6 True
7 True
8 True
9 False
dtype: bool
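The same mask can probably be built without a row-wise apply, by comparing the whole frame at once and reducing along the columns. This is a minimal sketch, not the answerer's code; the names `has_6`, `has_10` and `mask` are made up here, and the frame is rebuilt from the sample data in the question.

```python
import pandas as pd

# Sample data from the question, written column by column.
df = pd.DataFrame({
    0: [11] * 10,
    1: [1] * 10,
    2: [3] * 10,
    3: [4] * 10,
    4: [6] * 9 + [7],
    5: [8, 8, 8, 9, 9, 9, 10, 10, 10, 8],
    6: [10, 11, 0, 10, 11, 0, 10, 11, 0, 10],
})

# One boolean Series per value: True where the row contains that value in any column.
has_6 = df.eq(6).any(axis=1)
has_10 = df.eq(10).any(axis=1)

mask = has_6 & has_10
print(df[mask])  # indexing with the mask returns the original, unmasked rows
```

On large frames this usually avoids the per-row Python call that apply with axis=1 makes, though the actual speedup depends on the data.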
You can do the same for 3 values:
df.apply(lambda x: (x==5).any() & (x == 6).any() & (x == 10).any(), axis=1)
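Rather than chaining one more term per value by hand, the idea could be wrapped in a small helper that accepts any number of values. The function `rows_containing_all` and the frame `demo` are hypothetical names introduced for this sketch; they are not part of pandas or of the original answer.

```python
from functools import reduce
import operator

import pandas as pd


def rows_containing_all(frame, values):
    """Return a boolean Series: True for rows that contain every value in `values`."""
    # One mask per value: does the row contain it in any column?
    masks = [frame.eq(v).any(axis=1) for v in values]
    # AND the masks together; the all-True start makes an empty `values` match every row.
    return reduce(operator.and_, masks, pd.Series(True, index=frame.index))


# Hypothetical data: rows 0 and 2 contain 5, 6 and 10; row 1 lacks 5.
demo = pd.DataFrame([[5, 6, 10, 1], [6, 10, 1, 1], [5, 5, 6, 10]])
result = rows_containing_all(demo, [5, 6, 10])
print(demo[result])
```

Repeated values in a row, such as the two 5s in row 2, do not affect the result, since each value is only tested for presence.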