I have a DataFrame like this:
+----------------------------------------------------------------------------------+
| Total_Production Utilization_rate Avg_Count |
+----------------------------------------------------------------------------------+
| 0 6.503907 96.027778 26.194017 |
| 9 6.171308 95.638889 31.500943 |
| 18 6.144897 95.986111 27.494776 |
| 27 6.056882 95.916667 27.525495 |
| 36 6.107343 105.541667 21.500208 |
| 45 2.139576 96.166667 27.480307 |
| 54 6.161222 96.486111 27.498256 |
| 63 1.034555 56.388889 27.568885 |
| 72 5.021524 91.069444 30.931702 |
| 81 5.831919 96.277778 28.284872 |
| 90 2.689860 62.486111 18.691440 |
| 99 5.227672 95.555556 31.441761 |
| 108 1.465271 95.541667 30.064098 |
+----------------------------------------------------------------------------------+
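The ranges are given as two Series. Upper bounds: Total_Production 7.744379, Utilization_rate 104.534796, Avg_Count 29.691733
Lower bounds: Total_Production 3.880623, Utilization_rate 64.315015, Avg_Count 22.652148
What is the best way to filter the rows by these column ranges? Should I iterate over the rows with a for loop?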
Answer 0: (score: 7)
You can use the & operator to combine one range condition per column:
df[
(3.880623 < df['Total_Production']) & (df['Total_Production'] < 7.744379) &
(64.315015 < df['Utilization_rate']) & (df['Utilization_rate'] < 104.534796) &
(22.652148 < df['Avg_Count']) & (df['Avg_Count'] < 29.691733)
]
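Since the question says the bounds are already stored in two Series, the same mask can be built without typing each number by hand. The following is only a sketch: the names `lower` and `upper` are hypothetical stand-ins for the asker's two Series, and the DataFrame is rebuilt from the rows shown in the question.

```python
import pandas as pd

# Rows copied from the question; the index runs 0, 9, ..., 108.
df = pd.DataFrame(
    {
        "Total_Production": [6.503907, 6.171308, 6.144897, 6.056882, 6.107343,
                             2.139576, 6.161222, 1.034555, 5.021524, 5.831919,
                             2.689860, 5.227672, 1.465271],
        "Utilization_rate": [96.027778, 95.638889, 95.986111, 95.916667, 105.541667,
                             96.166667, 96.486111, 56.388889, 91.069444, 96.277778,
                             62.486111, 95.555556, 95.541667],
        "Avg_Count": [26.194017, 31.500943, 27.494776, 27.525495, 21.500208,
                      27.480307, 27.498256, 27.568885, 30.931702, 28.284872,
                      18.691440, 31.441761, 30.064098],
    },
    index=range(0, 117, 9),
)

# Hypothetical names for the two Series of bounds, indexed by column name.
lower = pd.Series({"Total_Production": 3.880623,
                   "Utilization_rate": 64.315015,
                   "Avg_Count": 22.652148})
upper = pd.Series({"Total_Production": 7.744379,
                   "Utilization_rate": 104.534796,
                   "Avg_Count": 29.691733})

# gt/lt align the Series with the DataFrame's columns, giving one boolean
# per cell; all(axis=1) keeps only the rows where every column is in range.
mask = (df.gt(lower) & df.lt(upper)).all(axis=1)
filtered = df[mask]
print(filtered)
```

This is vectorized, so no for loop over the rows is needed.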
Answer 1: (score: 3)
You can use query:
In [233]: df.query('3.880623 < Total_Production < 7.744379 and '
     ...:          '64.315015 < Utilization_rate < 104.534796 and '
     ...:          '22.652148 < Avg_Count < 29.691733')
Out[233]:
Total_Production Utilization_rate Avg_Count
0 6.503907 96.027778 26.194017
18 6.144897 95.986111 27.494776
27 6.056882 95.916667 27.525495
54 6.161222 96.486111 27.498256
81 5.831919 96.277778 28.284872
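If the bounds change often, the query string can be generated instead of typed out. The sketch below assumes a hypothetical `bounds` dict mapping each column name to its (low, high) pair, and uses only four rows from the question to stay short. It relies on the column names being valid Python identifiers, which is the case here.

```python
import pandas as pd

# Four rows taken from the question's data.
df = pd.DataFrame(
    {
        "Total_Production": [6.503907, 6.171308, 2.139576, 5.831919],
        "Utilization_rate": [96.027778, 95.638889, 96.166667, 96.277778],
        "Avg_Count": [26.194017, 31.500943, 27.480307, 28.284872],
    },
    index=[0, 9, 45, 81],
)

# Hypothetical container for the bounds: column name -> (low, high).
bounds = {
    "Total_Production": (3.880623, 7.744379),
    "Utilization_rate": (64.315015, 104.534796),
    "Avg_Count": (22.652148, 29.691733),
}

# Produces "3.880623 < Total_Production < 7.744379 and ..." as in the answer.
expr = " and ".join(f"{lo} < {col} < {hi}" for col, (lo, hi) in bounds.items())
result = df.query(expr)
print(expr)
print(result)
```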
Answer 2: (score: 1)
def foo():
    # Boolean mask built with comparison operators and &
    return df[
        (3.880623 < df['Total_Production']) & (df['Total_Production'] < 7.744379) &
        (64.315015 < df['Utilization_rate']) & (df['Utilization_rate'] < 104.534796) &
        (22.652148 < df['Avg_Count']) & (df['Avg_Count'] < 29.691733)]

def foo1():
    # Series.between (inclusive at both ends by default)
    return df[df.Total_Production.between(left=3.880623, right=7.744379) &
              df.Utilization_rate.between(left=64.315015, right=104.534796) &
              df.Avg_Count.between(left=22.652148, right=29.691733)]

def foo2():
    # DataFrame.query with chained comparisons
    return df.query("3.880623 < Total_Production < 7.744379 and "
                    "64.315015 < Utilization_rate < 104.534796 and "
                    "22.652148 < Avg_Count < 29.691733")

%timeit foo()
%timeit foo1()
%timeit foo2()
Output:
100 loops, best of 3: 2.95 ms per loop
100 loops, best of 3: 2.92 ms per loop
100 loops, best of 3: 3.67 ms per loop
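One caveat when comparing these variants: between includes both endpoints by default, while the < comparisons in the other two are strict, so the results can differ for values that sit exactly on a bound. A small sketch of the difference follows, using a made-up Series. It assumes pandas 1.3 or later, where the inclusive parameter accepts the string "neither".

```python
import pandas as pd

# Made-up values chosen so that two of them sit exactly on the bounds.
s = pd.Series([3.0, 5.0, 7.0])

closed = s.between(3.0, 7.0)                        # endpoints included (default)
open_ = s.between(3.0, 7.0, inclusive="neither")    # endpoints excluded
strict = (3.0 < s) & (s < 7.0)                      # same as the & operator answer

print(closed.tolist())
print(open_.tolist())
print(strict.tolist())
```

With the data in the question no value falls exactly on a bound, so all three functions select the same rows.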