Suppose I have the following DataFrame:
+---+---------+------+------+------+
| | summary | col1 | col2 | col3 |
+---+---------+------+------+------+
| 0 | count | 10 | 10 | 10 |
+---+---------+------+------+------+
| 1 | mean | 4 | 5 | 5 |
+---+---------+------+------+------+
| 2 | stddev | 3 | 3 | 3 |
+---+---------+------+------+------+
| 3 | min | 0 | -1 | 5 |
+---+---------+------+------+------+
| 4 | max | 100 | 56 | 47 |
+---+---------+------+------+------+
How can I keep only the columns where count > 5, mean > 4 and min > 0, together with the summary column?
The desired output is:
+---+---------+------+
| | summary | col3 |
+---+---------+------+
| 0 | count | 10 |
+---+---------+------+
| 1 | mean | 5 |
+---+---------+------+
| 2 | stddev | 3 |
+---+---------+------+
| 3 | min | 5 |
+---+---------+------+
| 4 | max | 47 |
+---+---------+------+
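For anyone who wants to reproduce this, the frame above can be rebuilt roughly as follows. This is only a sketch: it assumes the values are plain integers, which the tables suggest but the question does not state.

```python
import pandas as pd

# Rebuild the example frame from the question.
# Assumption: every value is an integer (not a string, as some summary tools return).
df = pd.DataFrame({
    "summary": ["count", "mean", "stddev", "min", "max"],
    "col1": [10, 4, 3, 0, 100],
    "col2": [10, 5, 3, -1, 56],
    "col3": [10, 5, 3, 5, 47],
})
print(df)
```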
Answer 0 (score: 3)
You need:
df2 = df.set_index('summary').T
m1 = df2['count'] > 5
m2 = df2['mean'] > 4
m3 = df2['min'] > 0
df2.loc[m1 & m2 & m3].T.reset_index()
Output:
summary col3
0 count 10
1 mean 5
2 stddev 3
3 min 5
4 max 47
Note: you can easily put the conditions directly inside .loc[], but when there are several conditions it is better to use separate mask variables (m1, m2, m3).
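For comparison, here is a sketch of the inline form the note refers to, with the three conditions written directly inside .loc[]. The frame is rebuilt from the question, assuming integer values.

```python
import pandas as pd

# Example frame from the question (integer values are an assumption).
df = pd.DataFrame({
    "summary": ["count", "mean", "stddev", "min", "max"],
    "col1": [10, 4, 3, 0, 100],
    "col2": [10, 5, 3, -1, 56],
    "col3": [10, 5, 3, 5, 47],
})

# Transpose so that each original column becomes a row that can be filtered.
df2 = df.set_index("summary").T

# Same filter as the mask version, but with the conditions inline.
result = df2.loc[
    (df2["count"] > 5) & (df2["mean"] > 4) & (df2["min"] > 0)
].T.reset_index()
print(result)
```

Each comparison needs its own parentheses, because & binds more tightly than > in Python.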
Answer 1 (score: 2)
loc with a callable.
(df.set_index('summary').T
.loc[lambda x: (x['count'] > 5) & (x['mean'] > 4) & (x['min'] > 0)]
.T.reset_index())
Answer 2 (score: 1)
Here is one way:
s=df.set_index('summary')
com=pd.Series([5,4,0],index=['count','mean','min'])
idx=s.loc[com.index].gt(com,axis=0).all().loc[lambda x : x].index
s[idx]
Out[142]:
col3
summary
count 10
mean 5
stddev 3
min 5
max 47
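The one-liner that builds idx is fairly dense, so here is a sketch that splits it into named steps. The intermediate names (rows, passed, keep) are mine, not the answer's, and the frame again assumes integer values.

```python
import pandas as pd

# Example frame from the question (integer values are an assumption).
df = pd.DataFrame({
    "summary": ["count", "mean", "stddev", "min", "max"],
    "col1": [10, 4, 3, 0, 100],
    "col2": [10, 5, 3, -1, 56],
    "col3": [10, 5, 3, 5, 47],
})

s = df.set_index("summary")
com = pd.Series([5, 4, 0], index=["count", "mean", "min"])  # one threshold per row label

rows = s.loc[com.index]        # only the rows that have a threshold
passed = rows.gt(com, axis=0)  # compare each row with its threshold, aligned on the index
keep = passed.all()            # True for a column only if it passes every check
idx = keep[keep].index         # labels of the columns to keep
print(s[idx])
```

Keeping the thresholds in a Series means adding another condition is just one more entry in com.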
Answer 3 (score: 1)
Using query, with a bit of renaming back and forth:
(
df.set_index('summary')
.rename(str.title).T
.query('Count > 5 & Mean > 4 and Min > 0')
.T.rename(str.lower)
.reset_index()
)
summary col3
0 count 10
1 mean 5
2 stddev 3
3 min 5
4 max 47
(
df[['summary']].join(
df.iloc[:, 1:].loc[:, df.iloc[[0, 1, 3], 1:].T.gt([5, 4, 0]).all(axis=1)]
)
)
summary col3
0 count 10
1 mean 5
2 stddev 3
3 min 5
4 max 47
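The second, positional version can be unpacked like this. It is a sketch with intermediate names of my own (thresholds, checks, keep), and it relies on count, mean and min sitting at row positions 0, 1 and 3, as they do in the question.

```python
import pandas as pd

# Example frame from the question (integer values are an assumption).
df = pd.DataFrame({
    "summary": ["count", "mean", "stddev", "min", "max"],
    "col1": [10, 4, 3, 0, 100],
    "col2": [10, 5, 3, -1, 56],
    "col3": [10, 5, 3, 5, 47],
})

thresholds = [5, 4, 0]                    # for count, mean, min, in that order
checks = df.iloc[[0, 1, 3], 1:].T         # rows 0, 1, 3 of the data columns, transposed
keep = checks.gt(thresholds).all(axis=1)  # one boolean per data column
result = df[["summary"]].join(df.iloc[:, 1:].loc[:, keep])
print(result)
```

Because the rows are picked by position, this breaks silently if the row order of the summary changes; the label-based versions do not have that problem.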
Answer 4 (score: 0)
Set the summary column as the index, then do the following:
df.T.query("(count > 5) & (mean > 4) & (min > 0)").T
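Spelled out end to end, this might look like the sketch below. The engine="python" argument is my addition, not part of the answer: as far as I know, when the optional numexpr package is installed, pandas can reject names such as min in a query expression with a NumExprClobberingError, and the Python engine is one way around that.

```python
import pandas as pd

# Example frame from the question (integer values are an assumption).
df = pd.DataFrame({
    "summary": ["count", "mean", "stddev", "min", "max"],
    "col1": [10, 4, 3, 0, 100],
    "col2": [10, 5, 3, -1, 56],
    "col3": [10, 5, 3, 5, 47],
})

indexed = df.set_index("summary")

# engine="python" is an assumption added here to avoid a possible clash
# between the column name "min" and numexpr's built-in of the same name.
result = indexed.T.query(
    "(count > 5) & (mean > 4) & (min > 0)", engine="python"
).T
print(result)
```

Unlike the earlier answers, this leaves summary as the index; append .reset_index() to get it back as a column.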