Question

I am creating 3 pandas DataFrames from one original pandas DataFrame. I have calculated the standard deviations from the mean.

#Mean
stats_over_29000_mean = stats_over_29000['count'].mean().astype(int)

152542

#STDS
stats_over_29000_count_between_std = stats_over_29000_std - stats_over_29000_mean

54313

stats_over_29000_first_std = stats_over_29000_mean + stats_over_29000_count_between_std

206855

stats_over_29000_second_std = stats_over_29000_first_std + stats_over_29000_count_between_std

261168

stats_over_29000_third_std = stats_over_29000_second_std + stats_over_29000_count_between_std

315481

This gets all rows from the df that fall under 2 stds:

#Select all rows where count is less than 2 standard deviations 
stats_under_2_stds = stats_over_29000[stats_over_29000['count'] < stats_over_29000_second_std]

Next, I want to select all rows from the df where count is >= 2 stds and less than 3 stds.

I have tried:

stats_2_and_over_under_3_stds = stats_over_29000[stats_over_29000['count'] >= stats_over_29000_second_std < stats_over_29000_third_std]

and

stats_2_and_over_under_3_stds = stats_over_29000[stats_over_29000['count'] >= stats_over_29000_second_std && < stats_over_29000_third_std]

But neither seems to work.

Answer 1

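pandas now has Series.between(left, right, inclusive=True), which applies both comparisons at the same time. In pandas 1.3 and later, inclusive takes 'both', 'neither', 'left', or 'right' instead of a boolean; 'left' means >= on the lower bound and < on the upper bound.

In your case:

stats_2_and_over_under_3_stds = stats_over_29000[stats_over_29000['count'].between(stats_over_29000_second_std, stats_over_29000_third_std, inclusive='left')]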
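
To see how the bounds behave, here is a minimal sketch of between on made-up data. The rows are hypothetical; only the two thresholds come from the question. It assumes pandas 1.3 or later, where inclusive accepts a string.

```python
import pandas as pd

# Hypothetical rows; the two thresholds are the values quoted in the question.
df = pd.DataFrame({"count": [150000, 261168, 300000, 315481, 400000]})
second_std = 261168
third_std = 315481

# inclusive="left" keeps rows with second_std <= count < third_std.
mask = df["count"].between(second_std, third_std, inclusive="left")
between_2_and_3 = df[mask]

print(between_2_and_3["count"].tolist())
```

The row equal to the lower threshold is kept and the row equal to the upper threshold is dropped, which matches the ">= 2 stds and < 3 stds" requirement.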

Answer 2

This is how to filter a df on two conditions:

Initialize: df = pd.DataFrame([[1,2],[1,3],[1,5],[1,8]],columns=['A','B'])
Operation: res = df[(df['B']<8) & (df['B']>2)]
Result:
```
   A  B
1  1  3
2  1  5
```

In your case:

stats_2_and_over_under_3_stds = stats_over_29000[(stats_over_29000['count'] >= stats_over_29000_second_std) & (stats_over_29000['count'] < stats_over_29000_third_std)]
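
For context on why the attempts in the question fail: `&&` is not Python syntax, and a chained comparison such as `s >= low < high` expands to `(s >= low) and (low < high)`, where `and` asks for the truth value of a whole Series. The sketch below illustrates this on a small hypothetical Series (the names `counts`, `low`, and `high` are made up for the example).

```python
import pandas as pd

counts = pd.Series([100, 250, 300, 400], name="count")  # hypothetical values
low, high = 250, 400

# The chained form evaluates `(counts >= low) and (low < high)`;
# `and` needs a single True/False, which a multi-element Series cannot give.
try:
    counts[counts >= low < high]
    chained_failed = False
except ValueError:
    chained_failed = True

# Element-wise version: each comparison in parentheses, combined with `&`.
selected = counts[(counts >= low) & (counts < high)]

print(chained_failed, selected.tolist())
```

The parentheses matter because `&` binds more tightly than the comparison operators in Python.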

Answer 3

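The loc function lets you apply multiple conditions to filter a DataFrame with a very concise syntax. I am writing 'column of interest' because I don't know the name of the column where the values are stored. Alternatively, if the column of interest is the index, you can write the condition directly as (stats_over_29000 > 261168) inside the loc function.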

    stats_over_29000.loc[(stats_over_29000['column of interest'] > 261168) &
                         (stats_over_29000['column of interest'] < 315481)]
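
Since the question is about building three DataFrames, the same loc pattern can be sketched for all three bands. The frame and the `name` column below are hypothetical, `count` stands in for the column of interest, and the lower bound uses `>=` as the question asks.

```python
import pandas as pd

# Hypothetical frame; "count" stands in for the column of interest.
df = pd.DataFrame({
    "name": ["a", "b", "c", "d", "e"],
    "count": [150000, 261168, 300000, 315481, 400000],
})
second_std, third_std = 261168, 315481  # thresholds quoted in the question

under_2 = df.loc[df["count"] < second_std]
between_2_and_3 = df.loc[(df["count"] >= second_std) & (df["count"] < third_std)]
at_least_3 = df.loc[df["count"] >= third_std]

# The three bands do not overlap, so every row lands in exactly one of them.
assert len(under_2) + len(between_2_and_3) + len(at_least_3) == len(df)

print(between_2_and_3["name"].tolist())
```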
