Question

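I have a dataset that tracks product usage. Some of the feature usage counts recorded within the captured time frame are extremely unrealistic. I want to select the data that meets specific filter criteria.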

status                                            1
crm_customer_guid          XXXXXXXXXXXXXXXXXXXXXXXX
product_name                                   XXXX
event_source                                  PROMO
offer_type                                    TRIAL
date_cohort                                  9/9/18
market_area                                      US
webservices_users                                 1
mobile_users                                      1
fiscal_yr_and_per_desc                      2018-12
fiscal_yr_and_qtr_desc                      2018-Q4
fiscal_yr_and_wk_desc                       2018-48
total_sessions                                 1107
Feature1                                       539
Feature2                                       864
Feature3                                       198
Feature4                                       0
Feature5                                       277
Feature6                                       1458
Feature7                                       899
Feature8                                       321
Feature9                                       716
Feature10                                      282
Feature11                                      1396

I would like to filter for rows where all feature counts are < 20 and insert those rows into a new dataframe.

I tried using

df_engaged = df[(((df['total_sessions'] > 2) & (df['total_sessions'] < 10)) & ((df['Feature3'] < 11) & (df['Feature4'] < 11)))]

Extending this to every feature column seems inefficient.

Any suggestions would be great. Thanks in advance.

Answer 1

I think you can slice out the "Feature" columns of the dataframe and filter on them:

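# Change the feature range here (this covers Feature5 through Feature10)
for i in range(5, 11):
    # Blank out rows where this feature is 20 or more (note: this modifies df in place)
    df[df['Feature' + str(i)] >= 20] = None

# Keep only the rows where every checked feature is below 20
new_df = df.dropna()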
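
If you would rather avoid the loop and leave df untouched, here is a rough sketch of a vectorized version. It assumes every feature column is named "Feature" followed by a number, as in the sample row; the toy data and the names feature_cols, all_features_low and sessions_ok are made up for illustration. The session bounds are copied from the question's attempt.

```python
import pandas as pd

# Toy data standing in for the real dataset (values invented for illustration)
df = pd.DataFrame({
    "total_sessions": [5, 1107, 8, 3],
    "Feature1": [3, 539, 12, 25],
    "Feature2": [7, 864, 19, 4],
    "Feature3": [0, 198, 5, 10],
})

# Select every column whose name is "Feature" followed by digits
feature_cols = df.filter(regex=r"^Feature\d+$").columns

# One boolean per row: True only if every feature count is below 20
all_features_low = (df[feature_cols] < 20).all(axis=1)

# Session condition taken from the question's attempt
sessions_ok = (df["total_sessions"] > 2) & (df["total_sessions"] < 10)

# Combine the masks and copy the matching rows into a new dataframe
df_engaged = df[all_features_low & sessions_ok].copy()
print(df_engaged)
```

The comparison runs over all selected columns at once, so adding Feature12, Feature13 and so on needs no code change as long as the naming pattern holds.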

Selecting data based on multiple conditions in pandas
