Question

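I have a CSV to parse. One of the steps requires changing a value in specific rows based on another value in the same row.

The only way I know how to do this (I'm new to Python) is with a pandas filter, which works fine.

What I can't seem to find an answer to is how to then unfilter the data so that I can apply a different filter.

I've tried combing through the pandas reference guide, but I can't seem to find the answer.

Here is my current working code: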

import pandas as pd
from prompt_toolkit import prompt

filename = input("Enter the path of excel file = ")
abc = pd.read_csv(filename, header=1, dtype=str)

abc = abc[(abc['column_title_A'].str.startswith("300")) | (abc['column_title_A'].str.startswith("860"))]

# change value based on another value in another
abc.loc[abc['column_title_B'] == '29JUL2019', 'column_title_C'] = '15/02/2019'
abc.loc[abc['column_title_B'] == '25FEB2019', 'column_title_C'] = '19/05/2019'

# from here on, how do I unfilter the above to apply another filter below?
abc = abc[(abc['column_title_B'].str.startswith("300")) | (abc['column_title_B'].str.startswith("860"))]

I want to filter on set A, then unfilter so that I can apply another filter.

Answer 1

You can use a boolean mask instead of overwriting abc:

mask = (abc['column_title_A'].str.startswith("300")) | (abc['column_title_A'].str.startswith("860"))

# change value based on another value in another
abc.loc[mask & (abc['column_title_B'] == '29JUL2019'), 'column_title_C'] = '15/02/2019'
abc.loc[mask & (abc['column_title_B'] == '25FEB2019'), 'column_title_C'] = '19/05/2019'

# build the next mask from the full, unfiltered abc (a boolean Series, not a filtered frame)
mask = (abc['column_title_B'].str.startswith("300")) | (abc['column_title_B'].str.startswith("860"))
...
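
To see the mask approach end to end, here is a small self-contained sketch. The sample rows and the names mask_a and mask_b are made up for illustration; only the column names, prefixes, and date strings come from the question.

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
abc = pd.DataFrame({
    "column_title_A": ["300123", "860456", "999000", "123456"],
    "column_title_B": ["29JUL2019", "25FEB2019", "29JUL2019", "300ABC"],
    "column_title_C": ["01/01/2019"] * 4,
})

# First "filter": a boolean Series, abc itself is not reassigned.
mask_a = (abc["column_title_A"].str.startswith("300")
          | abc["column_title_A"].str.startswith("860"))

# Update only rows that satisfy both the mask and the extra condition.
abc.loc[mask_a & (abc["column_title_B"] == "29JUL2019"), "column_title_C"] = "15/02/2019"
abc.loc[mask_a & (abc["column_title_B"] == "25FEB2019"), "column_title_C"] = "19/05/2019"

# Second "filter": built from the same full frame, so nothing needs unfiltering.
mask_b = (abc["column_title_B"].str.startswith("300")
          | abc["column_title_B"].str.startswith("860"))

# A filtered view is still available on demand without touching abc.
rows_b = abc[mask_b]
```

Because abc is never reassigned, every row is still present after the first set of updates, and each mask can be combined with others using & and |.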

Answer 2

Rather than trying to "unfilter", you shouldn't filter and overwrite in the first place.

I suggest something like this:

feature_importances = model.stages[-2].featureImportances
feature_imp_array = feature_importances.toArray()

feat_imp_list = []
for feature, importance in zip(tf_model.vocabulary, feature_imp_array):
    feat_imp_list.append((feature, importance))

feat_imp_list = sorted(feat_imp_list, key=(lambda x: x[1]), reverse=True)

top_features = feat_imp_list[0:50]
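
In pandas terms, one way to read the advice "don't filter and overwrite" is to keep abc intact and bind each filtered subset to its own name. A minimal sketch, assuming made-up sample rows and the hypothetical names subset_a and subset_b:

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
abc = pd.DataFrame({
    "column_title_A": ["300123", "860456", "999000", "123456"],
    "column_title_B": ["29JUL2019", "25FEB2019", "29JUL2019", "300ABC"],
    "column_title_C": ["01/01/2019"] * 4,
})

# Filter into a new name; .copy() makes it an independent frame,
# so edits to it do not trigger chained-assignment warnings.
in_a = (abc["column_title_A"].str.startswith("300")
        | abc["column_title_A"].str.startswith("860"))
subset_a = abc[in_a].copy()

subset_a.loc[subset_a["column_title_B"] == "29JUL2019", "column_title_C"] = "15/02/2019"
subset_a.loc[subset_a["column_title_B"] == "25FEB2019", "column_title_C"] = "19/05/2019"

# Write the edited values back by index label, if they are needed in abc.
abc.loc[subset_a.index, "column_title_C"] = subset_a["column_title_C"]

# abc was never overwritten, so the next filter starts from all rows.
in_b = (abc["column_title_B"].str.startswith("300")
        | abc["column_title_B"].str.startswith("860"))
subset_b = abc[in_b]
```

The write-back step is optional: if only the filtered rows are needed downstream, subset_a can be used on its own while abc remains available for later filters.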

How do I filter and unfilter in pandas (Python)?

2 answers: