Question

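I'm trying to find a substring in a frozenset, but I've run out of ideas.

My data structure is a pandas.DataFrame (it is the output of association_rules from the mlxtend package, if you're familiar with it), and I want to print all rows whose antecedents (which are frozensets) contain a specific string.

The code I tried: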

    print(rules[rules["antecedents"].str.contains('line', regex=False)])

However, whenever I run it, I get an empty DataFrame.

When I run only the inner function on my rules["antecedents"] Series, I get False for every entry. Why is that?

Answer 1

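Because the Series.str.* functions only work on string data. Your data are not strings, so the result is always NaN, regardless of what the string representation looks like. Proof: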

>>> import numpy as np
>>> import pandas as pd
>>> x = pd.DataFrame(np.random.randn(2, 5)).astype("object")
>>> x
         0         1         2          3          4
0 -1.17191  -1.92926 -0.831576 -0.0814279   0.099612
1 -1.55183 -0.494855   1.14398   -1.72675 -0.0390948
>>> x[0].str.contains("-1")
0   NaN
1   NaN
Name: 0, dtype: float64
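
The same check can be made on the frozenset column directly by inspecting the element types. This is a minimal sketch, assuming a small hypothetical `rules` frame in place of the real mlxtend output:

```python
import pandas as pd

# Hypothetical stand-in for the association_rules output.
rules = pd.DataFrame(
    {"antecedents": [frozenset({"online"}), frozenset({"line", "bread"})]}
)

# Collect the distinct Python type names stored in the column.
element_types = set(rules["antecedents"].map(lambda v: type(v).__name__))
print(element_types)

# The .str accessor is meant for str elements; check whether that holds here.
all_strings = rules["antecedents"].map(lambda v: isinstance(v, str)).all()
print(all_strings)
```

If the elements are not `str`, the `.str` accessor is the wrong tool and an element-wise function is needed instead.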

What you can do:

Use apply:

>>> x[0].apply(lambda x: "-1" in str(x))
0    True
1    True
Name: 0, dtype: bool

So your code should be written as:

print(rules[rules["antecedents"].apply(lambda x: 'line' in str(x))])

If you want an exact match on an element of the frozenset, you may want to use 'line' in x instead.
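
The difference between the two kinds of match can be sketched on a small, made-up `rules` frame (the data and the `confidence` values are hypothetical, not taken from the question). Note that `str(x)` searches the whole representation, which for a frozenset also includes the word "frozenset", so searching for something like 'set' would match every row. Iterating over the items is a stricter alternative for substring matching:

```python
import pandas as pd

# Hypothetical stand-in for the mlxtend association_rules output.
rules = pd.DataFrame(
    {
        "antecedents": [
            frozenset({"online", "bread"}),
            frozenset({"line"}),
            frozenset({"milk"}),
        ],
        "confidence": [0.8, 0.6, 0.9],
    }
)

# Substring match: True if any item in the frozenset contains "line".
substring_mask = rules["antecedents"].apply(
    lambda items: any("line" in item for item in items)
)

# Exact match: True only if "line" itself is a member of the frozenset.
exact_mask = rules["antecedents"].apply(lambda items: "line" in items)

print(rules[substring_mask])  # rows 0 and 1
print(rules[exact_mask])      # row 1 only
```

The item-wise version assumes every item in the frozenset is a string, which is the usual case for mlxtend item labels.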

Finding a substring in a pandas frozenset