Question

I have the following dataframe:

I want to filter the rows where claim_status contains 11

and claim_status_reason contains aa1.

I am trying the code below, but it just returns all rows:

my_list = 'aa1'

df[df['claim_status_reason'].str.contains( "|".join(my_list), regex=True)].reset_index(drop=True)

Expected output:

1.) rows where there is 11 in claim_status
2.) rows where there is aa1 in claim_status_reason

Answer 1

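Don't use string operations on lists stored in a Series. You can use a list comprehension instead. The data structure you have chosen is "anti-pandas": you should try to avoid putting lists in a Series in the first place, because operations on them cannot be vectorized.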

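import numpy as np

# True for each row whose list contains the wanted value
mask1 = np.array([11 in x for x in df['claim_status']])
mask2 = np.array(['aa1' in x for x in df['claim_status_reason']])

# keep only the rows that satisfy both conditions
df = df[mask1 & mask2]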
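
As a minimal sketch of the masks above on made-up data: the question's dataframe is not shown, so the sample rows below are assumptions, with both columns holding Python lists.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; the original dataframe is not shown in the question.
df = pd.DataFrame({
    "claim_status": [[11, 12], [13], [11]],
    "claim_status_reason": [["aa1", "bb2"], ["aa1"], ["cc3"]],
})

# One boolean per row: does the list in that row contain the value?
mask1 = np.array([11 in x for x in df["claim_status"]])
mask2 = np.array(["aa1" in x for x in df["claim_status_reason"]])

# Only the first row satisfies both conditions.
filtered = df[mask1 & mask2].reset_index(drop=True)
print(filtered)
```

Building the two masks separately also makes it easy to check each condition on its own before combining them with `&`.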

Answer 2

You can use apply to get the required filter, for example:
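
A minimal sketch of what an apply-based filter might look like, assuming both columns hold Python lists. The sample data and the names `has_11`, `has_aa1`, and `result` are hypothetical and do not come from the original answer.

```python
import pandas as pd

# Hypothetical sample data; the original dataframe is not shown in the question.
df = pd.DataFrame({
    "claim_status": [[11, 12], [13], [11]],
    "claim_status_reason": [["aa1", "bb2"], ["aa1"], ["cc3"]],
})

# Series.apply calls the function once per row and returns a boolean Series.
has_11 = df["claim_status"].apply(lambda x: 11 in x)
has_aa1 = df["claim_status_reason"].apply(lambda x: "aa1" in x)

result = df[has_11 & has_aa1].reset_index(drop=True)
print(result)
```

Like the list comprehension, this runs a Python-level membership test per row, so it is not vectorized either.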