Question

I created a DataFrame:

[in] testing_df = pd.DataFrame(test_array, columns=['transaction_id', 'product_id'])

# Split the product_id's for the testing data
testing_df.set_index(['transaction_id'], inplace=True)
testing_df['product_id'] = testing_df['product_id'].apply(lambda row: row.split(','))

[out]                 product_id
transaction_id                 
001                       [P01]
002                  [P01, P02]
003             [P01, P02, P09]
004                  [P01, P03]
005             [P01, P03, P05]
006             [P01, P03, P07]
007             [P01, P03, P08]
008                  [P01, P04]
009             [P01, P04, P05]
010             [P01, P04, P08]

How can I now remove 'P04' and 'P08' from the result?

I tried:

# Remove P04 and P08 from consideration
testing_df['product_id'] = testing_df['product_id'].map(lambda x: x.strip('P04'))

testing_df['product_id'].replace(regex=True,inplace=True,to_replace=r'P04,',value=r'')

However, neither option seems to work.

The data types are:

[in] print(testing_df.dtypes)
[out] product_id    object
dtype: object

[in] print(testing_df['product_id'].dtypes)
[out] object
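Both attempts run after the split, so every cell in product_id is a Python list, which is why the dtype is reported as object. A minimal sketch, using a hypothetical two-row sample shaped like the question's data, of what the first attempt runs into: lists have no strip method, and str.strip removes any of the given characters from both ends rather than a substring, so it would mangle the IDs even on plain strings.

```python
import pandas as pd

# Hypothetical sample shaped like the question's data (cells are lists after the split)
df = pd.DataFrame(
    {"product_id": [["P01", "P04"], ["P01", "P03", "P08"]]},
    index=pd.Index(["008", "007"], name="transaction_id"),
)

# The column is object dtype because each cell holds a Python list
first_cell = df["product_id"].iloc[0]
print(type(first_cell).__name__)  # list

# Attempt 1: lists have no .strip(), so map() raises AttributeError
error_name = None
try:
    df["product_id"].map(lambda x: x.strip("P04"))
except AttributeError as exc:
    error_name = type(exc).__name__
print(error_name)  # AttributeError

# Even on a plain string, strip("P04") removes any of the characters
# 'P', '0', '4' from both ends; it does not remove the substring "P04"
print("P01".strip("P04"))  # 1
print(repr("P40".strip("P04")))  # ''
```

The regex replace attempt has a similar mismatch: its pattern is written for comma-joined strings, but at that point the cells hold lists, not strings.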

Answer 1

I would do it before splitting:

Data:

In [269]: df
Out[269]:
                 product_id
transaction_id
1                       P01
2                   P01,P02
3               P01,P02,P09
4                   P01,P03
5               P01,P03,P05
6               P01,P03,P07
7               P01,P03,P08
8                   P01,P04
9               P01,P04,P05
10              P01,P04,P08

Solution (regex=True is needed on pandas 2.0 and later, where str.replace treats the pattern as a literal string by default):

In [271]: df['product_id'] = df['product_id'].str.replace(r'\,*?(?:P04|P08)\,*?', '', regex=True) \
     ...:                                    .str.split(',')

In [272]: df
Out[272]:
                     product_id
transaction_id
1                         [P01]
2                    [P01, P02]
3               [P01, P02, P09]
4                    [P01, P03]
5               [P01, P03, P05]
6               [P01, P03, P07]
7                    [P01, P03]
8                         [P01]
9                    [P01, P05]
10                        [P01]
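A self-contained sketch of the replace-then-split approach, rebuilding the sample from comma-joined strings (an assumption about what the raw column holds before the split). One caveat: the leftmost match starts at the comma, so the pattern removes ',P04' together with its leading comma, which works for this data; if an excluded ID were the first item (a hypothetical 'P04,P01'), the leftover leading comma would produce an empty string after the split.

```python
import pandas as pd

# Sample rebuilt from the data above: comma-joined strings, before any split
raw = ["P01", "P01,P02", "P01,P02,P09", "P01,P03", "P01,P03,P05",
       "P01,P03,P07", "P01,P03,P08", "P01,P04", "P01,P04,P05", "P01,P04,P08"]
df = pd.DataFrame({"product_id": raw},
                  index=pd.Index(range(1, 11), name="transaction_id"))

# Remove P04/P08 (with a preceding comma) from each string, then split into lists
pattern = r"\,*?(?:P04|P08)\,*?"
df["product_id"] = df["product_id"].str.replace(pattern, "", regex=True).str.split(",")
print(df)

# Edge case: an excluded ID in first position leaves an empty string behind
edge = pd.Series(["P04,P01"]).str.replace(pattern, "", regex=True).str.split(",")
print(edge.tolist())  # [['', 'P01']]
```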

Or you can change:

testing_df['product_id'] = testing_df['product_id'].apply(lambda row: row.split(','))

to:

testing_df['product_id'] = testing_df['product_id'].apply(lambda row: list(set(row.split(','))- set(['P04','P08'])))

Demo:

In [280]: df.product_id.apply(lambda row: list(set(row.split(','))- set(['P04','P08'])))
Out[280]:
transaction_id
1               [P01]
2          [P01, P02]
3     [P09, P01, P02]
4          [P01, P03]
5     [P01, P03, P05]
6     [P07, P01, P03]
7          [P01, P03]
8               [P01]
9          [P01, P05]
10              [P01]
Name: product_id, dtype: object
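The set difference does not keep the original order (note [P09, P01, P02] and [P07, P01, P03] in Out[280]) and would also collapse duplicate IDs. If order matters, a minimal variant keeps the apply + split structure but filters with a membership test instead; the name exclude is illustrative.

```python
import pandas as pd

# Sample rebuilt from the data above: comma-joined strings, before any split
raw = ["P01", "P01,P02", "P01,P02,P09", "P01,P03", "P01,P03,P05",
       "P01,P03,P07", "P01,P03,P08", "P01,P04", "P01,P04,P05", "P01,P04,P08"]
df = pd.DataFrame({"product_id": raw},
                  index=pd.Index(range(1, 11), name="transaction_id"))

exclude = {"P04", "P08"}  # hypothetical name for the IDs to drop

# Split each string, then keep the IDs not in `exclude` in their original order
df["product_id"] = df["product_id"].apply(
    lambda row: [p for p in row.split(",") if p not in exclude]
)
print(df["product_id"].tolist())
```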

Answer 2

Store all the elements you want to remove in a list:

remove_results = ['P04', 'P08']
for k in range(len(testing_df['product_id'])):
    for r in remove_results:
        # .iloc looks up by position; the index holds transaction IDs such as '001'
        if r in testing_df['product_id'].iloc[k]:
            testing_df['product_id'].iloc[k].remove(r)
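The loop works because each cell holds a reference to a list object, so list.remove edits the list the DataFrame already stores and nothing has to be assigned back. A minimal sketch on hypothetical sample data that iterates over the lists directly; it swaps the if for a while loop, since list.remove drops only one occurrence per call (duplicates are an assumption here; the question's data has none).

```python
import pandas as pd

# Hypothetical sample: cells are lists, as after the question's split step
testing_df = pd.DataFrame(
    {"product_id": [["P01", "P04"], ["P01", "P04", "P05"], ["P01", "P08", "P08"]]},
    index=pd.Index(["008", "009", "010"], name="transaction_id"),
)

remove_results = ["P04", "P08"]
for products in testing_df["product_id"]:
    # `products` is the list object stored in the cell, so remove() edits it in place
    for r in remove_results:
        while r in products:  # list.remove drops one occurrence per call
            products.remove(r)

print(testing_df["product_id"].tolist())  # [['P01'], ['P01', 'P05'], ['P01']]
```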

Answer 3

A list comprehension is probably the most efficient option:

exc = {'P04', 'P08'}
df['product_id'] = [[i for i in L if i not in exc] for L in df['product_id']]

Note that an inefficient Python-level loop is unavoidable here: apply + lambda, map + lambda, and the in-place solutions all involve Python-level loops.
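The efficiency claim can be checked with a rough timing sketch that compares the list comprehension against apply + lambda on a hypothetical 100,000-row column built by repeating the question's ten rows. Absolute numbers depend on the machine and pandas version, so none are shown here; the sketch only checks that both versions build the same lists.

```python
import timeit

import pandas as pd

# Hypothetical benchmark data: the question's 10 rows repeated 10,000 times
rows = [["P01"], ["P01", "P02"], ["P01", "P02", "P09"], ["P01", "P03"],
        ["P01", "P03", "P05"], ["P01", "P03", "P07"], ["P01", "P03", "P08"],
        ["P01", "P04"], ["P01", "P04", "P05"], ["P01", "P04", "P08"]]
df = pd.DataFrame({"product_id": rows * 10_000})
exc = {"P04", "P08"}


def with_comprehension():
    # Plain Python loop over the column's list objects
    return [[i for i in L if i not in exc] for L in df["product_id"]]


def with_apply():
    # Same filtering, driven by Series.apply
    return df["product_id"].apply(lambda L: [i for i in L if i not in exc]).tolist()


# Both approaches build the same filtered lists
same = with_comprehension() == with_apply()
print("same result:", same)

# Compare the timings relative to each other rather than expecting fixed numbers
for fn in (with_comprehension, with_apply):
    seconds = timeit.timeit(fn, number=5)
    print(f"{fn.__name__}: {seconds:.3f} s for 5 runs")
```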

How do I remove values from lists in a Pandas DataFrame column?
