Pandas: select rows that contain any substring from a list

Time: 2020-10-19 07:36:06

Tags: python python-3.x pandas dataframe substring

I want to select the rows where a column contains any of the substrings in a list. This is what I have now:

product = ['LID', 'TABLEWARE', 'CUP', 'COVER', 'CONTAINER', 'PACKAGING']

df_plastic_prod = df_plastic[df_plastic['Goods Shipped'].str.contains(product)]

df_plastic_prod.info()

Sample df_plastic:

Name          Product
David        PLASTIC BOTTLE
Meghan       PLASTIC COVER
Melanie      PLASTIC CUP 
Aaron        PLASTIC BOWL
Venus        PLASTIC KNIFE
Abigail      PLASTIC CONTAINER
Sophia       PLASTIC LID

Desired df_plastic_prod:

Name          Product
Meghan       PLASTIC COVER
Melanie      PLASTIC CUP 
Abigail      PLASTIC CONTAINER
Sophia       PLASTIC LID

Thanks in advance! I appreciate your help!

2 answers:

Answer 0 (score: 2)

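To match any of the values, join all the list values with | (regex "or") into a single pattern such as LID|TABLEWARE|... Each value is wrapped in \b word boundaries, so only whole words match:

This solution also works well when list values contain two or more words.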

# Wrap each value in \b word boundaries and join them with | (regex "or")
pat = '|'.join(r"\b{}\b".format(x) for x in product)
df_plastic_prod = df_plastic[df_plastic['Product'].str.contains(pat)]
print(df_plastic_prod)
      Name            Product
1   Meghan      PLASTIC COVER
2  Melanie        PLASTIC CUP
5  Abigail  PLASTIC CONTAINER
6   Sophia        PLASTIC LID
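A self-contained sketch of this approach on the question's sample data. str.contains takes a single pattern string, which is why the list has to be collapsed into one regex first. As additions that are not part of the answer above, each value is passed through re.escape (a precaution in case a list entry ever contains regex metacharacters such as . or +), and the result is contrasted with a plain pattern without \b. The 'PLASTIC CUPBOARD' value is hypothetical and only there to show the difference.

```python
import re

import pandas as pd

product = ['LID', 'TABLEWARE', 'CUP', 'COVER', 'CONTAINER', 'PACKAGING']
df_plastic = pd.DataFrame({
    'Name': ['David', 'Meghan', 'Melanie', 'Aaron', 'Venus', 'Abigail', 'Sophia'],
    'Product': ['PLASTIC BOTTLE', 'PLASTIC COVER', 'PLASTIC CUP', 'PLASTIC BOWL',
                'PLASTIC KNIFE', 'PLASTIC CONTAINER', 'PLASTIC LID'],
})

# One regex: whole-word alternatives joined with | (regex "or").
# re.escape is an added safeguard against metacharacters in list values.
pat = '|'.join(r'\b{}\b'.format(re.escape(x)) for x in product)
df_plastic_prod = df_plastic[df_plastic['Product'].str.contains(pat)]
print(df_plastic_prod)

# Without \b a value matches anywhere inside a word, so 'CUP' would
# also hit the hypothetical 'PLASTIC CUPBOARD'.
extra = pd.Series(['PLASTIC CUPBOARD', 'PLASTIC CUP'])
loose = '|'.join(re.escape(x) for x in product)
loose_hits = extra.str.contains(loose).tolist()
strict_hits = extra.str.contains(pat).tolist()
print(loose_hits)   # [True, True]
print(strict_hits)  # [False, True]
```

The word boundaries are what keep 'CUP' from matching inside longer words; drop them if true substring matching is what you want.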

Answer 1 (score: 0)

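One solution is to parse the 'Product' column with a regular expression, test whether the extracted value is in the product list, and then filter the original DataFrame on the result.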

Here a very simple regex pattern, (\w+)$, is used; it matches the single word at the end of the string.

Sample code:

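# Raw string avoids the invalid-escape warning for \w; expand=False returns a
# Series, so isin() yields a 1-D boolean mask usable directly for filtering
df[df['Product'].str.extract(r'(\w+)$', expand=False).isin(product)]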

Output:

      Name            Product
1   Meghan      PLASTIC COVER
2  Melanie        PLASTIC CUP
5  Abigail  PLASTIC CONTAINER
6   Sophia        PLASTIC LID

Setup:

import pandas as pd

product = ['LID', 'TABLEWARE', 'CUP',
           'COVER', 'CONTAINER', 'PACKAGING']

data = {'Name': ['David', 'Meghan', 'Melanie',
                 'Aaron', 'Venus', 'Abigail', 'Sophia'],
        'Product': ['PLASTIC BOTTLE', 'PLASTIC COVER', 'PLASTIC CUP',
                    'PLASTIC BOWL', 'PLASTIC KNIFE', 'PLASTIC CONTAINER',
                    'PLASTIC LID']}

df = pd.DataFrame(data)
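A self-contained sketch of the same last-word idea on the setup data, showing the intermediate values. It passes expand=False to str.extract so the single capture group comes back as a Series, which makes isin produce a one-dimensional boolean mask. The extra strings ('PLASTIC CUP ' with a trailing space and 'PLASTIC LID SET') are hypothetical and only illustrate the pattern's limits: the keyword must be the very last characters of the string.

```python
import pandas as pd

product = ['LID', 'TABLEWARE', 'CUP', 'COVER', 'CONTAINER', 'PACKAGING']
df = pd.DataFrame({
    'Name': ['David', 'Meghan', 'Melanie', 'Aaron', 'Venus', 'Abigail', 'Sophia'],
    'Product': ['PLASTIC BOTTLE', 'PLASTIC COVER', 'PLASTIC CUP', 'PLASTIC BOWL',
                'PLASTIC KNIFE', 'PLASTIC CONTAINER', 'PLASTIC LID'],
})

# Capture the last word of each string as a Series (one group, expand=False).
last_word = df['Product'].str.extract(r'(\w+)$', expand=False)
print(last_word.tolist())
# ['BOTTLE', 'COVER', 'CUP', 'BOWL', 'KNIFE', 'CONTAINER', 'LID']

# Exact membership test on the extracted word gives the row mask.
mask = last_word.isin(product)
print(df[mask])

# Limits: $ must follow the word directly, so trailing whitespace gives NaN
# (and therefore False), and a keyword that is not the last word is missed.
padded = pd.Series(['PLASTIC CUP ', 'PLASTIC LID SET'])
raw_hits = padded.str.extract(r'(\w+)$', expand=False).isin(product).tolist()
stripped_hits = padded.str.strip().str.extract(r'(\w+)$', expand=False).isin(product).tolist()
print(raw_hits)       # [False, False]
print(stripped_hits)  # [True, False]
```

Stripping whitespace first is a cheap hedge against the first limit; if keywords can appear anywhere in the string, the word-boundary pattern from the other answer is the more general choice.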