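Python: search a string for exact words from a list?

Time: 2016-09-20 08:02:20

Tags: python regex list search find

I need to find exact words from a list inside a string.

I tried the code below. It gives me exact matches for the single-word entries in the list, but how do I match the entries that consist of two words?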

categories_to_retain = [
 'SOLID',
 'GEOMETRIC',
 'FLORAL',
 'BOTANICAL',
 'STRIPES',
 'ABSTRACT',
 'ANIMAL',
 'GRAPHIC PRINT',
 'ORIENTAL',
 'DAMASK',
 'TEXT',
 'CHEVRON',
 'PLAID',
 'PAISLEY',
 'SPORTS']

x = " Beautiful Art By  Design Studio **graphic print** Creates A **TEXT** Design For This Art Driven Duvet. Printed In Remarkable Detail On A Woven Duvet, This Is An Instant Focal Point Of Any Bedroom. The Fabric Is Woven Of Easy Care Polyester And Backed With A Soft Poly/Cotton Blend Fabric. The Texture In The Fabric Gives Dimension And A Unique Look And Feel To The Duvet."

x = x.upper()

print(x)

#x = "GRAPHIC"
#x = "GRAPHIC PRINTS"


matches = [cat for cat in categories_to_retain if cat in x.split()]

print(matches)

Output:
['TEXT']

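As you can see, my list contains an entry named 'GRAPHIC PRINT'. I want to find that phrase in my string.

I also need to find an entry when it appears in the string in plural or past-tense form, for example STRIPED, STRIPE, GRAPHIC PRINTS, and so on.

Thanks, NIRANJAN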

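3 Answers:

Answer 0: (score: 1)

Use a regular expression with word boundaries to get exact matches. Even if you only had single words, your logic would break as soon as you needed to ignore punctuation: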

import re

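# categories_to_retain is the list from the question; re.escape guards against regex metacharacters
patts = re.compile("|".join(r"\b{}\b".format(re.escape(s)) for s in categories_to_retain), re.I)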

x = " Beautiful Art By  Design Studio **graphic print** Creates A **TEXT** Design For This Art Driven Duvet. Printed In Remarkable Detail On A Woven Duvet, This Is An Instant Focal Point Of Any Bedroom. The Fabric Is Woven Of Easy Care Polyester And Backed With A Soft Poly/Cotton Blend Fabric. The Texture In The Fabric Gives Dimension And A Unique Look And Feel To The Duvet."

print(patts.findall(x))

Which gives you:

['graphic print', 'TEXT']
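
The pattern above matches only the category strings exactly as listed. For the plural and past-tense forms the question asks about (STRIPED, STRIPE, GRAPHIC PRINTS), one rough possibility is to drop a trailing "S" from each category and allow a few optional English suffixes. This is only a sketch of a crude heuristic, not real stemming; the helper names `variant_pattern` and `find_categories` and the suffix list are assumptions made up for this illustration.

```python
import re

def variant_pattern(category):
    # Hypothetical helper: drop one trailing "S" (but not "SS") to get a rough base form.
    if category.endswith("S") and not category.endswith("SS"):
        base = category[:-1]
    else:
        base = category
    # Allow a small, assumed set of suffixes after the base, still bounded by \b.
    return re.compile(r"\b{}(?:S|ES|D|ED)?\b".format(re.escape(base)), re.IGNORECASE)

def find_categories(text, categories):
    # Return the canonical category names whose pattern occurs in the text.
    return [cat for cat in categories if variant_pattern(cat).search(text)]

categories = ["STRIPES", "GRAPHIC PRINT", "TEXT", "FLORAL"]
found = find_categories("Striped duvet with graphic prints and a soft texture", categories)
print(found)  # ['STRIPES', 'GRAPHIC PRINT']
```

"texture" is not reported as TEXT because the word boundary is still enforced after the optional suffix. Irregular forms would need a proper stemmer or lemmatizer instead of this suffix list.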

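Answer 1: (score: 0)

You can use a regular expression. It also avoids matching character sequences that are only part of a longer word, and it reports exactly the entries from the list that were found.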

import re
matches = []
categories_to_retain = ['SOLID',
     'GEOMETRIC',
     'FLORAL',
     'BOTANICAL',
     'STRIPES',
     'ABSTRACT',
     'ANIMAL',
     'GRAPHIC PRINT',
     'ORIENTAL',
     'DAMASK',
     'TEXT',
     'CHEVRON',
     'PLAID',
     'PAISLEY',
     'SPORTS']

x = " Beautiful Art By  Design Studio **graphic print** Creates A **TEXT** Design For This Art Driven Duvet. Printed In Remarkable Detail On A Woven Duvet, This Is An Instant Focal Point Of Any Bedroom. The Fabric Is Woven Of Easy Care Polyester And Backed With A Soft Poly/Cotton Blend Fabric. The Texture In The Fabric Gives Dimension And A Unique Look And Feel To The Duvet."

x = x.upper()

print(x)

def searchWholeWord(w):
    return re.compile(r'\b({0})\b'.format(w), flags=re.IGNORECASE).search

for cat in categories_to_retain:
    return_value = searchWholeWord(cat)(x)
    if return_value:
        matches.append(cat)

print(matches)

Output:

['GRAPHIC PRINT', 'TEXT']

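Answer 2: (score: -1)

Here you split the string with the default split(), which splits on every run of whitespace: x.split() will contain the strings "GRAPHIC" and "PRINT", but not "GRAPHIC PRINT". You probably want to use "if cat in x" instead, which I believe returns what you need in this case.

This should work: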

matches = [cat for cat in categories_to_retain if cat in x]
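
One caveat worth checking before relying on this: `cat in x` is a plain substring test, so a category can also match inside a longer word. The short comparison below is a sketch with a made-up sample string (not the text from the question); it contrasts the substring test with a word-boundary regex.

```python
import re

categories = ["GRAPHIC PRINT", "TEXT"]
text = "A GRAPHIC PRINT WITH A SOFT TEXTURE"  # sample string assumed for illustration

# Substring test: "TEXT" is found inside "TEXTURE".
substring_matches = [cat for cat in categories if cat in text]

# Word-boundary test: "TEXT" must stand alone as a word.
whole_word_matches = [
    cat for cat in categories
    if re.search(r"\b{}\b".format(re.escape(cat)), text)
]

print(substring_matches)   # ['GRAPHIC PRINT', 'TEXT']
print(whole_word_matches)  # ['GRAPHIC PRINT']
```

If partial hits such as TEXT in TEXTURE are unwanted, the word-boundary version is the safer choice.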