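I need to find exact words from a list in a string.
I tried the code below. It gives me exact matches for the single-word entries in the list, but how do I match the two-word entries?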
categories_to_retain = ['SOLID',
'GEOMETRIC',
'FLORAL',
'BOTANICAL',
'STRIPES',
'ABSTRACT',
'ANIMAL',
'GRAPHIC PRINT',
'ORIENTAL',
'DAMASK',
'TEXT',
'CHEVRON',
'PLAID',
'PAISLEY',
'SPORTS']
x = " Beautiful Art By Design Studio **graphic print** Creates A **TEXT** Design For This Art Driven Duvet. Printed In Remarkable Detail On A Woven Duvet, This Is An Instant Focal Point Of Any Bedroom. The Fabric Is Woven Of Easy Care Polyester And Backed With A Soft Poly/Cotton Blend Fabric. The Texture In The Fabric Gives Dimension And A Unique Look And Feel To The Duvet."
x = x.upper()
print(x)
#x = "GRAPHIC"
#x = "GRAPHIC PRINTS"
matches = [cat for cat in categories_to_retain if cat in x.split()]
matches
Output:
['TEXT']
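Here you can see that my list contains the entry 'GRAPHIC PRINT'. I want to find this phrase in my string.
I also need to find a word when it appears in plural or past-tense form, for example STRIPED, STRIPE, GRAPHIC PRINTS, and so on.
Thanks, NIRANJAN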
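Answer 0 (score: 1)
Use a regular expression with word boundaries to get exact matches. Even if you only had single words, your logic would fail as soon as punctuation is attached to a word, because split() leaves the punctuation in place: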
import re
patts = re.compile("|".join(r"\b{}\b".format(s) for s in categories_to_retain), re.I)
x = " Beautiful Art By Design Studio **graphic print** Creates A **TEXT** Design For This Art Driven Duvet. Printed In Remarkable Detail On A Woven Duvet, This Is An Instant Focal Point Of Any Bedroom. The Fabric Is Woven Of Easy Care Polyester And Backed With A Soft Poly/Cotton Blend Fabric. The Texture In The Fabric Gives Dimension And A Unique Look And Feel To The Duvet."
print(patts.findall(x))
Which will give you:
['graphic print', 'TEXT']
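Note that findall returns the text as it appears in the string ('graphic print'), not the entry from the list. The following is a minimal sketch of one way to report the list's own spelling instead, assuming every category is stored in upper case. It also escapes each category with re.escape and tries longer entries first, which only matters if one entry is a prefix of another. The helper name find_categories and the extra 'GRAPHIC' entry are hypothetical and used only for illustration.

```python
import re

# Hypothetical subset of the list; 'GRAPHIC' is added only to show
# why longer entries should be tried first.
categories_to_retain = ['GRAPHIC', 'GRAPHIC PRINT', 'TEXT']

# Longest first, so 'GRAPHIC PRINT' wins over 'GRAPHIC' in the alternation.
ordered = sorted(categories_to_retain, key=len, reverse=True)
pattern = re.compile(
    r"\b(?:" + "|".join(re.escape(s) for s in ordered) + r")\b",
    re.IGNORECASE,
)

def find_categories(text):
    """Return matched categories in upper case, without duplicates."""
    found = []
    for match in pattern.findall(text):
        name = match.upper()  # assumes the list itself is upper case
        if name not in found:
            found.append(name)
    return found

print(find_categories("A graphic print with text, more Text, and a graphic."))
# ['GRAPHIC PRINT', 'TEXT', 'GRAPHIC']
```

The result is ordered by first appearance in the text rather than by position in the list.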
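Answer 1 (score: 0)
You can use a regular expression. This also avoids matching a category that is only part of a longer word, and it reports each category exactly as it is written in the list.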
import re
matches = []
categories_to_retain = ['SOLID',
'GEOMETRIC',
'FLORAL',
'BOTANICAL',
'STRIPES',
'ABSTRACT',
'ANIMAL',
'GRAPHIC PRINT',
'ORIENTAL',
'DAMASK',
'TEXT',
'CHEVRON',
'PLAID',
'PAISLEY',
'SPORTS']
x = " Beautiful Art By Design Studio **graphic print** Creates A **TEXT** Design For This Art Driven Duvet. Printed In Remarkable Detail On A Woven Duvet, This Is An Instant Focal Point Of Any Bedroom. The Fabric Is Woven Of Easy Care Polyester And Backed With A Soft Poly/Cotton Blend Fabric. The Texture In The Fabric Gives Dimension And A Unique Look And Feel To The Duvet."
x = x.upper()
print(x)
def searchWholeWord(w):
    return re.compile(r'\b({0})\b'.format(w), flags=re.IGNORECASE).search

for cat in categories_to_retain:
    return_value = searchWholeWord(cat)(x)
    if return_value:
        matches.append(cat)
print(matches)
Output:
['GRAPHIC PRINT', 'TEXT']
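The question also asks about plural and past-tense forms such as STRIPED or GRAPHIC PRINTS, which the code above does not cover. Here is a rough sketch of one possible approach, assuming a crude suffix rule is good enough: drop one trailing "S" from the last word of the category, then allow an optional S, ES, D or ED ending. This is a heuristic, not real stemming, and it can misfire on words whose final "S" is not a plural. The names variant_pattern and find_categories and the shortened list are hypothetical.

```python
import re

# Shortened list for illustration only.
categories_to_retain = ['SOLID', 'STRIPES', 'GRAPHIC PRINT', 'TEXT', 'SPORTS']

def variant_pattern(category):
    """Compile a whole-word pattern that tolerates a few simple endings."""
    words = category.split()
    # Drop one trailing 'S' so 'STRIPES' and 'STRIPE' share the stem 'STRIPE'.
    if words[-1].upper().endswith('S'):
        words[-1] = words[-1][:-1]
    # Allow any run of whitespace between the words of a phrase.
    body = r"\s+".join(re.escape(w) for w in words)
    return re.compile(r"\b" + body + r"(?:S|ES|D|ED)?\b", re.IGNORECASE)

# Compile each pattern once instead of on every lookup.
patterns = {cat: variant_pattern(cat) for cat in categories_to_retain}

def find_categories(text):
    """Return the categories (as spelled in the list) found in text."""
    return [cat for cat in categories_to_retain if patterns[cat].search(text)]

print(find_categories("A striped duvet with graphic prints and a soft texture"))
# ['STRIPES', 'GRAPHIC PRINT']
```

"texture" is not reported as TEXT because the word boundary must follow the stem or one of the listed endings.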
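Answer 2 (score: -1)
Here you split the string with the default split(), which splits on every run of whitespace: x.split() contains the strings "GRAPHIC" and "PRINT", but not "GRAPHIC PRINT". You probably want to use "if cat in x" instead, which I believe returns what you need in this case.
This should work: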
matches = [cat for cat in categories_to_retain if cat in x]
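One caveat with the plain substring test is that it also matches inside longer words. A small comparison, using a made-up input string, shows the difference from a word-boundary search:

```python
import re

categories_to_retain = ['GRAPHIC PRINT', 'TEXT']
# Made-up input: contains 'TEXTURE' but not the standalone word 'TEXT'.
x = "THE TEXTURE IN THE FABRIC GIVES DIMENSION"

# Substring test: 'TEXT' is found inside 'TEXTURE'.
substring_matches = [cat for cat in categories_to_retain if cat in x]

# Word-boundary test: 'TEXT' must stand alone as a word.
word_matches = [cat for cat in categories_to_retain
                if re.search(r"\b" + re.escape(cat) + r"\b", x)]

print(substring_matches)  # ['TEXT']
print(word_matches)       # []
```

The substring test is also case-sensitive, so it relies on x having been upper-cased first, as in the question's code.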