Question

I have some strings like these:

The pizza is so hot
Today I bought an hot and tasty pizza

I need to extract, in Python, all the words between pizza and the adjective hot. How can I do that?

Here is an example of the expected output:

is so 
and tasty

Note that the attribute (e.g. pizza) and the adjective (e.g. hot) may each consist of more than one token.

This is what I have tried:

  import re

  # names, values and descrizione come from the surrounding code (not shown)
  attribute = re.search(values[0], descrizione, re.IGNORECASE)
  value = re.search(names[0], descrizione, re.IGNORECASE)
  if attribute:
      print(attribute.group())
      print(descrizione.find(attribute.group()))

  if value:
      print(value.group())
      print(descrizione.find(value.group()))

Answer 1

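Another approach: you can define the "from/to" patterns you want. Note that this uses the third-party regex module, whose branch-reset group (?|...) is not supported by the standard re module.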

>>> import regex
>>> rgx = regex.compile(r'(?si)(?|{0}(.*?){1}|{1}(.*?){0})'.format('pizza', 'hot'))
>>> s1 = 'The pizza is so hot'
>>> s2 = 'Today I bought an hot and tasty pizza'
>>> for s in [s1, s2]:
...     m = rgx.findall(s)
...     for x in m:
...         print(x.strip())

is so
and tasty
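
If the third-party regex module is not available, something similar can probably be done with the standard re module, which has no branch-reset group. The sketch below is only an illustration under a few assumptions: the helper name words_between is hypothetical, both terms are meant to be matched literally (hence re.escape, which also allows multi-token terms such as "very hot"), and matching is case-insensitive as in the pattern above.

```python
import re

def words_between(text, first, second):
    """Hypothetical helper: text between `first` and `second`, in either order."""
    a, b = re.escape(first), re.escape(second)
    # Two alternatives, each with its own capture group, stand in for (?|...).
    pattern = re.compile(
        r"{0}(.*?){1}|{1}(.*?){0}".format(a, b),
        re.IGNORECASE | re.DOTALL,
    )
    results = []
    for m in pattern.finditer(text):
        # Only one of the two groups takes part in any given match.
        inner = m.group(1) if m.group(1) is not None else m.group(2)
        results.append(inner.strip())
    return results

print(words_between("The pizza is so hot", "pizza", "hot"))
print(words_between("Today I bought an hot and tasty pizza", "pizza", "hot"))
```

Because each alternative has its own group, the code has to pick whichever group matched; that is the bookkeeping the branch reset avoids in the regex version.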

Answer 2

I think a good solution is to use split together with the '|' character in the regular expression.

import re

strs = []
strs.append('The pizza is so hot')
strs.append('Today I bought a hot and tasty pizza')
item = 'pizza'
adj = 'hot'
rets = []

for str_ in strs:
    # flags must be passed by keyword: the third positional argument is maxsplit
    ret = re.split(item + '|' + adj, str_, flags=re.IGNORECASE)
    rets.append(ret[1].strip())

This works because, when we look at the two strings individually, each split gives a list of three elements.

ret = re.split(item + '|' + adj, strs[0], flags=re.IGNORECASE)
print(ret)
['The ', ' is so ', '']

ret = re.split(item + '|' + adj, strs[1], flags=re.IGNORECASE)
print(ret)
['Today I bought a ', ' and tasty ', '']

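Because we know that each of the two words occurs only once in the string, we can reliably take ret[1] as the result: the string is split exactly twice, once where the first of the two words is found and once where the other is found. The OR character ('|') lets us split without knowing in advance which word comes first.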
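
The splitting idea described above could be wrapped in a small function along these lines. This is a sketch, not part of the original answer: the name middle_by_split is hypothetical, re.escape is added on the assumption that the terms are literal text (possibly multi-token), and the length check only partly enforces the "each word occurs once" assumption, since it cannot tell two occurrences of the same word from one of each.

```python
import re

def middle_by_split(text, item, adj):
    """Hypothetical helper: split on either term and return the middle piece."""
    pattern = "|".join(re.escape(term) for term in (item, adj))
    parts = re.split(pattern, text, flags=re.IGNORECASE)
    # Two matches in total give exactly three pieces; any other count means
    # the assumption behind taking parts[1] does not hold.
    if len(parts) != 3:
        raise ValueError("expected exactly two term occurrences in: %r" % text)
    return parts[1].strip()

for s in ["The pizza is so hot", "Today I bought a hot and tasty pizza"]:
    print(middle_by_split(s, "pizza", "hot"))
```

Raising an error instead of silently returning parts[1] makes it easier to notice inputs where a word is missing or repeated.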

Answer 3

import re

x="""The pizza is so hot
Today I bought an hot and tasty pizza
wow pizza and another pizza"""
print([j for i,j in re.findall(r"(pizza|hot)\s*(.*?)\s*(?!\1)(?:hot|pizza)",x)])

Try this with re.findall.
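
To see why the comprehension unpacks pairs, and why the third line of the sample text contributes nothing, it may help to look at what re.findall returns for a pattern with two capturing groups. The snippet below reuses the same pattern purely as an illustration; the variable names are chosen here and are not from the answer.

```python
import re

pattern = r"(pizza|hot)\s*(.*?)\s*(?!\1)(?:hot|pizza)"
text = """The pizza is so hot
Today I bought an hot and tasty pizza
wow pizza and another pizza"""

# With two capturing groups, findall returns (opening_word, words_between) tuples.
pairs = re.findall(pattern, text)
print(pairs)

# The lookahead (?!\1) rejects a closing word equal to the opening one, and "."
# does not cross newlines by default, so "pizza ... pizza" yields no match.
between = [middle for _, middle in pairs]
print(between)
```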

Get all words between two specific words in Python
