Question

I wrote the following program to extract all patterns (words, possibly containing hyphens, and punctuation marks):

import re

sentence = "Narrow-minded people are happy although it's cold ! I'm also happy"
print(re.split(r'([^-\w])', sentence))

The result is:

['Narrow-minded', ' ', 'people', ' ', 'are', ' ', 'happy', ' ', 'although', ' ', 'it', "'", 's', ' ', 'cold', ' ', '', '!', '', ' ', 'I', "'", 'm', ' ', 'also', ' ', 'happy']

The question is how to keep (attach) an apostrophe at the end of a word. For example, we want to retrieve "it'" rather than the pair "it", "'".

Answer 1

You can add words ending in an apostrophe as a special case:

print(re.split(r"([\w-]+'|[^-\w])", sentence))

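In this case the sentence is split on either

a sequence of one or more \w characters or hyphens followed by an apostrophe (the [\w-]+' part),
or any single character that is neither a hyphen nor a \w character (the [^-\w] part).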

This results in:

['Narrow-minded', ' ', 'people', ' ', 'are', ' ', 'happy', ' ', 'although', ' ', '', "it'", 's', ' ', 'cold', ' ', '', '!', '', ' ', '', "I'", 'm', ' ', 'also', ' ', 'happy']
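
As a rough sketch, the same two alternatives can be spelled out with re.VERBOSE so that each branch carries its own comment. The name TOKEN is only illustrative and is not part of the original answer; the pattern itself is unchanged.

```python
import re

# Same pattern as in the answer, written in verbose mode so that each
# alternative can be commented. TOKEN is an illustrative name.
TOKEN = re.compile(r"""
    (               # capturing group: re.split keeps the delimiters
        [\w-]+'     # word characters or hyphens, then a trailing apostrophe
      |             # or
        [^-\w]      # any single character that is neither hyphen nor word char
    )
""", re.VERBOSE)

sentence = "Narrow-minded people are happy although it's cold ! I'm also happy"
parts = TOKEN.split(sentence)
print(parts)
```

The first branch has to come before the second so that "it'" is tried as a unit. Empty strings appear wherever two delimiters are adjacent, for example between the space and "it'".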

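Note that this increases the number of empty strings ('') in the list. To get rid of them, you can filter the list (in Python 3, wrap filter in list() so that the result prints as a list):

print(list(filter(None, re.split(r"([\w-]+'|[^-\w])", sentence))))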

This gives:

['Narrow-minded', ' ', 'people', ' ', 'are', ' ', 'happy', ' ', 'although', ' ', "it'", 's', ' ', 'cold', ' ', '!', ' ', "I'", 'm', ' ', 'also', ' ', 'happy']
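
A possible alternative, not part of the original answer, is to match the tokens directly with re.findall instead of splitting and filtering. The sketch below assumes a slightly different pattern (an optional trailing apostrophe) and a hypothetical helper name tokenize; for the sample sentence it should produce the same list as the filtered split.

```python
import re

def tokenize(text):
    """Return words (with an optional trailing apostrophe) and single
    non-word characters, in order. Hypothetical helper for illustration."""
    # [\w-]+'?  : a run of word characters or hyphens, plus an apostrophe if present
    # [^-\w]    : otherwise, any single character that is not a hyphen or word char
    return re.findall(r"[\w-]+'?|[^-\w]", text)

sentence = "Narrow-minded people are happy although it's cold ! I'm also happy"
tokens = tokenize(sentence)
print(tokens)
```

Because findall returns only the matches, no empty strings are produced and no filtering step is needed.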
