Question

text = "a/NNP b/NNG c/NP d/NNP e/PNG"

I want to get only the words tagged 'NNP' or 'NNG'.

So I tried:

words = re.compile('(\w+/[(NNP)|(NNG)]*)')
t = re.findall(words,text)

However, the result keeps showing me

['a/NNP', 'b/NNG', 'c/NP', 'd/NNP','e/PNG'].
How can I get only ['a/NNP','b/NNG','d/NNP']?

Answer 1

You can use

import re
text = "a/NNP b/NNG c/NP d/NNP e/PNG"
words = re.compile(r'\w+/(?:NNP|NNG)\b')
# OR words = re.compile(r'\w+/NN[PG]\b')
print(re.findall(words, text))
# => ['a/NNP', 'b/NNG', 'd/NNP']

The regex is \w+/NN[PG]\b. It matches:

\w+ - 1 or more word characters (note: to match only letters, replace \w+ with [^\W\d_]+)
/NN - the literal substring /NN
[PG] - P or G, so NN[PG] is equivalent to (?:NNP|NNG), i.e. NNP or NNG
\b - a word boundary (so that the tag is not matched as the start of a longer word)
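
To make the effect of \b and of the letters-only variant concrete, here is a small sketch. The tokens x1/NNP and f/NNPS are made up for illustration and do not come from the question.

```python
import re

# "x1/NNP" and "f/NNPS" are hypothetical tokens added for illustration.
sample = "a/NNP b/NNG x1/NNP f/NNPS"

# With \b the tag must end right after P or G.
with_boundary = re.findall(r'\w+/NN[PG]\b', sample)

# Without \b the pattern also matches the prefix of a longer tag.
without_boundary = re.findall(r'\w+/NN[PG]', sample)

# [^\W\d_]+ accepts letters only, so a word containing a digit is skipped.
letters_only = re.findall(r'[^\W\d_]+/NN[PG]\b', sample)

print(with_boundary)     # ['a/NNP', 'b/NNG', 'x1/NNP']
print(without_boundary)  # ['a/NNP', 'b/NNG', 'x1/NNP', 'f/NNP']
print(letters_only)      # ['a/NNP', 'b/NNG']
```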

Answer 2

[] denotes a character class. It is not used to group things together the way brackets are in mathematics.
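
As a rough illustration of the point about character classes, the sketch below runs the class from the question next to a grouped alternation. Inside [...] the parentheses and the | are ordinary characters, so the class is simply the set of characters (, ), |, N, P and G.

```python
import re

text = "a/NNP b/NNG c/NP d/NNP e/PNG"

# Character class: any run of the characters ( ) | N P G, in any order.
char_class = re.compile(r'\w+/[(NNP)|(NNG)]*')

# Non-capturing group: exactly the string NNP or the string NNG.
group = re.compile(r'\w+/(?:NNP|NNG)\b')

print(char_class.findall(text))  # ['a/NNP', 'b/NNG', 'c/NP', 'd/NNP', 'e/PNG']
print(group.findall(text))       # ['a/NNP', 'b/NNG', 'd/NNP']
```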

You can use a non-capturing group (?:) instead of []:

\w+/(?:NNP|NNG)\b

If your tags always consist of exactly three characters, there is no need for the \b.

You can add as many alternatives as you need:

\w+/(?:NNP|NNG|ABC|DEF|GHI)\b
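
If the list of tags grows, the alternation can be assembled from a sequence instead of being typed by hand. This is only a sketch: build_tag_pattern is a hypothetical helper name, and re.escape is used so that a tag containing a regex metacharacter would still be matched literally.

```python
import re

def build_tag_pattern(tags):
    """Compile a pattern matching word/TAG for any TAG in tags (hypothetical helper)."""
    alternatives = "|".join(re.escape(tag) for tag in tags)
    return re.compile(r'\w+/(?:' + alternatives + r')\b')

text = "a/NNP b/NNG c/NP d/NNP e/PNG"
pattern = build_tag_pattern(['NNP', 'NNG'])
print(pattern.findall(text))  # ['a/NNP', 'b/NNG', 'd/NNP']
```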

Answer 3

I'd say you don't really need a regex here:

stuff = ('NNP', 'NNG')
text = "a/NNP b/NNG c/NP d/NNP e/PNG"
result = [i for i in text.split() if i.split("/")[1] in stuff]
# ['a/NNP', 'b/NNG', 'd/NNP']

In the timing below, the above is also faster than the regex, and it is arguably easier to maintain:

>>> import re
>>>
>>> text = "a/NNP b/NNG c/NP d/NNP e/PNG"
>>> stuff = ('NNP', 'NNG', 'VV', 'VA', 'MAG', 'MAJ', 'IC', 'VX', 'MM')
>>>
>>> def regex(reg):
...     words = re.compile(reg)
...     return re.findall(words,text)
...
>>> def notregex():
...     return [i for i in text.split() if i.split("/")[1] in stuff]
...
>>> from timeit import timeit
>>> timeit(stmt="regex(a)", setup="from __main__ import regex; a=r'\w+/(?:NNP|NNG|VV|VA|MAG|MAJ|IC|VX|MM)\b'", number=100000)
0.3145495569999639
>>> timeit(stmt="notregex()", setup="from __main__ import notregex", number=100000)
0.21294589500007532
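
Before reading much into a timing like this, it may be worth confirming that both functions return the same list. One detail to watch: a pattern placed inside a non-raw string literal has its \b turned into a backspace character by Python before the regex engine sees it, so the sketch below passes the pattern as a raw string and hands timeit callables instead of a setup string. The names with_regex and without_regex are placeholders, and no timings are claimed here.

```python
import re
from timeit import timeit

text = "a/NNP b/NNG c/NP d/NNP e/PNG"
stuff = ('NNP', 'NNG', 'VV', 'VA', 'MAG', 'MAJ', 'IC', 'VX', 'MM')

# Raw string: \b stays a regex word boundary.
pattern = re.compile(r'\w+/(?:NNP|NNG|VV|VA|MAG|MAJ|IC|VX|MM)\b')

def with_regex():
    return pattern.findall(text)

def without_regex():
    return [tok for tok in text.split() if tok.split("/")[1] in stuff]

# In a non-raw literal, "\b" is the single backspace character.
print("\b" == chr(8))                   # True
# Both approaches should agree before their speed is compared.
print(with_regex() == without_regex())  # True

# Timings vary by machine, so they are printed rather than quoted.
print(timeit(with_regex, number=10000))
print(timeit(without_regex, number=10000))
```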

Regex: I'm having trouble using '|'
