Question

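Given a patent, how would you write a regular expression that extracts the list of elements from its description? An element can be identified by the following: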

the element name
preceded by 'a' or 'the'
followed by a number

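For example, given this paragraph:

"FIG. DETAILED DESCRIPTION FIG. 1 illustrates a base 10 for an adjustable cord retention device according to an embodiment of the present invention. The base 10 may include a base hole 16 to allow a cord to pass through the base 10. The shape of the base hole 16 depends on the intended use of the adjustable cord retention device. If the cross section of the cord is circular, then the base hole 16 may also be circular. On the other hand, when the intended cord is a strap whose cross section is a rounded rectangle, the base hole 16 may also be a rounded rectangle."

I would like the regex to return: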

['a base 10', 'The base 10', 'a base hole 16', 'the base 10', 'the base hole 16', 'the base hole 16', 'the base hole 16']

Answer 1

You can use re.findall(), with s holding the paragraph from the question:

>>> import re
>>> re.findall(r'((?:a|the)(?:(?!(?:\ba\b|\bthe\b)).)*\d+)', s, re.I)
['a base 10', 'The base 10', 'a base hole 16', 'the base 10', 'the base hole 16', 'the base hole 16', 'the base hole 16']

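The regular expression

r'((?:a|the)(?:(?!(?:\ba\b|\bthe\b)).)*\d+)'

matches any substring that starts with a or the and ends with a number. Between the two, the negative lookahead in (?:(?!(?:\ba\b|\bthe\b)).)* lets each character match only if it does not begin the standalone word a or the. This prevents long matches such as 'the present invention. The base 10'. The re.I flag makes the match case-insensitive, so The is found as well as the.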
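
To see what the lookahead buys you, here is a minimal sketch that compares a naive lazy pattern with the tempered one on the first three sentences of the example. The sample wording, the names NAIVE, TEMPERED and ANSWER, and the extra \b...\b around the opening article in NAIVE and TEMPERED are my own assumptions, not part of the pattern above. The boundary is added because the opening (?:a|the) can otherwise begin inside a word, for instance in a heading that precedes the first number.

```python
import re

# First sentences of the example paragraph (English wording assumed).
s = ("FIG. 1 illustrates a base 10 for an adjustable cord retention device "
     "according to an embodiment of the present invention. The base 10 may "
     "include a base hole 16 to allow a cord to pass through the base 10.")

# Naive attempt: an article, then anything (lazily), then a number.
NAIVE = re.compile(r'\b(?:a|the)\b.*?\d+', re.I)

# Tempered version: a character between the article and the number is
# consumed only if no standalone "a"/"the" starts at that position.
# The leading \b...\b is an assumed addition, not in the pattern above.
TEMPERED = re.compile(r'\b(?:a|the)\b(?:(?!\b(?:a|the)\b).)*\d+', re.I)

# The pattern from the answer, unchanged, for comparison.
ANSWER = re.compile(r'((?:a|the)(?:(?!(?:\ba\b|\bthe\b)).)*\d+)', re.I)

naive_hits = NAIVE.findall(s)
tempered_hits = TEMPERED.findall(s)
print(naive_hits)
# ['a base 10', 'the present invention. The base 10',
#  'a base hole 16', 'a cord to pass through the base 10']
print(tempered_hits)
# ['a base 10', 'The base 10', 'a base hole 16', 'the base 10']

# Without a leading word boundary, the match can start mid-word.
heading = "DETAILED DESCRIPTION FIG. 1 illustrates a base 10"
answer_on_heading = ANSWER.findall(heading)
tempered_on_heading = TEMPERED.findall(heading)
print(answer_on_heading)    # ['AILED DESCRIPTION FIG. 1', 'a base 10']
print(tempered_on_heading)  # ['a base 10']
```

Note that the tempered part is still greedy, so if two numbers appear before the next article, the match runs to the last of them. Whether that is what you want depends on how the patent text is written.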
