Question

Hi everyone,

I have written a web crawler in Python. Because I ran into some problems with Beautiful Soup, I am now digging into regular expressions.

My problem is the following. Let's say my text looks like this:

<title>Document</title>
<head>test and whatever</head>
<body style="font-family:Times New Roman;font-size:10pt;">test and not whatever.</body>
<body style="font-family:Times New Roman;font-size:10pt;">test and not whatever. 
This is a text after the dot. And this is a text after the dot but with test in it. 
And a text with test and not.</body>

What I actually want to achieve is to extract the sentences in which the string "test" occurs, but not the sentences in which the string "not" also occurs.

A sentence can start with "." or ">" and end with "." or "<".

I have already figured out how to extract the sentences that contain "test", namely with this regex (Demo):

[^\>|.]*test[^\<|.]*
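For reference, here is a minimal Python sketch of what this pattern returns on the sample text. The variable names (`sample`, `current`, `matches`) are only illustrative, and the matches are stripped of surrounding whitespace for readability.

```python
import re

# The sample text from above (variable name is illustrative).
sample = """<title>Document</title>
<head>test and whatever</head>
<body style="font-family:Times New Roman;font-size:10pt;">test and not whatever.</body>
<body style="font-family:Times New Roman;font-size:10pt;">test and not whatever.
This is a text after the dot. And this is a text after the dot but with test in it.
And a text with test and not.</body>"""

# The pattern exactly as written above. Inside a character class, "|" is a
# literal pipe character, not an alternation.
current = re.compile(r"[^\>|.]*test[^\<|.]*")

# Strip the whitespace left over at the start of each match.
matches = [m.strip() for m in current.findall(sample)]

for m in matches:
    print(m)
```

On this sample the list has five entries, three of which contain "not".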

Unfortunately, the sentences that contain "not" are extracted as well. The output I would like to get looks like this:

test and whatever
And this is a text after the dot but with test in it

I tried

[^\>|.]*(?=test)((?!not).)[^\<|.]*

but unfortunately it does not work. Hopefully someone can help me.
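In that attempt, `((?!not).)` consumes only a single character, so the negative lookahead is checked at one position rather than across the whole sentence. One possible way to express the constraint is sketched below, under some assumptions that are not part of the original pattern: "test" and "not" are treated as whole words (`\b`), a sentence must be preceded by ">" or ".", and the names `sample`, `pattern` and `wanted` are made up for the example.

```python
import re

# The sample text from above (variable name is illustrative).
sample = """<title>Document</title>
<head>test and whatever</head>
<body style="font-family:Times New Roman;font-size:10pt;">test and not whatever.</body>
<body style="font-family:Times New Roman;font-size:10pt;">test and not whatever.
This is a text after the dot. And this is a text after the dot but with test in it.
And a text with test and not.</body>"""

pattern = re.compile(
    r"(?<=[>.])"             # sentence starts right after ">" or "."
    r"(?![^<.]*\bnot\b)"     # reject if "not" occurs before the next "<" or "."
    r"[^<>.]*\btest\b"       # the sentence must contain the word "test"
    r"[^<.]*"                # consume the rest up to the closing "<" or "."
)

# Strip the whitespace that follows the previous sentence's dot.
wanted = [m.strip() for m in pattern.findall(sample)]

for m in wanted:
    print(m)
```

On the sample this leaves only "test and whatever" and "And this is a text after the dot but with test in it". A sentence at the very start of the input, with no ">" or "." before it, would not be matched by this sketch, and dropping the `\b` assumption would also reject words that merely contain "not", such as "another".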

Thanks and best regards, Gerrit

Regex: combining negative and positive expressions on sentences

0 answers: