Question

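I have a list of elements extracted from HTML, each of which carries break tags ( ...). The code below uses a single element, and I plan to apply it in a loop, but on that single element it raises SyntaxError: unexpected EOF while parsing.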

import re

firstElementText = '<td align="center" bgcolor="#e0e0e0" nowrap="" valign="middle"><b>Season</b></td>'

re.search(r'<br>.(.*?)</br>', firstElementText ).group(1)

I am hoping the search returns Season.

Answer 1

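That's because your HTML:

    firstElementText = '<td align="center" bgcolor="#e0e0e0" nowrap="" valign="middle"><b>Season</b></td>'

has no <br>. Changing it to

    firstElementText = '<td align="center" bgcolor="#e0e0e0" nowrap="" valign="middle"><br>Season</br></td>'

works fine for me. Also, your RegEx should look like this:

    re.search(r'<br>(.*?)</br>', firstElementText).group(1)

Do you see the "missing" dot between <br> and the opening parenthesis? That dot would consume the first character, so it would be left out of the group. The following worked for me on Python 3.4.2:

    >>> import re
    >>> firstElementText = '<td align="center" bgcolor="#e0e0e0" nowrap="" valign="middle"><br>Season</br></td>'
    >>> re.search(r'<br>(.*?)</br>', firstElementText).group(1)
    'Season'
    >>>

BTW, there is no </br>. It should be <br />, because it only breaks a line and does not affect the content in any other way... As you can see in the comments: https://stackoverflow.com/a/1732454/2588818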
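If the elements really are wrapped in <b> tags, as in the question's sample string, the same idea can be applied across the whole list. The following is only a minimal sketch under that assumption: the second and third list entries, the name BOLD_TEXT, and the helper extract_bold_text are hypothetical and do not come from the original post. It also guards against the no-match case, since re.search returns None when nothing matches and calling .group(1) on None raises AttributeError.

```python
import re

# Hypothetical sample list; only the first entry comes from the question.
elements = [
    '<td align="center" bgcolor="#e0e0e0" nowrap="" valign="middle"><b>Season</b></td>',
    '<td align="center"><b>Team</b></td>',
    '<td align="center">no bold text</td>',
]

# Match the text between <b> and </b>; note there is no stray dot before the group.
BOLD_TEXT = re.compile(r'<b>(.*?)</b>')


def extract_bold_text(html):
    """Return the text inside the first <b>...</b> pair, or None if there is none."""
    match = BOLD_TEXT.search(html)
    return match.group(1) if match else None


results = [extract_bold_text(element) for element in elements]
print(results)  # ['Season', 'Team', None]

# With the extra dot, the first character is consumed outside the group.
print(re.search(r'<b>.(.*?)</b>', elements[0]).group(1))  # eason
```

Returning None for a missing tag keeps the loop running over elements that do not match, rather than failing on the first one.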

Using regex to extract html-as-text between break tags
