Question

I'm developing an Alexa skill, so I only have the standard Python (2.7) library available. That means BeautifulSoup4 is not an option.

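I'm trying to find the markup below in a full HTML page and then extract the string "104 spaces" from it. "104 spaces" is variable, but the rest of the markup stays the same: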

<p class="jp1"><strong>Car Parking</strong>104 spaces</p>

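Is this possible with HTMLParser, or could it be done with a regex search? I appreciate that regex isn't the best way to parse HTML, but I'm using urllib2 and handling the HTML as a string, and I only want to extract one specific string that follows another fixed string, so it may be good enough in this case.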

One option I thought of is:

s1 = "<strong>Car Parking</strong>104 spaces</p>"
s2 = "<strong>Car Parking</strong>"

print s1[s1.index(s2) + len(s2):]

This returns all of the text from "104 spaces" onward. How can I isolate just that piece of text?
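The slicing idea above could be bounded by also searching for the closing tag. This is only a minimal sketch: the helper name text_after_marker and the sample page string are made up for illustration, and it assumes the value is always followed by "</p>" with no nested tags in between.

```python
def text_after_marker(html, marker="<strong>Car Parking</strong>", end="</p>"):
    """Return the text between `marker` and the next `end`, or None if absent."""
    start = html.find(marker)
    if start == -1:
        return None  # marker not present in the page
    start += len(marker)
    stop = html.find(end, start)  # first closing tag after the marker
    if stop == -1:
        return None  # no closing tag found after the marker
    return html[start:stop].strip()


# Hypothetical page content standing in for the string fetched with urllib2.
page = ('<html><body><p class="jp1"><strong>Car Parking</strong>'
        '104 spaces</p><p>Other text</p></body></html>')
print(text_after_marker(page))
```

Using find rather than index avoids a ValueError when the marker is missing from the page.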

Thanks

Answer 1

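I don't use Python, but here is a regex (shown in JavaScript). After the match you need to strip the strong and p tags yourself, which should be easy to do in Python.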

var test = '<p class="jp1"><strong>Car Parking</strong>104 spaces</p>'

test = test.match(/[<][/]strong[>](.*)[<][/]p[>]/gmi)
// at this point test is: [ '</strong>104 spaces</p>' ]
test = test.join('')
test = test.replace('</strong>','')
test = test.replace('</p>','')

console.log('test: ', test)
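A rough Python equivalent, using only the standard re module, might look like the sketch below. It is not the answer's own code: the names PARKING_RE and extract_parking are hypothetical, and the pattern anchors on the fixed "<strong>Car Parking</strong>" text from the question and uses a capture group, so no tag stripping is needed afterwards.

```python
import re

# Non-greedy capture of whatever sits between the fixed label and the
# closing </p>. DOTALL lets the value span a line break if the page wraps it.
PARKING_RE = re.compile(
    r'<strong>Car Parking</strong>\s*(.*?)\s*</p>',
    re.IGNORECASE | re.DOTALL,
)


def extract_parking(html):
    """Return the captured text, or None when the pattern is not found."""
    match = PARKING_RE.search(html)
    return match.group(1) if match else None


sample = '<p class="jp1"><strong>Car Parking</strong>104 spaces</p>'
print(extract_parking(sample))
```

The non-greedy (.*?) stops at the first "</p>" after the label, whereas a greedy (.*) could run on to a later closing tag on the same line of a full page.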
