Question

I've searched Stack Overflow but can't find an answer to my problem. How do I get the text (specific text) after a <br> tag?

Here is my code:

product_review_container = container.findAll("span",{"class":"search_review_summary"})
for product_review in product_review_container:
    prr = product_review.get('data-tooltip-html')
    print(prr)

Here is the output:

Very Positive<br>86% of the 1,013 user reviews for this game are positive.

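I only want the 86% from that string, and also the 1,013. So just the numbers. But the value isn't an integer, it's a string, so I don't know how to go about it.

Here is where the text comes from: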

   [<span class="search_review_summary positive" data-tooltip-html="Very Positive&lt;br&gt;86% of the 1,013 user reviews for this game are positive.">
</span>]

Here is the link I'm getting the information from: https://store.steampowered.com/search/?specials=1&page=1

Thanks!

Answer 1

You need to use a regular expression here!

import re

string = 'Very Positive<br>86% of the 1,013 user reviews for this game are positive.'
# Raw string so the backslashes reach the regex engine unchanged
a = re.findall(r'(\d+%)|(\d+,\d+)', string)
print(a)

# Output: [('86%', ''), ('', '1,013')]
# Then a[0][0] will be '86%' and a[1][1] will be '1,013'

Here \d matches any digit character, and + means one or more of the preceding item.

If you need a more specific regex, you can try it out at https://regex101.com
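Since the question asks for numbers rather than strings, here is a minimal sketch that goes one step further and converts the matches to integers. Note that the `\d+,\d+` pattern above only matches counts with exactly one comma, so it would miss something like "512" or "1,234,567". The function name `parse_review_tooltip` is made up for illustration, and the pattern assumes the tooltip always uses the wording "N% of the M user reviews" shown in the question.

```python
import re

# Assumes the tooltip wording from the question: "86% of the 1,013 user reviews ..."
REVIEW_PATTERN = re.compile(r'(\d+)% of the ([\d,]+) user reviews')

def parse_review_tooltip(tooltip):
    """Return (percent, total_reviews) as ints, or None if the text doesn't match."""
    match = REVIEW_PATTERN.search(tooltip)
    if match is None:
        return None
    percent = int(match.group(1))
    # Strip thousands separators before converting, e.g. '1,013' -> 1013
    total = int(match.group(2).replace(',', ''))
    return percent, total

text = 'Very Positive<br>86% of the 1,013 user reviews for this game are positive.'
print(parse_review_tooltip(text))
```

Returning None for text that doesn't match lets the calling loop skip entries without a review summary instead of crashing.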

Answer 2

There is also a non-regex way; admittedly a bit convoluted, but still fun:

First, we borrow (and modify) this nice function:

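def split_and_keep(s, sep):
    if not s:
        return ['']  # consistent with str.split()
    # Pick a marker character that cannot already occur in s
    p = chr(ord(max(s)) + 1)
    # Add the marker after every separator, then split on the marker
    return s.replace(sep, sep + p).split(p)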
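In case the helper looks cryptic: it takes the character one code point above the largest character in the string, which therefore can't already occur in it, inserts that marker after every separator, and splits on the marker. The separators stay attached to the pieces before them. A small sketch to see it in action (the sample strings are just illustrations):

```python
def split_and_keep(s, sep):
    if not s:
        return ['']
    # One code point above the largest character in s, so it is not in s
    marker = chr(ord(max(s)) + 1)
    return s.replace(sep, sep + marker).split(marker)

pieces = split_and_keep('a,b,c', ',')
print(pieces)

# A trailing separator yields a final empty string, like str.split() does
trailing = split_and_keep('a,', ',')
print(trailing)
```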

Then we go through some standard steps:

from bs4 import BeautifulSoup

html = """
  [<span class="search_review_summary positive" data-tooltip-html="Very Positive&lt;br&gt;86% of the 1,013 user reviews for this game are positive."></span>]
  """

soup = BeautifulSoup(html, 'html.parser')
info = soup.select('span')[0].get("data-tooltip-html")
print(info)

The output so far is:

Very Positive<br>86% of the 1,013 user reviews for this game are positive.

Next, we keep only the digits and the percent sign:

data = ''.join(c for c in info if (c.isdigit()) or c == '%')
print(data)

Now the output looks better:

86%1013

Almost there; now for the pièce de résistance:

split_and_keep(data, '%')

Final output:

['86%', '1013']
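If you want actual integers and would rather not filter characters, a plainer non-regex sketch is to cut the string at the `<br>` and pick the words by position. This is only an illustration: it assumes the tooltip always follows the wording "N% of the M user reviews ..." from the question, so the word positions are hard-coded, and the variable names are my own.

```python
info = 'Very Positive<br>86% of the 1,013 user reviews for this game are positive.'

# Everything before '<br>' is the summary, everything after is the detail text
summary, _, detail = info.partition('<br>')

# detail.split() -> ['86%', 'of', 'the', '1,013', 'user', ...]
words = detail.split()
percent = int(words[0].rstrip('%'))
total = int(words[3].replace(',', ''))

print(summary, percent, total)
```

If the wording ever changes, the indexes will point at the wrong words, so it is worth wrapping the conversions in a try/except ValueError in real scraping code.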
