Question

I have this code:

import urllib.request
from bs4 import BeautifulSoup
soup = BeautifulSoup(urllib.request.urlopen("http://biznes.pl/waluty/profile/fixing-nbp/jen-japonia,821,0,30,profile-waluta-nbp.html"), "lxml")
price = soup.find_all("div", {"class": "cena"})
print(price)

The result is:

[<div class="cena">

                                                            3,6178

                        </div>, <div class="cena Waluta">PLN</div>, <div class="cena up"></div>]

I want to extract the number "3,6178" so that I can use it in further calculations. How can I do that?

Answer 1

In this case, you can do the following:

p = price[0].string.strip()

This gives you the string "3,6178".

Note, however, that you also need to remove the comma before you can do calculations with it:

p = int(p.replace(",", ""))

Edit: If (as Padraic points out below) 3,6178 actually represents the value 3.6178, a small change is enough:

p = float(p.replace(",", "."))
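If the value is a currency rate, the same replacement can feed `Decimal` instead of `float` to avoid binary rounding. The following is only a minimal sketch: the helper name `parse_decimal_comma` is hypothetical, and it assumes the comma is always the decimal separator and that the text contains no thousands separators.

```python
from decimal import Decimal


def parse_decimal_comma(raw: str) -> Decimal:
    """Convert text such as '\n   3,6178\n' to Decimal('3.6178').

    Assumption: the comma is the decimal separator and there are
    no thousands separators in the input.
    """
    cleaned = raw.strip().replace(",", ".")
    return Decimal(cleaned)


# The string below stands in for price[0].string from the question.
rate = parse_decimal_comma("\n          3,6178\n      ")
print(rate)
print(rate * 100)
```

In the scraper itself this would be called as `parse_decimal_comma(price[0].string)`.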

Answer 2

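You need to take the locale into account: depending on the locale, 3,6178 does not necessarily mean 36178, because the comma may be a decimal separator rather than a thousands separator. Also, if you only want a single element, use find rather than find_all: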

In [1]: import urllib.request

In [2]: import locale

In [3]: locale.setlocale(locale.LC_ALL, 'de_DE')
Out[3]: 'de_DE'

In [4]: from bs4 import BeautifulSoup

In [5]: soup = BeautifulSoup(urllib.request.urlopen("http://biznes.pl/waluty/profile/fixing-nbp/jen-japonia,821,0,30,profile-waluta-nbp.html"), "lxml")

In [6]: price = soup.find("div", {"class":"cena"})

In [7]: print(locale.atof(price.text.strip()))
3.6178

The website is Polish, so setting the locale to pl_PL would likewise mean that 3,6178 is parsed as 3.6178, the same as the output above.
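Changing the locale with `LC_ALL` affects the whole process, and the named locale may not be installed on every machine. One possible way to contain that is sketched below. The helper name `parse_localized` and the locale name `pl_PL.UTF-8` are assumptions (locale names differ between platforms), and the fallback to a plain comma replacement is an illustrative choice, not part of the answer above.

```python
import locale


def parse_localized(text: str, locale_name: str = "pl_PL.UTF-8") -> float:
    """Parse a number using the conventions of locale_name.

    Only LC_NUMERIC is switched, and the previous setting is restored
    afterwards. If the locale is not available, fall back to treating
    the comma as the decimal separator (an assumption of this sketch).
    """
    previous = locale.setlocale(locale.LC_NUMERIC)  # query current setting
    try:
        locale.setlocale(locale.LC_NUMERIC, locale_name)
    except locale.Error:
        return float(text.strip().replace(",", "."))
    try:
        return locale.atof(text.strip())
    finally:
        locale.setlocale(locale.LC_NUMERIC, previous)


# The string below stands in for price.text from the session above.
value = parse_localized("\n      3,6178\n   ")
print(value)
```

Note that `locale.setlocale` is not thread-safe, so a helper like this is best kept to single-threaded scripts.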

How to extract a number from a website
