Question

The following is Python 3 code for scraping the AAPL stock price from Yahoo Finance.

import urllib.request
from bs4 import BeautifulSoup as bs4

htmlfile = urllib.request.urlopen("http://finance.yahoo.com/q?s=AAPL")

htmltext = htmlfile.read()

for price in htmltext.find(attrs={'id':"yfs_184_aapl"}):
    print (price)

Apparently, this code works in Python 2.7 with almost no modification. However, it does not work in the Python 3.3.3 shell. This is the error it shows:

Traceback (most recent call last):
  File "C:/Python33/python codes/webstock2.py", line 8, in <module>
    for price in htmltext.find(attrs={'id':"yfs_184_aapl"}):
TypeError: find() takes no keyword arguments

I have learned to convert a string pattern to binary with str.encode. I am not sure whether that would work for this code.

Edit 1: Final working code after @Martijn's changes

    import urllib.request
    from bs4 import BeautifulSoup as bs4

    htmlfile = urllib.request.urlopen("http://finance.yahoo.com/q?s=AAPL")

    htmltext = htmlfile.read()

    soup = bs4(htmltext)

    for price in soup.find_all(id="yfs_l84_aapl"):
        print (price)

It prints out blank. Could you figure this out? Thanks again.

Answer 1

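You are calling str.find() (strictly speaking bytes.find() in Python 3, because read() returns bytes), not BeautifulSoup.find(). You forgot a step: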

soup = bs4(htmltext)

for price in soup.find(attrs={'id':"yfs_184_aapl"}):

But if you are going to loop, you need to call find_all() instead:

for price in soup.find_all(id="yfs_l84_aapl"):

You don't actually have to use the attrs keyword argument; specifying the attributes as keyword arguments works fine too.
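As a rough illustration of the difference between find() and find_all(), and of the two ways to pass the attribute, here is a minimal sketch. It parses a small hand-written HTML string (SAMPLE_HTML is an assumption for the example, not Yahoo's real markup) and names the "html.parser" backend explicitly, which the code above does not do.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the downloaded page; not Yahoo's actual markup.
SAMPLE_HTML = '<div><span id="yfs_l84_aapl">123.45</span></div>'

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# find() returns a single Tag (or None if nothing matches).
first_match = soup.find(id="yfs_l84_aapl")

# find_all() returns a list-like collection of every matching Tag.
all_matches = soup.find_all(id="yfs_l84_aapl")

# The attrs dictionary form selects the same elements as the keyword form.
via_attrs = soup.find_all(attrs={"id": "yfs_l84_aapl"})

# Collect the text of each match instead of printing the whole tag.
prices = [tag.get_text() for tag in all_matches]
print(prices)
```

Looping over the result of find_all() visits each matching tag in turn, whereas looping over the single Tag returned by find() walks that tag's children.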

You do have to use the correct id attribute; it is yfs_l84_aapl (the letter l, followed by the digits 8 and 4), not the digit 1.
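To make the id point concrete, here is a small sketch, again using a made-up HTML string rather than the live page (SAMPLE_HTML, wrong_id and right_id are names chosen for the example). It only shows that find_all() quietly returns an empty result when the id does not match exactly, so a loop over it prints nothing and raises no error.

```python
from bs4 import BeautifulSoup

# Hypothetical markup for illustration only.
SAMPLE_HTML = '<span id="yfs_l84_aapl">123.45</span>'
soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

wrong_id = "yfs_184_aapl"  # digit 1 after the underscore
right_id = "yfs_l84_aapl"  # lowercase letter l after the underscore

# No element carries the misspelled id, so the result is empty.
no_matches = soup.find_all(id=wrong_id)

# The correctly spelled id matches the span.
matches = soup.find_all(id=right_id)

print(len(no_matches), len(matches))
```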
