Question

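I need to make my code backwards compatible with Python 2.6 and BeautifulSoup 3. The code was written with Python 2.7 and, in this case, BS4. When I try to run it on a squeezy server (which has Python 2.6 and BS3), I get this error: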

try:
    from bs4 import BeautifulSoup
except ImportError:
    from BeautifulSoup import BeautifulSoup

gmp = open(fname, 'r')
soup = BeautifulSoup(gmp)
p = soup.body.div.find_all('p')

p = soup.body.div.find_all('p')
TypeError: 'NoneType' object is not callable

If I change it to:

   p = soup.body.div.findAll('p')

then I get this error:

p = soup.body.div.findAll('p')
TypeError: 'NoneType' object is not callable

Update with the error thrown

  File "/home/user/openerp/7.0/addons/my_module/models/gec.py", line 401, in parse_html_data
    p = soup.body.div.findAll('p') #used findAll instead of find_all for backwards compatability to bs3 version
TypeError: 'NoneType' object is not callable

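Either way, both methods work on my Ubuntu machine with Python 2.7 and bs4, but not on squeezy. Is there any other difference between those versions that I don't see or know about and that causes this error?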

Answer 1

You are running BeautifulSoup 3, but using BeautifulSoup 4 syntax.

Your fallback is at fault here:

try:
    from bs4 import BeautifulSoup
except ImportError:
    from BeautifulSoup import BeautifulSoup

If you want to support either version 3 or 4, stick to version 3 syntax:

p = soup.body.div.findAll('p')

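Because find_all is not a valid method in BeautifulSoup 3, it is interpreted as a tag search instead. There is no find_all tag in your HTML, so None is returned, and you then try to call that None.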
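The attribute fallback described above can be imitated with a toy class. This is only a sketch of the idea: ToyTag is a hypothetical stand-in written for illustration, not BeautifulSoup 3's actual source, and it only searches direct children.

```python
class ToyTag(object):
    """Toy stand-in for a BeautifulSoup 3 tag (illustration only)."""

    def __init__(self, name, children=()):
        self.name = name
        self.children = list(children)

    def find(self, name):
        # Return the first direct child with this tag name, else None.
        for child in self.children:
            if child.name == name:
                return child
        return None

    def findAll(self, name):
        # Return every direct child with this tag name.
        return [child for child in self.children if child.name == name]

    def __getattr__(self, attr):
        # Only called when normal attribute lookup fails, so an unknown
        # method name such as find_all is treated as a tag name to search for.
        if attr.startswith('__'):
            raise AttributeError(attr)
        return self.find(attr)


div = ToyTag('div', [ToyTag('p'), ToyTag('p')])
print(div.find_all)            # None: there is no <find_all> child
print(len(div.findAll('p')))   # 2

try:
    div.find_all('p')          # same as calling None('p')
except TypeError as exc:
    print(exc)                 # 'NoneType' object is not callable
```

The same mechanism would explain why a misspelled method name never raises AttributeError under version 3: the lookup quietly turns into a search for a tag of that name.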

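Next, the parser used by BeautifulSoup 3 responds differently to broken or incomplete HTML. If you have lxml installed on Ubuntu, it is used as the default parser, and it inserts a missing <body> tag for you. BeautifulSoup 3 may leave it out.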
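Since a parser may or may not supply <body> (or the <div>), one defensive option is to check each step before calling a method on it. The helper below is a minimal sketch; find_paragraphs is a hypothetical name, and it assumes only that the soup object exposes .body, .div and findAll the way both library versions do (BeautifulSoup 4 keeps findAll as an older alias of find_all).

```python
def find_paragraphs(soup):
    """Return the <p> tags under body > div, or [] if either tag is missing."""
    body = soup.body
    if body is None:
        # The parser did not produce a <body> tag for this document.
        return []
    div = body.div
    if div is None:
        # There is no <div> inside <body>.
        return []
    # findAll is the version 3 spelling and is also accepted by version 4.
    return div.findAll('p')
```

With either version installed, this would be called as find_paragraphs(BeautifulSoup(gmp)), and an empty list signals that the expected structure was not found.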

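I strongly urge you to remove the fallback instead and stick with BeautifulSoup version 4 only. Version 3 was discontinued years ago and contains unfixed bugs. BeautifulSoup 4 also offers additional features you may want to use.

BeautifulSoup is pure Python and is easily installed into a virtual environment on any platform supported by Python. You are not tied to the system-supplied package here.

On Debian Squeezy, for example, you'd be stuck with BeautifulSoup 3.1.0, and even the BeautifulSoup developers do not want you to use it! Your problem with findAll almost certainly stems from using that release.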

Answer 2

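I know this is a 6-year-old post, but I'm posting this in case anyone runs into a similar problem.

It looks like the string on line 9 should be a formatted string; after adding the f prefix it seems to work fine.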

import pandas as pd
import numpy as np
import requests
from bs4 import BeautifulSoup

product_all_pages = []

for i in range(1,15):
    response = requests.get(f"https://www.bol.com/nl/s/?page={i}&searchtext=hand+sanitizer&view=list")
    content = response.content
    parser = BeautifulSoup(content, 'html.parser')
    body = parser.body
    producten = body.find_all(class_="product-item--row js_item_root")
    product_all_pages.extend(producten)
len(product_all_pages)

price = float(product_all_pages[1].meta.get('content'))
productname = product_all_pages[1].find(class_="product-title--inline").a.getText()
print(price)
print(productname)

productlijst = []

for item in product_all_pages:
    if item.find(class_="product-prices").getText() == '\nNiet leverbaar\n':
        price = None
    else:
        price = float(item.meta['content'])
    product = item.find(class_="product-title--inline").a.getText()
    productlijst.append([product, price])
    
print(productlijst[:3])

df = pd.DataFrame(productlijst, columns=["Product", "price"])
print(df.shape)
df["price"].describe()
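The effect of the f prefix on line 9 can be seen in isolation, without making any request. This is a small sketch using the same URL; the variable names plain and formatted are only for illustration, and f-strings require Python 3.6 or later.

```python
i = 3

# Without the f prefix the braces are kept literally, so every
# iteration of the loop would request the same, invalid page value.
plain = "https://www.bol.com/nl/s/?page={i}&searchtext=hand+sanitizer&view=list"

# With the f prefix, {i} is replaced by the current value of i.
formatted = f"https://www.bol.com/nl/s/?page={i}&searchtext=hand+sanitizer&view=list"

print(plain)
print(formatted)
```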

BeautifulSoup - TypeError: 'NoneType' object is not callable
