Python 3 - ValueError: not enough values to unpack (expected 3, got 2)

Asked: 2018-07-29 14:48:22

Tags: python-3.x

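I am experimenting with web scraping and would like to know whether the ValueError can be avoided in the following situation. As an example, I want to scrape these 5 data fields: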

1) Car Model: Honda Fit Auto 1.3
2) Price: S$19,000
3) Date post: 3 weeks ago by back_packer
4) Depreciation: S$8,362.75
5) Registration Date: 15 Jan 2010

In the site's HTML, the data for 2) to 5) all sit in tags with the same class:

<p class="cU-b cU-d">3 weeks ago by <a href="/back_packer" rel="nofollow " target="_blank">back_packer</a></p>
<p class="cU-b cU-d">S$19,000</p>
<p class="cU-b cU-d">S$8,362.75</p>
<p class="cU-b cU-d">15 Jan 2010</p>

So I tried running the following Python code.

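from bs4 import BeautifulSoup as bs
from requests import get

def getHTML(link, counter):
    # Fetch the page for the given listing id and parse it
    return bs(get(link.format(counter)).content, "html.parser")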

PAGE_URL = 'https://sg.carousell.com/categories/cars-32/cars-for-sale-1173/'
CAR_URL = 'https://sg.carousell.com/p/{}'

car = dict()
content = getHTML(CAR_URL, car_id).find('div', {'class': 'aG-c aG-b'})
car['Model'] = content.find('p', {'class': 'cU-b cU-e'}).text

car['Post'], car['Price'], car['Deprec'], car['Regstr_Date'] = {
    info.text for info in content.find_all('p', {'class': 'cU-b cU-d'})}

====================================

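When I run this, I get "ValueError: not enough values to unpack (expected 3, got 2)". I suspect the error is caused by at least one car listing that is missing the post, price, depreciation or registration date field.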
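
The suspected cause can be reproduced without touching the site. The following is a minimal sketch; the two sample lists and the helper `unpack_four` are made up for illustration and stand in for the texts of the scraped `<p>` tags:

```python
# Hypothetical sample data standing in for the scraped <p> texts.
complete = ["3 weeks ago by back_packer", "S$19,000", "S$8,362.75", "15 Jan 2010"]
incomplete = ["an hour ago by rubberr", "S$25,800"]  # two fields missing


def unpack_four(values):
    """Unpack exactly four values; return the error message on failure."""
    try:
        post, price, deprec, reg_date = values
    except ValueError as exc:
        return str(exc)
    return post, price, deprec, reg_date


print(unpack_four(complete))
print(unpack_four(incomplete))  # not enough values to unpack (expected 4, got 2)
```

The numbers in the message are the count of assignment targets and the count of values actually supplied. Separately, `{...}` in the code above builds a set, which has no guaranteed order and collapses duplicate texts, so a list is the safer container when the position of each value matters.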

Thanks.

1 Answer:

Answer 0: (score: 0)

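The page is not very scraping-friendly, so you may need some trial and error before you get the right result. Here is my attempt (I replaced missing values with "-"; at least it no longer raises a ValueError, but you need to check that it scrapes the correct information):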

from bs4 import BeautifulSoup as bs
from requests import get
import re
from pprint import pprint


def getHTML(link, counter):
    return bs(get(link.format(counter)).content, "html.parser")

PAGE_URL = 'https://sg.carousell.com/categories/cars-32/cars-for-sale-1173/'
CAR_URL = 'https://sg.carousell.com/p/{}'

# car_id = 'mazda-3-sedan-auto-1-5-182030279'
car_id = 'nissan-nv200-1-5-manual-182141686'
# car_id = 'toyota-corolla-axio-1-5-auto-x-177344405'

car = {}
content = getHTML(CAR_URL, car_id).find('div', {'class': 'aG-c aG-b'})
car['Model'] = content.find('p', {'class': 'cU-b cU-e'}).text

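# Collect at most four candidate fields, stopping at the "N Likes" counter
data = []
for p in content.select('section.bi-c.bi-h p.cU-b.cU-d')[:4]:
    if re.match(r'\d+\s+Likes', p.text):
        break
    data.append(p.text)

# Pad with '-' so there are always at least four values to unpack;
# *_ absorbs the surplus padding
car['Post'], car['Price'], car['Deprec'], car['Regstr_Date'], *_ = data + ['-'] * 4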

# If the third field has no '$', it is probably the registration date, so swap
if car['Deprec'] != '-' and '$' not in car['Deprec']:
    car['Regstr_Date'], car['Deprec'] = car['Deprec'], car['Regstr_Date']

pprint(car)

For this car, it prints:

{'Deprec': '-',
 'Model': 'Nissan NV200 1.5 Manual',
 'Post': 'an hour ago by rubberr',
 'Price': 'S$25,800',
 'Regstr_Date': '-'}
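
The padding idea can also be tried offline, separately from the scraping. This is only a sketch: the helper `unpack_padded` and the `scraped` list are hypothetical names introduced here, not part of the code above.

```python
def unpack_padded(values, n, fill="-"):
    """Return exactly n items: values truncated to n, then padded with fill."""
    values = list(values)[:n]
    return values + [fill] * (n - len(values))


fields = ["Post", "Price", "Deprec", "Regstr_Date"]
scraped = ["an hour ago by rubberr", "S$25,800"]  # hypothetical short listing

# Pair each field name with its value; missing values become "-"
car = dict(zip(fields, unpack_padded(scraped, len(fields))))
print(car)
```

Padding only guarantees that the unpacking succeeds. It assigns values by position, so if a listing omits a field in the middle, the remaining values shift left and still need a content check such as the `'$'` test above.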