Full code: https://pastebin.com/5U3irKZp

Question

My script stops scraping after the 449th Yelp restaurant.

for idx, item in enumerate(yelp_containers, 1):
    print("--- Restaurant number #", idx)
    restaurant_title = item.h3.get_text(strip=True)
    restaurant_title = re.sub(r'^[\d.\s]+', '', restaurant_title)
    restaurant_address = item.select_one('[class*="secondaryAttributes"]').get_text(separator='|', strip=True).split('|')[1]

The error I get is:

Traceback (most recent call last):
  File "/Users/kenny/MEGA/Python/yelp scraper.py", line 41
    restaurant_address = item.select_one('[class*="secondaryAttributes"]').get_text(separator='|', strip=True).split('|')[1]
IndexError: list index out of range

Answer 1

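The problem is that some restaurants have no address listed. For those entries, the split produces a list with only one element, so index [1] does not exist and Python raises the IndexError.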

What you should do is check that the split result has enough elements before indexing into it. Change this line:

restaurant_address = item.select_one('[class*="secondaryAttributes"]').get_text(separator='|', strip=True).split('|')[1]

to these:

restaurant_address = item.select_one('[class*="secondaryAttributes"]').get_text(separator='|', strip=True).split('|')
restaurant_address = restaurant_address[1] if len(restaurant_address) > 1 else restaurant_address[0]

I ran the parser over all the pages with this change, and it worked.
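
If you want to try the length check on its own, without fetching any pages, here is a minimal sketch. The helper name pick_address and the sample strings are made up for illustration; the string argument stands in for whatever get_text(separator='|', strip=True) returns for one listing.

```python
def pick_address(joined_text, sep="|"):
    """Return the second field of joined_text if present, else the first.

    joined_text stands in for the pipe-joined string produced by
    get_text(separator='|', strip=True) for a single listing.
    pick_address is a hypothetical helper, not part of the original script.
    """
    parts = joined_text.split(sep)
    # A listing without an address yields a single-element list,
    # so parts[1] would raise IndexError; fall back to parts[0].
    return parts[1] if len(parts) > 1 else parts[0]


# Hypothetical sample strings, not taken from real Yelp pages.
with_address = "(415) 555-0100|123 Example St|Mission"
without_address = "(415) 555-0100"

print(pick_address(with_address))
print(pick_address(without_address))
```

Note that the fallback returns the first field, which may not be an address at all. Depending on what you do with the data afterwards, returning an empty string or None for those listings might be a better fit.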
