Question

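I am new to web scraping and am trying to scrape everything in the restaurant details table so that I can continue with the rest of my scraping.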

public class IgnoreDataTypeConverter : JsonConverter
{
    public override bool CanConvert(Type objectType)
    {
        return true;
    }

    public override object ReadJson(JsonReader reader, Type objectType, object existingValue, JsonSerializer serializer)
    {
        try { return JToken.Load(reader).ToObject(objectType); }
        catch { }
        return objectType.IsValueType ? Activator.CreateInstance(objectType) : null;
    }

    public override void WriteJson(JsonWriter writer, object value, JsonSerializer serializer)
    {
        serializer.Serialize(writer, value);
    }
}

Answer 1

The problem is that you access the element at index 0 of soup.find_all("Section",class_="js-infscroll-load-more-here"), but the result is an empty list.
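A minimal sketch of why this raises the error in the title, and one way to guard against it. A plain empty list stands in for the empty find_all result, and first_or_none is a hypothetical helper name introduced here for illustration, not part of BeautifulSoup.

```python
def first_or_none(results):
    """Return the first element of a list, or None when the list is empty."""
    return results[0] if results else None


# Stand-in for what find_all returns when no tag matches.
matches = []

try:
    matches[0]
except IndexError as exc:
    # Indexing an empty list is what produces the reported error.
    error_message = str(exc)
    print("IndexError:", error_message)

# The guarded lookup returns None instead of raising.
print(first_or_none(matches))
print(first_or_none(["<section>"]))
```

Checking the result before indexing makes a failed search easy to tell apart from a bug further down in the scraper.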

Answer 2

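HTML tag names are not case-sensitive, and BeautifulSoup's HTML parsers convert them to lowercase, so a search for "Section" matches nothing. Even in the page source itself the tag is section with a lowercase s, not Section: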

section = soup.find_all("section",class_="js-infscroll-load-more-here")[0]

Since there is only one, you can also use find:

section = soup.find("section",class_="js-infscroll-load-more-here")

Both will find what you are looking for.
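The lowercase behaviour can be checked with Python's standard html.parser alone, which is one of the parsers BeautifulSoup can use. This is only a sketch: TagCollector, collect_tags, and the sample markup are made up for illustration, and the capital S in the markup is deliberate.

```python
from html.parser import HTMLParser


class TagCollector(HTMLParser):
    """Records the name of every start tag as the parser reports it."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        # html.parser passes the tag name already converted to lowercase.
        self.tags.append(tag)


def collect_tags(markup):
    """Parse the markup and return the start tag names in document order."""
    collector = TagCollector()
    collector.feed(markup)
    collector.close()
    return collector.tags


# Hypothetical markup written with a capital S on purpose.
sample = '<Section class="js-infscroll-load-more-here"><p>item</p></Section>'
tags = collect_tags(sample)
print(tags)
print("Section" in tags, "section" in tags)
```

Because the parser only ever reports the lowercase name, a search for "Section" has nothing to match, while "section" does.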

Python 3.5.2 web-scraping - list index out of range

2 answers: