Question

I want to parse a large HTML page from a website. I've already selected the div, and now I want the content inside the tag. This is my code:

import requests
from lxml import html

# Runs inside an async function; href is the page URL and client is my bot client.
patchpage = requests.get(href)
tree = html.fromstring(patchpage.content)
patch_message = tree.xpath('//div[@class="messageText"]')
for item in patch_message:
    await client.say(item.text.strip())  # This line raises an error
return await client.say(patch_message)

Now patch_message gives me:

[<Element div at 0x29c4be2fa98>]


which is not what I expected :/ Can someone tell me how to get the div content in Python?

Answer 1

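Assuming you are getting the error AttributeError: 'NoneType' object has no attribute 'strip':

An element's .text holds only the text that comes before its first child element, so it is None when the div's content starts with a tag. You just need to skip the None values before calling strip():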

for item in patch_message:
    if item.text:  # .text is None when there is no leading text
        print(item.text.strip())
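A small sketch of when .text is None, on invented markup. It uses the standard library's xml.etree.ElementTree as a stand-in for lxml so it runs without extra packages; lxml's element API follows ElementTree's, so .text and .tail behave the same way there.

```python
import xml.etree.ElementTree as ET

# The content starts with a child element, so there is no leading text.
div = ET.fromstring('<div><b>Fixed</b> login crash</div>')

print(div.text)     # prints: None (nothing precedes the <b> child)
print(div[0].text)  # prints: Fixed
print(div[0].tail)  # the text after </b> is stored on <b>.tail, not on the div

# The guard from the answer skips such elements instead of crashing.
if div.text:
    print(div.text.strip())
else:
    print("no leading text")
```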

Answer 2

text_content():

Returns the text content of the element, including the text content of its children, with no markup.
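To see how this differs from .text, here is a sketch on a made-up snippet. text_content() belongs to lxml.html elements; joining itertext() gives comparable all-descendants text and exists on both lxml and standard-library elements, so this version uses xml.etree.ElementTree to stay dependency-free.

```python
import xml.etree.ElementTree as ET

div = ET.fromstring('<div><b>Fixed</b> login crash on <i>Windows</i>.</div>')

# .text only holds text before the first child element.
print(div.text)  # prints: None

# itertext() walks the element and its descendants in document order,
# yielding every text and tail string; joining them drops the markup.
full_text = "".join(div.itertext())
print(full_text)  # prints: Fixed login crash on Windows.
```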

To extract all the text content from each div in the patch_message list, call item.text_content() on each item.

tree.xpath() returns a list of the elements it found, so iterate over that list:

patch_message = tree.xpath('//div[@class="messageText"]')
for item in patch_message:
    await client.say(item.text_content())
return await client.say(patch_message)
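Putting it together, a sketch of a hypothetical message_texts helper (not part of lxml) that returns the stripped full text of each element and skips empty ones. Like tree.xpath(), ElementTree's findall() returns a list of element objects, which is why formatting the list itself prints element reprs rather than text. The page below is invented and parsed with xml.etree.ElementTree so the sketch runs standalone; because it only uses itertext(), the helper should also accept the list that lxml's xpath() returns.

```python
import xml.etree.ElementTree as ET
from typing import List


def message_texts(elements) -> List[str]:
    """Hypothetical helper: full, stripped text of each element, skipping empty ones."""
    texts = []
    for element in elements:
        # itertext() exists on both ElementTree and lxml elements.
        text = "".join(element.itertext()).strip()
        if text:
            texts.append(text)
    return texts


page = ET.fromstring(
    "<body>"
    "<div class='messageText'><p>First patch</p></div>"
    "<div class='messageText'>  </div>"
    "<div class='messageText'>Second <b>patch</b></div>"
    "</body>"
)
divs = page.findall(".//div[@class='messageText']")
print(message_texts(divs))  # prints: ['First patch', 'Second patch']
```

In the question's bot, the result could then be sent as one message, for example await client.say("\n".join(message_texts(patch_message))), assuming client.say accepts a string as it does in the question.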

lxml: parsing the DIV inside a tag from HTML
