Question

I am trying to scrape data using lxml. The HTML contains a line like this:

<p class="datetime is-upcomming">March 10, Tuesday, 18:00 CET</p>

Based on that, I wrote a method that looks like this:

def get_date(self):
    request = requests.get(self.url)
    tree = html.fromstring(request.content)
    theDate = tree.xpath("//p[@class='datetime is-upcomming']/text()")
    if not theDate:
        theDate = ''
    return theDate

Then I try to use this method when saving to JSON:

def __dict__(self, get_streams=False):
    data = {
        'game': self.game,
        'title': self.title,
    }
    data['start_date'] = self.get_date()

Why does this return an array, and why is it empty?

"start_date": [
""
]

Yes, I have double-checked: the page really does contain an element with the class datetime is-upcomming.

Answer 1

Why does this return an array, and why is it empty?

Because xpath() returns a list of matches for a query like this one, and an empty list if nothing is found.
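To illustrate the list behaviour, here is a minimal sketch that pulls the first non-empty text match out of the list and falls back to a default. The helper name first_text and the SAMPLE markup wrapper are assumptions made for the example; only the p element and the XPath expression come from the question.

```python
from lxml import html

# Sample markup built around the line from the question (class name spelled as in the source).
SAMPLE = '<div><p class="datetime is-upcomming">March 10, Tuesday, 18:00 CET</p></div>'


def first_text(tree, expression, default=""):
    """Return the first non-empty, stripped text match for an XPath expression.

    Hypothetical helper: xpath() gives back a list for a node-set query,
    so we unwrap it here instead of returning the list itself.
    """
    matches = tree.xpath(expression)  # a list, possibly empty
    for match in matches:
        text = match.strip()
        if text:
            return text
    return default


tree = html.fromstring(SAMPLE)
found = first_text(tree, "//p[@class='datetime is-upcomming']/text()")
missing = first_text(tree, "//p[@class='no-such-class']/text()")
print(repr(found))
print(repr(missing))
```

Returning a plain string from get_date in this way would make start_date serialize as a JSON string rather than an array.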

Your XPath may be incorrect. Could you post the whole document, or say where it can be found?

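Also, why use __dict__ instead of a new method? __dict__ is the instance's attribute dictionary, and unless you know what you are doing, I would not mess with it.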
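One possible alternative is a separately named method. This is a sketch only: the class name Match, the method name to_dict, and the constructor arguments are hypothetical, since the question does not show the rest of the class.

```python
import json


class Match:
    """Hypothetical stand-in for the asker's class, with only the fields shown in the question."""

    def __init__(self, game, title, start_date=""):
        self.game = game
        self.title = title
        self.start_date = start_date

    def to_dict(self):
        """Build a JSON-ready dict without overriding the built-in __dict__ attribute."""
        return {
            "game": self.game,
            "title": self.title,
            "start_date": self.start_date,
        }


match = Match("example game", "example title", "March 10, Tuesday, 18:00 CET")
payload = json.dumps(match.to_dict())
print(payload)
```

Because __dict__ is left alone, vars(match) and other tools that rely on the instance attribute dictionary keep working as usual.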
