Question

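I am using Scrapy to extract user information from Twitter, but I am currently having trouble extracting the following count, follower count, and similar values in Python.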

I can successfully extract the id, screen name, avatar, etc. using:

user['ID'] = tweet['user_id']
user['name'] = item.xpath('.//@data-name').extract()[0]
user['screen_name'] = item.xpath('.//@data-screen-name').extract()[0]
user['avatar'] = item.xpath('.//div[@class="content"]/div[@class="stream-item-header"]/a/img/@src').extract()[0]

twitter html

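Unfortunately, I am having trouble extracting the counts from the user's 'following' HTML, because I don't know the correct XPath to extract the data, or whether it is even possible...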

I can successfully extract the count with the following JavaScript code, but I am having trouble doing the same in Python.

following   = $new.find('.ProfileNav-item--following .ProfileNav-value').first().text();

Any help and advice would be great. Thanks.

Image of Twitter without JavaScript

Answer 1

You need to check whether the element you want is actually present, because the page your scraper downloads does not contain elements that are rendered with JavaScript. You can check this with scrapy shell (here is a link with information on scrapy shell). You can also use this addon, or something similar, to find CSS selectors. Besides XPath, you can use CSS selectors with Scrapy.

Python Twitter Scrapy: extracting Twitter following, follower counts, etc.
