Can I scrape the mobile Twitter website?

Time: 2019-04-16 02:16:33

Tags: python selenium web beautifulsoup screen-scraping

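I'm trying to scrape the mobile Twitter site https://mobile.twitter.com/CocaCola, because there I can see everyone who liked a tweet, whereas the desktop Twitter view only shows a limited number of the people who liked it. However, when I try to extract the information, I get None even though it should return data.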

import time
from urllib.request import urlopen

from bs4 import BeautifulSoup
from selenium import webdriver

urls = ["https://mobile.twitter.com/CocaCola"]
SCROLL_PAUSE_TIME = 0.5  # seconds to wait after each scroll

for url in urls:
    d = webdriver.Chrome()
    d.get(url)
    # Raw HTML fetched over plain HTTP, separately from the Selenium browser session
    page = urlopen(url).read()
    # Get the initial scroll height
    last_height = d.execute_script("return document.body.scrollHeight")
    while True:
        # Scroll down to the bottom
        d.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        # Wait for the page to load
        time.sleep(SCROLL_PAUSE_TIME)
        # Note: this parses the urllib response, not the page rendered in the browser
        soup = BeautifulSoup(page, "html.parser")
        print(soup.find_all("div", {"class": "css-1dbjc4n r-1iusvr4 r-46vdb2 r-5f2r5o r-bcqeeo"}))
        # Compare the new scroll height with the last one; stop when it no longer grows
        new_height = d.execute_script("return document.body.scrollHeight")
        if new_height == last_height:
            break
        last_height = new_height
    d.quit()
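
The scrolling part of the snippet above could be isolated as a small helper. This is only a sketch, not a tested scraper for Twitter: the name `scroll_to_end` and the `max_rounds` safety limit are assumptions added for illustration, and `driver` is expected to be any object with Selenium's `execute_script` method.

```python
import time


def scroll_to_end(driver, pause=0.5, max_rounds=50):
    """Scroll until the page height stops growing; return the number of scrolls.

    `scroll_to_end` and `max_rounds` are hypothetical names, not part of Selenium.
    """
    last_height = driver.execute_script("return document.body.scrollHeight")
    rounds = 0
    while rounds < max_rounds:
        # Jump to the bottom so the page requests its next batch of content
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        time.sleep(pause)  # give lazily loaded content time to appear
        rounds += 1
        new_height = driver.execute_script("return document.body.scrollHeight")
        if new_height == last_height:
            break  # the page did not grow, so assume nothing more will load
        last_height = new_height
    return rounds
```

After a call such as `scroll_to_end(d)`, the HTML that the browser has actually rendered is available as `d.page_source`, which could then be passed to `BeautifulSoup` instead of a separately downloaded copy of the page.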

I'm currently trying to extract the content of each tweet.
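
One thing that may be worth checking when extracting tweet text: passing a multi-class string to `find_all` only matches elements whose `class` attribute is exactly that string, whereas a CSS selector matches any element carrying all of the listed classes. The sketch below shows the difference on made-up markup. `SAMPLE_HTML`, `tweet_texts`, and `exact_class_matches` are hypothetical names, the class names are copied from the question, and real Twitter HTML will look different.

```python
from bs4 import BeautifulSoup

# Made-up markup for illustration only; real Twitter HTML will differ.
SAMPLE_HTML = """
<div class="css-1dbjc4n r-1iusvr4 r-46vdb2 r-5f2r5o r-bcqeeo r-extra">First tweet</div>
<div class="css-1dbjc4n r-1iusvr4 r-46vdb2 r-5f2r5o r-bcqeeo">Second tweet</div>
<div class="css-1dbjc4n">Not a tweet</div>
"""

TWEET_CLASSES = "css-1dbjc4n r-1iusvr4 r-46vdb2 r-5f2r5o r-bcqeeo"


def tweet_texts(html):
    """Return the text of every div that carries all of the tweet classes."""
    soup = BeautifulSoup(html, "html.parser")
    selector = "div." + ".".join(TWEET_CLASSES.split())
    return [node.get_text(strip=True) for node in soup.select(selector)]


def exact_class_matches(html):
    """Return the text of divs whose class attribute equals the full string."""
    soup = BeautifulSoup(html, "html.parser")
    nodes = soup.find_all("div", {"class": TWEET_CLASSES})
    return [node.get_text(strip=True) for node in nodes]
```

With Selenium, the `html` argument would be the driver's `page_source` after the page has finished loading, for example `tweet_texts(d.page_source)`.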

0 answers:

No answers yet