BeautifulSoup returns an empty result

Time: 2016-10-24 01:58:17

Tags: twitter beautifulsoup no-response

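I am trying to parse Twitter. The output I need is the URL of each tweet, the date of the tweet, the sender, and the tweet text itself. The code runs without errors, but the result is empty, and I cannot find where the problem in the code is. Any help would be great, because I will use the data in my thesis.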

from bs4 import BeautifulSoup
import urllib.request
import openpyxl
wb= openpyxl.load_workbook('dene1.xlsx')
sheet=wb.get_sheet_by_name('Sayfa1')
headers = {}
headers['User-Agent'] = "Mozilla/5.0 (X11; Linux i686) AppleWebKit/537.17 (KHTML, like Gecko) Chrome/24.0.1312.27 Safari/537.17"
url = 'https://twitter.com/search?q=TURKCELL%20lang%3Atr%20since%3A2012-01-01%20until%3A2012-01-09&src=typd&lang=tr'
req = urllib.request.Request(url, headers = headers)
resp = urllib.request.urlopen(req)
respData = resp.read()
soup = BeautifulSoup(respData , 'html.parser')
gdata = soup.find_all("div", {"class": "content"})
for item in gdata:
    try:
        items2 = item.find('a', {'class': 'tweet-timestamp js-permalink js-nav js-tooltip'})
        items21=items2.get('href')
        items22=items2.get('title')
    except:
        pass
    try:
        items1 = item.find('span', {'class': 'username js-action-profile-name'}).text
    except:
        pass
    try:
        items3 = item.find('p', {'class': 'TweetTextSize js-tweet-text tweet-text'}).text
        sheet1=sheet.append([items21, items22,items1,items3])
    except:
        pass
wb.save('dene1.xlsx')

Regards

1 answer:

Answer 0 (score: 0)

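Every line inside your try blocks raises an error at least once. You never see those errors because you use a bare except, which catches literally every exception and silently discards it. Checking the result of each find() call instead makes the failures visible: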

import urllib.request
from bs4 import BeautifulSoup


headers = {
    'User-Agent': "Mozilla/5.0 (X11; Linux i686) AppleWebKit/537.17 (KHTML, like Gecko) Chrome/24.0.1312.27 Safari/537.17"}

url = 'https://twitter.com/search?q=TURKCELL%20lang%3Atr%20since%3A2012-01-01%20until%3A2012-01-09&src=typd&lang=tr'
req = urllib.request.Request(url, headers = headers)
resp = urllib.request.urlopen(req)
respData = resp.read()

soup = BeautifulSoup(respData, 'html.parser')
gdata = soup.find_all("div", {"class": "content"})
for item in gdata:
    items2 = item.find('a', {'class': 'tweet-timestamp js-permalink js-nav js-tooltip'}, href=True)
    if items2:
        items21 = items2.get('href')
        items22 = items2.get('title')
        print(items21)
        print(items22)
    items1 = item.find('span', {'class': 'username js-action-profile-name'})
    if items1:
        print(items1.text)
    items3 = item.find('p', {'class': 'TweetTextSize js-tweet-text tweet-text'})
    if items3:
        print(items3.text)

Now you should see plenty of output.
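
To see why the original loop stayed silent, here is a minimal sketch that needs neither network access nor BeautifulSoup. The `find` function is a hypothetical stand-in for `item.find(...)`, which returns `None` when nothing matches. The names `find`, `silent_extract`, and `checked_extract` and the sample rows are made up for illustration and are not part of the original code.

```python
def find(item, key):
    """Hypothetical stand-in for item.find(...): returns None when nothing matches."""
    return item.get(key)


def silent_extract(item):
    """Mimics the pattern in the question: a bare except hides every failure."""
    try:
        # Calling .upper() on None raises AttributeError, which is swallowed below.
        return find(item, "username").upper()
    except:  # deliberately bare, to reproduce the problem
        pass
    return None


def checked_extract(item):
    """Catches only the expected exception and reports it instead of hiding it."""
    try:
        return find(item, "username").upper(), None
    except AttributeError as exc:
        return None, f"username missing: {exc}"


# Made-up sample data: the second row has no username, like a non-matching tag.
rows = [{"username": "@alice"}, {}]

silent = [silent_extract(row) for row in rows]
checked = [checked_extract(row) for row in rows]

print(silent)   # the failure on the second row leaves no trace
print(checked)  # the failure on the second row carries an error message
```

With the bare except, the failing row looks exactly like a row that simply had no data. Catching a specific exception, or testing for `None` as the answer does, keeps the reason for the failure available.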