beautifulsoup4 not finding the HTML

Time: 2017-04-11 08:02:55

Tags: python beautifulsoup

I am using BeautifulSoup to scrape chat messages, but when I print the result it outputs None and the script exits with code 0. What am I doing wrong?

# import libraries, pip install beautifulsoup4.
import urllib2
from bs4 import BeautifulSoup
import csv
from datetime import datetime

quote_page = 'https://robertsspaceindustries.com/spectrum/community/SC/lobby/8'

#finding
page = urllib2.urlopen(quote_page)
soup = BeautifulSoup(page, 'html.parser')
name = soup.find('messages-items', attrs={'message-item status-default': 'content'})
print name

#logging
with open('index.csv', 'a') as csv_file:
    writer = csv.writer(csv_file)
    writer.writerow([name, datetime.now()])

1 answer:

Answer 0 (score: 1)

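If you watch the requests closely in Chrome's network tools or in Firebug, you will see that the site calls a web service to fetch the data you want. The messages are not part of the HTML that urllib2 downloads, so BeautifulSoup has nothing to find.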
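A quick way to confirm this is to check whether the markup you are searching for exists at all in the HTML the server sends. The following is only a minimal sketch: it uses the standard library's `html.parser` rather than BeautifulSoup, `ClassCounter` and `count_class` are hypothetical helpers, and both HTML strings are made-up stand-ins, not the real response from the site.

```python
from html.parser import HTMLParser


class ClassCounter(HTMLParser):
    """Count start tags that carry a given CSS class (hypothetical helper)."""

    def __init__(self, css_class):
        super().__init__()
        self.css_class = css_class
        self.count = 0

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; the value may be None.
        classes = (dict(attrs).get("class") or "").split()
        if self.css_class in classes:
            self.count += 1


def count_class(html, css_class):
    """Return how many elements in `html` have `css_class`."""
    parser = ClassCounter(css_class)
    parser.feed(html)
    parser.close()
    return parser.count


# Assumed example: what a script-driven page might send before any
# JavaScript has run. The message markup is simply not in it.
STATIC_SHELL = (
    '<html><body><div id="app"></div>'
    '<script src="app.js"></script></body></html>'
)

# Assumed example: the DOM after the browser has run the scripts.
RENDERED = (
    '<html><body><div id="app"><div class="messages-items">'
    '<div class="message-item status-default">hello</div>'
    '</div></div></body></html>'
)

print(count_class(STATIC_SHELL, "message-item"))  # 0
print(count_class(RENDERED, "message-item"))      # 1
```

If the count is zero for the downloaded page, no selector will help. Separately, note that the first argument to `soup.find` is a tag name, so a call like `soup.find('messages-items', ...)` looks for a `<messages-items>` element rather than for a class of that name.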

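You need to simulate a POST request with three parameters:

  • before: the ID of the last message received;
  • lobby_id: the lobby you want to fetch;
  • size: the number of messages to fetch.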
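As a rough illustration of how those three fields might be assembled, here is a small sketch. `build_history_payload` is a hypothetical helper, and leaving `before` out when there is no message ID yet is an assumption based on the example in this answer, which passes `None` for it (as far as I know, requests leaves out form fields whose value is `None`).

```python
def build_history_payload(lobby_id, size=50, before=None):
    """Build the form fields for the history POST (hypothetical helper)."""
    payload = {"lobby_id": str(lobby_id), "size": str(size)}
    # Assumption: 'before' is only sent when a message ID is available.
    if before is not None:
        payload["before"] = str(before)
    return payload


print(build_history_payload(8))
print(build_history_payload(8, size=20, before="12345"))
```

The returned dictionary can be passed as the `data` argument of `requests.post`.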

It returns a JSON object, which you only need to parse to get the result you want.

Here is an example:

import requests
import json

response = requests.post('https://robertsspaceindustries.com/api/spectrum/message/history', data = {'before': None, 'lobby_id':'8', 'size':'50'})
lobby_data = json.loads(response.content.decode("utf-8"))

for comment in lobby_data["data"]["messages"]:
  print ("%s: %s" % (comment["member"]["displayname"], comment["content_state"]["blocks"][0]["text"]))

Which outputs:

Antinov: Esp since spectrum doesn't even open a new tab to view large images....
Sir Quentin Reginald Watson: write a suggestion about it
Antinov: As if CIG listens to those.
Sir Quentin Reginald Watson: you will never know if you don't try
....
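
The lookup `comment["content_state"]["blocks"][0]["text"]` in the example above raises an error if a message has no blocks. A more defensive variant could look like the sketch below. `iter_messages` is a hypothetical helper, and `SAMPLE` is made-up data shaped only after the fields that the example reads, so the real response may contain more or differ in detail.

```python
def iter_messages(lobby_data):
    """Yield (display name, text) pairs from a decoded history response."""
    for comment in lobby_data.get("data", {}).get("messages", []):
        name = comment.get("member", {}).get("displayname", "<unknown>")
        blocks = comment.get("content_state", {}).get("blocks", [])
        # Join every block instead of reading only blocks[0].
        text = "\n".join(block.get("text", "") for block in blocks)
        yield name, text


# Made-up sample; only the keys used in the example above are assumed.
SAMPLE = {
    "data": {
        "messages": [
            {
                "member": {"displayname": "Antinov"},
                "content_state": {
                    "blocks": [{"text": "As if CIG listens to those."}]
                },
            },
            {
                "member": {"displayname": "Someone"},
                "content_state": {"blocks": []},
            },
        ]
    }
}

for name, text in iter_messages(SAMPLE):
    print("%s: %s" % (name, text))
```

With a real response you would call `iter_messages(response.json())` and, for the logging in the question, write each pair to the CSV file with `writer.writerow`.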