Question

I'm using BeautifulSoup to scrape chat messages, but when I print the result it outputs None and the script exits with code 0. What am I doing wrong?

# import libraries, pip install beautifulsoup4.
import urllib2
from bs4 import BeautifulSoup
import csv
from datetime import datetime

quote_page = 'https://robertsspaceindustries.com/spectrum/community/SC/lobby/8'

#finding
page = urllib2.urlopen(quote_page)
soup = BeautifulSoup(page, 'html.parser')
name = soup.find('messages-items', attrs={'message-item status-default': 
'content'})
print name

#logging
with open('index.csv', 'a') as csv_file:
    writer = csv.writer(csv_file)
    writer.writerow([name, datetime.now()])

Answer 1

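If you watch the network tab in Chrome's developer tools or in Firebug, you will see that the site calls a web service to fetch the data you want. The chat messages are not part of the HTML that urllib2 downloads, which is why soup.find returns None.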
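A quick way to confirm this is to check whether the class you are searching for appears in the downloaded HTML at all. The sketch below uses only the standard library (Python 3). Both HTML strings are hypothetical stand-ins, not the real page's markup: one imitates an empty shell sent by the server, the other the DOM after the browser has run its scripts.

```python
from html.parser import HTMLParser


class ClassCollector(HTMLParser):
    """Collect every CSS class name that appears in the parsed HTML."""

    def __init__(self):
        super().__init__()
        self.classes = set()

    def handle_starttag(self, tag, attrs):
        for attr_name, value in attrs:
            if attr_name == "class" and value:
                self.classes.update(value.split())


def html_has_class(html, class_name):
    """Return True if any element in `html` carries `class_name`."""
    collector = ClassCollector()
    collector.feed(html)
    return class_name in collector.classes


# Hypothetical: what the server might send before any JavaScript runs.
served_html = (
    '<html><body><div id="app"></div>'
    '<script src="app.js"></script></body></html>'
)

# Hypothetical: what the browser might show after the scripts have run.
rendered_html = (
    '<div id="app"><div class="message-item status-default">hi</div></div>'
)

print(html_has_class(served_html, "message-item"))    # False
print(html_has_class(rendered_html, "message-item"))  # True
```

If the same check on the real response body comes back False, no parser will find the messages there, and the data has to come from the web service instead.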

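You need to simulate a POST request with three parameters:

before: the ID of the last message received;
lobby_id: the lobby you want to fetch;
size: the number of messages to fetch.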
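To see what those three parameters look like on the wire, here is a minimal sketch that builds the request with the standard library (Python 3) instead of requests, without sending it. The helper name build_history_request is made up for illustration. Leaving out before when it is None is an assumption, mirroring the None passed in this answer's example.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Endpoint taken from this answer's example.
HISTORY_URL = "https://robertsspaceindustries.com/api/spectrum/message/history"


def build_history_request(lobby_id, size, before=None):
    """Build (but do not send) the POST request for a lobby's history."""
    params = {"lobby_id": lobby_id, "size": size}
    if before is not None:
        # Assumption: `before` is only sent when a message ID is known.
        params["before"] = before
    body = urlencode(params).encode("utf-8")
    return Request(HISTORY_URL, data=body, method="POST")


request = build_history_request("8", "50")
print(request.get_method(), request.data)  # POST b'lobby_id=8&size=50'
```

Passing the result to urllib.request.urlopen would send it. Whether the server accepts it without extra headers is something I have not verified.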

The service returns a JSON object, which you only need to parse to get the results you want.

Here is an example:

import requests
import json

response = requests.post('https://robertsspaceindustries.com/api/spectrum/message/history', data = {'before': None, 'lobby_id':'8', 'size':'50'})
lobby_data = json.loads(response.content.decode("utf-8"))

for comment in lobby_data["data"]["messages"]:
  print ("%s: %s" % (comment["member"]["displayname"], comment["content_state"]["blocks"][0]["text"]))

Which outputs:

Antinov: Esp since spectrum doesn't even open a new tab to view large images....
Sir Quentin Reginald Watson: write a suggestion about it
Antinov: As if CIG listens to those.
Sir Quentin Reginald Watson: you will never know if you don't try
....
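Since the question also logs to a CSV file, here is a rough sketch of how the parsed JSON could be turned into rows and written out (Python 3). The function names and the fixed timestamp are made up for illustration. The sample dictionary only imitates the fields used in the loop above, and skipping messages with no blocks is an assumption about the data.

```python
import csv
import io
from datetime import datetime


def extract_rows(lobby_data):
    """Return (display name, text) pairs from the parsed history response."""
    rows = []
    for message in lobby_data["data"]["messages"]:
        blocks = message["content_state"]["blocks"]
        if not blocks:
            continue  # assumption: skip messages that carry no text block
        rows.append((message["member"]["displayname"], blocks[0]["text"]))
    return rows


def write_rows(rows, csv_file, timestamp=None):
    """Write one CSV line per message, stamped with the scrape time."""
    timestamp = timestamp or datetime.now()
    writer = csv.writer(csv_file)
    for name, text in rows:
        writer.writerow([name, text, timestamp.isoformat()])


# Hypothetical sample shaped like the response parsed above.
sample = {
    "data": {
        "messages": [
            {
                "member": {"displayname": "Antinov"},
                "content_state": {
                    "blocks": [{"text": "As if CIG listens to those."}]
                },
            },
            {
                "member": {"displayname": "Sir Quentin Reginald Watson"},
                "content_state": {"blocks": []},
            },
        ]
    }
}

buffer = io.StringIO()
write_rows(extract_rows(sample), buffer, timestamp=datetime(2018, 1, 1))
print(buffer.getvalue())
```

To append to index.csv as in the question, replace the StringIO buffer with open('index.csv', 'a', newline='') and pass lobby_data instead of sample.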
