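I'm using BeautifulSoup to scrape chat messages, but when the script reaches the print statement it outputs None and exits with code 0. What am I doing wrong?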
# import libraries, pip install beautifulsoup4.
import urllib2
from bs4 import BeautifulSoup
import csv
from datetime import datetime
quote_page = 'https://robertsspaceindustries.com/spectrum/community/SC/lobby/8'
#finding
page = urllib2.urlopen(quote_page)
soup = BeautifulSoup(page, 'html.parser')
name = soup.find('messages-items', attrs={'message-item status-default': 'content'})
print name
#logging
with open('index.csv', 'a') as csv_file:
    writer = csv.writer(csv_file)
    writer.writerow([name, datetime.now()])
Answer 0 (score: 1)
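If you watch the Network tab in Chrome DevTools or Firebug closely, you will see that the site calls a web service to fetch the data you want. The messages are loaded after the page itself, so the HTML that urllib2 downloads does not contain them, and soup.find returns None.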
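One way to confirm this kind of problem is to check whether the class you are searching for appears in the downloaded markup at all. The following is a minimal sketch in Python 3 using only the standard library. Both HTML strings are hypothetical stand-ins, not the real page source, and has_class is a helper name made up for this illustration.

```python
from html.parser import HTMLParser


class ClassCollector(HTMLParser):
    """Collects every CSS class name that appears in the markup."""

    def __init__(self):
        super().__init__()
        self.classes = set()

    def handle_starttag(self, tag, attrs):
        for key, value in attrs:
            if key == "class" and value:
                self.classes.update(value.split())


def has_class(html, class_name):
    """Return True if any tag in the markup carries the given class."""
    parser = ClassCollector()
    parser.feed(html)
    return class_name in parser.classes


# Hypothetical: what a script-driven page might return before any script runs.
static_html = '<html><body><div id="app"></div><script src="app.js"></script></body></html>'
# Hypothetical: what the browser's DOM might look like after the scripts ran.
rendered_html = '<div id="app"><div class="message-item status-default">hi</div></div>'

print(has_class(static_html, "message-item"))    # False
print(has_class(rendered_html, "message-item"))  # True
```

If the class is missing from the raw download but visible in the browser's element inspector, the content is most likely being fetched by a separate request.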
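You need to simulate a POST request with three parameters:
before: the ID of the last message received
lobby_id: the lobby you want to fetch
size: the number of messages to fetch
The service returns a JSON object, which you simply parse to get the result you want.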
Here is an example:
import requests
import json
response = requests.post('https://robertsspaceindustries.com/api/spectrum/message/history', data = {'before': None, 'lobby_id':'8', 'size':'50'})
lobby_data = json.loads(response.content.decode("utf-8"))
for comment in lobby_data["data"]["messages"]:
    print ("%s: %s" % (comment["member"]["displayname"], comment["content_state"]["blocks"][0]["text"]))
Which outputs:
Antinov: Esp since spectrum doesn't even open a new tab to view large images....
Sir Quentin Reginald Watson: write a suggestion about it
Antinov: As if CIG listens to those.
Sir Quentin Reginald Watson: you will never know if you don't try
....
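To get back to the CSV logging from the question, the parsed JSON can be turned into rows and appended to a file. This is a rough sketch in Python 3 that assumes the response has the shape used in the example above (data, messages, member, displayname, content_state, blocks, text). The sample dictionary is hand-written for illustration, the display name "Someone" is made up, and skipping messages with no blocks is an assumption about how the data might look, not something the API is known to guarantee.

```python
import csv
from datetime import datetime


def extract_rows(lobby_data):
    """Turn the parsed JSON into (display name, text) pairs."""
    rows = []
    for comment in lobby_data["data"]["messages"]:
        blocks = comment["content_state"]["blocks"]
        # Assumption: a message may have no text blocks; skip it
        # instead of raising IndexError on blocks[0].
        if not blocks:
            continue
        rows.append((comment["member"]["displayname"], blocks[0]["text"]))
    return rows


def append_to_csv(rows, path="index.csv"):
    """Append one CSV line per message, stamped with the current time."""
    timestamp = datetime.now().isoformat()
    with open(path, "a", newline="", encoding="utf-8") as csv_file:
        writer = csv.writer(csv_file)
        for name, text in rows:
            writer.writerow([name, text, timestamp])


# Hand-written sample in the same shape as the response parsed above.
sample = {"data": {"messages": [
    {"member": {"displayname": "Antinov"},
     "content_state": {"blocks": [{"text": "As if CIG listens to those."}]}},
    {"member": {"displayname": "Someone"},  # hypothetical message without text
     "content_state": {"blocks": []}},
]}}

rows = extract_rows(sample)
print(rows)
```

With a real response, calling append_to_csv(extract_rows(lobby_data)) would append the messages to index.csv, one line per message.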