我的目标是在发布Twitter消息时检索它们的ID。到目前为止,我的代码如下:
import requests
from bs4 import BeautifulSoup
keys = some_key_words + " -filter:retweets AND -filter:replies"
query = "https://twitter.com/search?f=tweets&vertical=default&q=" + keys + "&src=typd&lang=es"
req = requests.get(query).text
soup = BeautifulSoup(req, "lxml")
for tweets in soup.findAll("li",{"class":"js-stream-item stream-item stream-item"}):
print(tweets)
但是,这不会返回任何内容。代码本身是否有问题,或者我在看源代码的错误位置?我知道ID应该存储在这里:
<div class="stream">
<ol class="stream-items js-navigable-stream" id="stream-items-id">
<li class="js-stream-item stream-item stream-item" **data-item-id**="1210306781806833664" id="stream-item-tweet-1210306781806833664" data-item-type="tweet">
答案 0 :(得分:0)
from bs4 import BeautifulSoup
data = """
<div class="stream">
<ol class="stream-items js-navigable-stream" id="stream-items-id">
<li class="js-stream-item stream-item stream-item
" **data-item-id**="1210306781806833664"
id="stream-item-tweet-1210306781806833664"
data-item-type="tweet"
>
...
"""
soup = BeautifulSoup(data, 'html.parser')
for item in soup.findAll("li", {'class': 'js-stream-item stream-item stream-item'}):
print(item.get("**data-item-id**"))
输出:
1210306781806833664