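I am trying to parse Facebook posts about a specific topic (such as a company or a product). As an example, take the posts from https://www.facebook.com/search/latest/?q=facebook
I can log in to Facebook correctly (using Python), and I can also get the source code of the page that contains the posts I am looking for. After some manual inspection of that source, I found the element I want to target: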
<div class="_5pbx userContent" data-ft='{"tn":"K"}'>
<p>Here is the text of the post I need
</p>
</div>
So I started using BeautifulSoup with the following code:
from bs4 import BeautifulSoup

soup = BeautifulSoup(pageSourceCode.content, 'html.parser')
# Print the class attribute of every div on the page
for msg in soup.find_all('div'):
    print(msg.get('class'))
As a result, I got this:
[u'hidden_elem']
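Does anyone have experience scraping Facebook posts? I only need this for myself and for educational purposes.
Answer 0 (score: 1)
The following code should work: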
soup = BeautifulSoup(pageSourceCode.content, 'html.parser')
divs = soup.find_all('div', class_="_5pbx userContent")
for div in divs:
    p = div.find('p')
    if p is not None:  # skip divs that contain no <p>
        print(p.get_text())
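One caveat with the snippet above: when `class_` is given a string containing a space, BeautifulSoup compares it against the exact value of the `class` attribute, so a div whose classes appear in a different order is not matched. A CSS selector matches on the individual classes instead. The following is a minimal sketch on a made-up HTML fragment (the markup and the variable names are assumptions for illustration, not Facebook's real page):

```python
from bs4 import BeautifulSoup

# Hypothetical fragment: the same two classes, written in both orders.
html = (
    '<div class="_5pbx userContent"><p>first</p></div>'
    '<div class="userContent _5pbx"><p>second</p></div>'
)
soup = BeautifulSoup(html, "html.parser")

# Exact match on the attribute string: only the first div qualifies.
exact = [d.get_text() for d in soup.find_all("div", class_="_5pbx userContent")]

# CSS selector: any div carrying both classes, regardless of order.
by_css = [d.get_text() for d in soup.select("div._5pbx.userContent")]

print(exact)
print(by_css)
```

If the class order on the real page is not guaranteed, the selector form is probably the safer choice.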
Answer 1 (score: 0)
The problem was that the div with the class I was searching for was inside an HTML comment. So I first had to select the element that contains the comment, encode its content, and create a new soup object from it. After that I was able to select the div I was looking for with a CSS selector.
comment = soup.select('code#u_0_11')
comment_data = comment[0].string.encode("utf-8")
soup = BeautifulSoup(comment_data, 'html.parser')
divs = soup.select('div._5pbx.userContent')
And now I could print it with a for loop:
for div in divs:
    p = div.find_all('p')
    # .encode('utf-8') is for Python 2; on Python 3 print p[0].text directly
    print(p[0].text.encode('utf-8'))
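The id `u_0_11` used above is specific to one page load and may well differ elsewhere, so a more general variant is to walk every HTML comment in the document and re-parse each one. The sketch below shows that idea on a made-up fragment; `SAMPLE_HTML` and `extract_posts` are hypothetical names introduced here, and the markup only imitates the structure described in this answer.

```python
from bs4 import BeautifulSoup, Comment

# Hypothetical fragment: the post markup is wrapped in an HTML comment,
# so the first parse sees it as a Comment node rather than as tags.
SAMPLE_HTML = """
<div class="hidden_elem">
  <code id="u_0_11"><!-- <div class="_5pbx userContent"><p>Here is the text of the post I need
</p></div> --></code>
</div>
"""

def extract_posts(html):
    """Return the text of every post div hidden inside an HTML comment."""
    outer = BeautifulSoup(html, "html.parser")
    posts = []
    # Find all comment nodes, wherever they are in the document.
    for comment in outer.find_all(string=lambda s: isinstance(s, Comment)):
        # Parse the comment body as HTML in its own right.
        inner = BeautifulSoup(str(comment), "html.parser")
        for div in inner.select("div._5pbx.userContent"):
            posts.append(div.get_text(strip=True))
    return posts

posts = extract_posts(SAMPLE_HTML)
print(posts)
```

This avoids hard-coding the `code` element's id, at the cost of parsing every comment on the page.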