Parsing public Facebook posts with BeautifulSoup / Python

Time: 2016-11-05 17:46:21

Tags: python facebook web-scraping beautifulsoup

I am trying to parse Facebook posts about a specific topic (such as a company or a product). As an example, take the posts from https://www.facebook.com/search/latest/?q=facebook

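I can log in to Facebook correctly (using Python), and I can also get the source code of the page that contains the posts I am looking for. After reviewing the source by hand, I found that this is the part I need to target: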

<div class="_5pbx userContent" data-ft="&#123;&quot;tn&quot;:&quot;K&quot;&#125;">
    <p>Here is the text of the post I need
    </p>
</div>
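A quick way to confirm a selector is to run it against the snippet above on its own, outside the live page. A minimal sketch, assuming BeautifulSoup 4 (`bs4`) is installed; `SAMPLE_HTML` is a hypothetical copy of that snippet, not the real Facebook page:

```python
from bs4 import BeautifulSoup

# Hypothetical sample copied from the markup above
SAMPLE_HTML = """
<div class="_5pbx userContent" data-ft="&#123;&quot;tn&quot;:&quot;K&quot;&#125;">
    <p>Here is the text of the post I need
    </p>
</div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# The CSS selector requires both classes on the same <div>
post = soup.select_one("div._5pbx.userContent")

# strip=True trims the surrounding whitespace and newlines
print(post.get_text(strip=True))  # Here is the text of the post I need
```

If this finds the post in the snippet but not in the downloaded page, the problem is in how the page delivers its markup rather than in the selector.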

So I started using BeautifulSoup with the following code:

from bs4 import BeautifulSoup

soup = BeautifulSoup(pageSourceCode.content, 'html.parser')

# Print the class list of every <div> on the page
for msg in soup.find_all('div'):
    print(msg.get('class'))

This is what I got as a result:

[u'hidden_elem']

Does anyone have experience scraping Facebook posts? I only need this for myself, for educational purposes.

2 Answers:

Answer 0 (score: 1)

The following code should work:

from bs4 import BeautifulSoup

soup = BeautifulSoup(pageSourceCode.content, 'html.parser')

# Find every post container and print the text of its first <p>
divs = soup.find_all('div', class_="_5pbx userContent")
for div in divs:
    p = div.find('p')
    print(p.get_text())
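One caveat with `class_="_5pbx userContent"`: when the value contains a space, BeautifulSoup compares it with the exact `class` attribute string, so it only matches when the classes appear in that order with nothing extra. A CSS selector checks each class independently. A small sketch, assuming `bs4` is installed; the markup is hypothetical and only illustrates the difference:

```python
from bs4 import BeautifulSoup

# Hypothetical markup: same two classes, different order, plus an extra class
html = """
<div class="_5pbx userContent"><p>first post</p></div>
<div class="userContent _5pbx extra"><p>second post</p></div>
"""
soup = BeautifulSoup(html, "html.parser")

# class_ with a space is matched against the exact attribute string
exact = soup.find_all("div", class_="_5pbx userContent")

# The CSS selector only requires that both classes are present
either_order = soup.select("div._5pbx.userContent")

print(len(exact), len(either_order))  # 1 2
```

Since the order and number of generated class names on a page like this are not guaranteed, the selector form is the safer choice.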

Answer 1 (score: 0)

The problem was that the div with the class I was searching for was inside an HTML comment. So I first had to find the element containing the comment, encode its content, and create a new soup object from it. After that I was able to select the div I was looking for with a CSS selector.

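from bs4 import BeautifulSoup

# The post markup sits inside an HTML comment wrapped by <code id="u_0_11">
comment = soup.select('code#u_0_11')
comment_data = comment[0].string.encode("utf-8")
# Parse the commented-out markup as a new document
soup = BeautifulSoup(comment_data, 'html.parser')
divs = soup.select('div._5pbx.userContent')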
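The same approach can be written without hard-coding `code#u_0_11`. An id like that looks auto-generated and may not be stable between page loads (an assumption, not something the answer states). A minimal sketch, assuming `bs4` is installed: collect every HTML comment, re-parse its contents, and search inside it. The sample page is hypothetical:

```python
from bs4 import BeautifulSoup, Comment

# Hypothetical page: the post markup is commented out inside a <code> element
page = """
<div class="hidden_elem"><code id="u_0_11"><!--
<div class="_5pbx userContent"><p>Hidden post text</p></div>
--></code></div>
"""
soup = BeautifulSoup(page, "html.parser")

# The outer document has no such div: it only exists as comment text
print(len(soup.select("div._5pbx.userContent")))  # 0

posts = []
# Comment is a NavigableString subclass, so filter strings by type
for comment in soup.find_all(string=lambda s: isinstance(s, Comment)):
    inner = BeautifulSoup(str(comment), "html.parser")  # re-parse the comment body
    posts.extend(inner.select("div._5pbx.userContent"))

for div in posts:
    print(div.get_text(strip=True))  # Hidden post text
```

This also explains the question's output: `find_all('div')` on the outer page only sees wrapper divs such as `hidden_elem`, because the post divs are plain comment text until they are parsed again.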

And now I could print the posts with a for loop:

# Print the text of the first <p> in each post
for div in divs:
    p = div.find_all('p')
    print(p[0].text)
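`p[0]` keeps only the first paragraph of each post and raises `IndexError` for a post with no `<p>`. If longer posts are split across several `<p>` elements (an assumption about the markup), joining all of them may be more robust. A sketch, assuming `bs4` is installed; the `post_text` helper and the sample markup are hypothetical:

```python
from bs4 import BeautifulSoup


def post_text(div):
    """Hypothetical helper: join the text of every <p> in a post div."""
    # A space separator keeps words apart when a <p> contains inline tags
    paragraphs = [p.get_text(" ", strip=True) for p in div.find_all("p")]
    # Skip empty paragraphs; returns "" when the post has no <p> at all
    return "\n".join(t for t in paragraphs if t)


# Hypothetical sample: one post with two paragraphs, one with none
html = """
<div class="_5pbx userContent"><p>First paragraph</p><p>Second paragraph</p></div>
<div class="_5pbx userContent"><span>No paragraph here</span></div>
"""
divs = BeautifulSoup(html, "html.parser").select("div._5pbx.userContent")

for div in divs:
    print(repr(post_text(div)))
```

Returning an empty string instead of raising lets the loop continue over posts that contain only images or links.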