How do I extract a comment with BeautifulSoup?

Asked: 2017-12-20 15:07:13

Tags: python beautifulsoup text-mining

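I am new to Python and data mining, so I have a question about extracting one part of some output. I am using Python 3.6 and updated everything this morning. I have anonymized the output and removed all lines containing passwords, tokens, and so on.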

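from bs4 import BeautifulSoup

# Parse the saved page; the context manager closes the file handle afterwards
with open("facebookoutput.html") as f:
    soup = BeautifulSoup(f, "html.parser")

# find_all is the current name of the legacy findAll alias
comments = soup.find_all('div', class_="_2b06")

print(comments[0])  # prints the first entry: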

<div class="_2b06"><div class="_2b05"><a href="/stuartd?fref=nf&amp;rc=p&amp;__tn__=R-R">some Name </a></div><div data-commentid="100000000000000000222222000000000000000" data-sigil="comment-body">There is nice comment. I like stackoverflow. </div></div>

I would like to get 'There is nice comment. I like stackoverflow.' out of it.

Thanks in advance.

1 answer:

Answer 0: (score: 1)

Try this:

from bs4 import BeautifulSoup

# Sample markup taken from the question
content = """
<div class="_2b06"><div class="_2b05"><a href="/stuartd?fref=nf&amp;rc=p&amp;__tn__=R-R">some Name </a></div><div data-commentid="100000000000000000222222000000000000000" data-sigil="comment-body">There is nice comment. I like stackoverflow. </div></div>
"""

soup = BeautifulSoup(content, "html.parser")
# Select every element whose data-sigil attribute marks a comment body,
# strip the surrounding whitespace, and join the texts with a space
comments = ' '.join(item.get_text(strip=True) for item in soup.select("[data-sigil='comment-body']"))
print(comments)
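
If each comment is needed on its own rather than joined into a single string, the same idea can be extended as sketched below. This is only a minimal sketch: it assumes the class names `_2b06` and `_2b05` and the `data-sigil="comment-body"` attribute from the question's sample are stable, which may not hold for other pages. The `extract_comments` helper and the second sample comment are hypothetical and exist only for illustration.

```python
from bs4 import BeautifulSoup

# First block is shortened from the question's sample; the second block is a
# hypothetical extra comment added so the loop has more than one item.
html = """
<div class="_2b06"><div class="_2b05"><a href="/stuartd?fref=nf">some Name </a></div><div data-commentid="1" data-sigil="comment-body">There is nice comment. I like stackoverflow. </div></div>
<div class="_2b06"><div class="_2b05"><a href="/other?fref=nf">other Name </a></div><div data-commentid="2" data-sigil="comment-body">A second comment. </div></div>
"""


def extract_comments(markup):
    """Return a list of (author, text) tuples, one per comment block."""
    soup = BeautifulSoup(markup, "html.parser")
    results = []
    # Each comment lives in its own <div class="_2b06"> wrapper
    for block in soup.find_all("div", class_="_2b06"):
        name_div = block.find("div", class_="_2b05")
        body = block.find("div", attrs={"data-sigil": "comment-body"})
        if body is None:
            continue  # skip wrappers that carry no comment body
        author = name_div.get_text(strip=True) if name_div else None
        results.append((author, body.get_text(strip=True)))
    return results


for author, text in extract_comments(html):
    print(f"{author}: {text}")
```

Returning tuples keeps the author and the comment text paired, which tends to be more convenient for later text mining than one concatenated string.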