BeautifulSoup: getting part of a string from an element matched by class

Time: 2017-09-07 11:57:16

Tags: python selenium beautifulsoup

The HTML is:

<div class="_3u1 _gli _uvb" data-bt='{"id":xxxx,"rank":11,"abtest_version":null,"abtest_params":{"abtest_version":null,"origin":"A","ranker":null},"section":"main_column","owner_id":null,"sub_id":null,"browse_location":null,"query_data":[],"is_headline":false}'>

My code is:

for profileid in soup.find_all("div","_3u1 _gli _uvb"):
    for fbid in profileid.find_all("data-bt"):
        worksheet.write(row,0,fbid.get("id"))
        print (fbid.get("id"))
        row += 1

The output I get is:

 {"id":xxxxxx,"rank":1,"abtest_version":null,"abtest_params":{"abtest_version":null,"origin":"A","ranker":null},"section":"main_column","owner_id":null,"sub_id":null,"browse_location":null,"query_data":[],"is_headline":false}

How can I get just the xxxxx value returned? Thanks in advance.

1 answer:

Answer 0 (score: 1)

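You can parse data-bt, since it contains valid JSON. Note that data-bt is an attribute of the div rather than a child tag, so find_all("data-bt") will not match it; read it from the tag's attrs instead: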

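import json

# Match the divs by their exact class string.
found = soup.find_all("div", "_3u1 _gli _uvb")

for fbid in found:
    ...
    # data-bt is an attribute holding a JSON string; decode it into a dict.
    bt_json = json.loads(fbid.attrs['data-bt'])
    print(bt_json['id'])
    ...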
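
If some of the matched divs might lack data-bt or carry a malformed value, the JSON step can be wrapped in a small helper. The following is only a sketch: the name extract_bt_id and the sample id 12345 are made up for illustration and do not come from the question, and it uses only the standard json module so it can be run on its own.

```python
import json


def extract_bt_id(data_bt):
    """Return the "id" field from a data-bt attribute string, or None.

    data_bt is the raw attribute value, e.g. what tag.get("data-bt") returns.
    (extract_bt_id is a hypothetical helper name, not part of BeautifulSoup.)
    """
    if not data_bt:
        # Attribute is missing or empty on this tag.
        return None
    try:
        payload = json.loads(data_bt)
    except json.JSONDecodeError:
        # Attribute is present but is not valid JSON.
        return None
    if not isinstance(payload, dict):
        # Valid JSON, but not an object with named fields.
        return None
    return payload.get("id")


# Hypothetical sample value; 12345 stands in for the real id.
sample = '{"id":12345,"rank":11,"section":"main_column","is_headline":false}'
print(extract_bt_id(sample))
```

Inside the loop above it could be called as extract_bt_id(fbid.get("data-bt")), and the result written to the worksheet only when it is not None.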