Beautiful Soup can't find element

Time: 2020-07-28 05:33:20

Tags: python beautifulsoup

I'm still learning to code in Python, and I really need help scraping content from this website:

https://www.tokopedia.com/craftdale/crossback-apron-hijau-army?src=topads

I want to get the review data (the review time) from the reviews (Ulasan) container.

This is the HTML on the website:

<p disabled="" data-testid="txtDateGivenReviewFilter0" class="css-oals0c-unf-heading e1qvo2ff8">1 bulan lalu</p>

I tried to get the element with this code:

review = soup.findAll('p', class_='css-oals0c-unf-heading e1qvo2ff8')

review = soup.findAll('p', id_='txtDateGivenReviewFilter0')

But all I get is an empty result.

Can anyone help me solve this? Thank you very much.

1 Answer:

Answer 0 (score: 1)

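Analyzing the website's network traffic shows that the reviews are not part of the page's initial HTML: after the page loads, the site makes AJAX calls to retrieve additional information, which is why BeautifulSoup, working only on the downloaded HTML, finds nothing. The review information in particular comes from an AJAX POST request with a JSON payload to a specific endpoint, https://gql.tokopedia.com/, which you can call directly: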
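Separately, the second selector in the question could not match even on static HTML: the element has no `id` attribute at all (the value `txtDateGivenReviewFilter0` sits in `data-testid`), so searching by id cannot find it. Because `data-testid` contains a hyphen it cannot be written as a keyword argument; it goes in the `attrs` dictionary, or in a CSS attribute selector. A minimal sketch, run against the `<p>` snippet quoted in the question rather than the live page (where, as explained above, the element is not in the downloaded HTML). `data-testid` is likely a more stable hook than class names such as `css-oals0c-unf-heading`, which look auto-generated and may change, though that is an assumption:

```python
from bs4 import BeautifulSoup

# The <p> element quoted in the question, used as static sample input.
html = ('<p disabled="" data-testid="txtDateGivenReviewFilter0" '
        'class="css-oals0c-unf-heading e1qvo2ff8">1 bulan lalu</p>')

soup = BeautifulSoup(html, "html.parser")

# "data-testid" contains a hyphen, so pass it via attrs instead of a keyword.
by_testid = soup.find_all("p", attrs={"data-testid": "txtDateGivenReviewFilter0"})

# CSS prefix match; assumes other reviews use the same prefix with a
# different numeric suffix (not verified against the live page).
all_dates = soup.select('p[data-testid^="txtDateGivenReviewFilter"]')

print([p.get_text() for p in by_testid])  # ['1 bulan lalu']
print([p.get_text() for p in all_dates])  # ['1 bulan lalu']
```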

import json
import requests

# Batch of four GraphQL operations; 353506414 is the product ID of the page in the question
payload = [
    # Overall rating score and per-star breakdown
    {"operationName": "PDPReviewRatingQuery", "variables": {"productId": 353506414}, "query": "query PDPReviewRatingQuery($productId: Int!) {\n  ProductRatingQuery(productId: $productId) {\n    ratingScore\n    totalRating\n    totalRatingWithImage\n    detail {\n      rate\n      totalReviews\n      percentage\n      __typename\n    }\n    __typename\n  }\n}\n"},
    # Reviews that have images attached
    {"operationName": "PDPReviewImagesQuery", "variables": {"productID": 353506414, "page": 1}, "query": "query PDPReviewImagesQuery($page: Int, $productID: Int!) {\n  ProductReviewImageListQuery(page: $page, productID: $productID) {\n    detail {\n      reviews {\n        reviewer {\n          fullName\n          profilePicture\n          __typename\n        }\n        reviewId\n        message\n        rating\n        updateTime\n        isReportable\n        __typename\n      }\n      images {\n        imageAttachmentID\n        description\n        uriThumbnail\n        uriLarge\n        reviewID\n        __typename\n      }\n      __typename\n    }\n    __typename\n  }\n}\n"},
    # Most helpful reviews
    {"operationName": "PDPReviewHelpfulQuery", "variables": {"productID": 353506414}, "query": "query PDPReviewHelpfulQuery($productID: Int!) {\n  ProductMostHelpfulReviewQuery(productId: $productID) {\n    shop {\n      shopId\n      __typename\n    }\n    list {\n      reviewId\n      message\n      productRating\n      reviewCreateTime\n      reviewCreateTimestamp\n      isReportable\n      isAnonymous\n      imageAttachments {\n        attachmentId\n        imageUrl\n        imageThumbnailUrl\n        __typename\n      }\n      user {\n        fullName\n        image\n        url\n        __typename\n      }\n      likeDislike {\n        totalLike\n        likeStatus\n        __typename\n      }\n      __typename\n    }\n    __typename\n  }\n}\n"},
    # Paginated review list; reviewCreateTime is presumably the review date
    {"operationName": "PDPReviewListQuery", "variables": {"page": 1, "rating": 0, "withAttachment": 0, "productID": 353506414, "perPage": 10}, "query": "query PDPReviewListQuery($productID: Int!, $page: Int!, $perPage: Int!, $rating: Int!, $withAttachment: Int!) {\n  ProductReviewListQuery(productId: $productID, page: $page, perPage: $perPage, rating: $rating, withAttachment: $withAttachment) {\n    shop {\n      shopId\n      name\n      image\n      url\n      __typename\n    }\n    list {\n      reviewId\n      message\n      productRating\n      reviewCreateTime\n      reviewCreateTimestamp\n      isReportable\n      isAnonymous\n      imageAttachments {\n        attachmentId\n        imageUrl\n        imageThumbnailUrl\n        __typename\n      }\n      reviewResponse {\n        message\n        createTime\n        __typename\n      }\n      likeDislike {\n        totalLike\n        likeStatus\n        __typename\n      }\n      user {\n        userId\n        fullName\n        image\n        url\n        __typename\n      }\n      __typename\n    }\n    __typename\n  }\n}\n"},
]

res = requests.post("https://gql.tokopedia.com/", json=payload, timeout=30)
res.raise_for_status()  # stop on HTTP errors instead of parsing an error page
data = res.json()  # a list with one result per operation, in the same order

with open("data.json", "w", encoding="utf-8") as f:
    json.dump(data, f, ensure_ascii=False)

The script above saves the review information to a file as JSON.
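The review dates the question asks for should be in the result of the fourth operation (`PDPReviewListQuery`), under `ProductReviewListQuery` → `list` → `reviewCreateTime`, assuming the response mirrors the field names in the query, as GraphQL responses normally do. A small sketch that reads them from the parsed response. It looks the operation up by its root field rather than by list position, so it keeps working if the batch is reordered or trimmed; the helper names `find_operation_result` and `review_times` are hypothetical:

```python
def find_operation_result(batch, root_field):
    """Return the result of the operation whose root field is root_field.

    batch is the parsed JSON response: a list with one {"data": {...}}
    entry per operation that was sent.
    """
    for item in batch:
        # "data" can be null if an operation failed, hence the "or {}"
        result = (item.get("data") or {}).get(root_field)
        if result is not None:
            return result
    raise KeyError(f"no result for {root_field!r} in response")


def review_times(batch):
    """Return (reviewCreateTime, message) pairs from the review-list result."""
    review_list = find_operation_result(batch, "ProductReviewListQuery")
    # Assumption: "list" may be null or missing when a page has no reviews
    reviews = review_list.get("list") or []
    return [(r["reviewCreateTime"], r["message"]) for r in reviews]
```

With the script above, `for created, message in review_times(data): print(created, message)` prints one line per review on the first page; the same call works on the contents of data.json after loading it with `json.load`.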

To get the rating score:

print(data[0]['data']['ProductRatingQuery']['ratingScore'])
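The batch in the script always requests four operations and only the first page of ten reviews (`"page": 1`, `"perPage": 10`). If only the review dates are needed, one option is to send just the `PDPReviewListQuery` operation with a smaller field selection and vary `page`. A sketch, assuming the endpoint accepts a reduced selection set (standard GraphQL allows this, but it has not been tested against Tokopedia) and that `"rating": 0` and `"withAttachment": 0`, copied from the original payload, mean "no filter"; `build_review_list_payload` is a hypothetical helper name:

```python
# The review-list query from the original payload, trimmed to the fields
# needed for review dates (assumes the server accepts a reduced selection).
REVIEW_LIST_QUERY = """query PDPReviewListQuery($productID: Int!, $page: Int!, $perPage: Int!, $rating: Int!, $withAttachment: Int!) {
  ProductReviewListQuery(productId: $productID, page: $page, perPage: $perPage, rating: $rating, withAttachment: $withAttachment) {
    list {
      reviewId
      message
      productRating
      reviewCreateTime
      reviewCreateTimestamp
    }
  }
}
"""


def build_review_list_payload(product_id, page=1, per_page=10):
    """Build a one-operation batch that asks for a single page of reviews."""
    return [{
        "operationName": "PDPReviewListQuery",
        "variables": {
            "page": page,
            "rating": 0,          # copied from the original payload
            "withAttachment": 0,  # copied from the original payload
            "productID": product_id,
            "perPage": per_page,
        },
        "query": REVIEW_LIST_QUERY,
    }]
```

Each page can then be sent the same way as the original batch, e.g. `requests.post("https://gql.tokopedia.com/", json=build_review_list_payload(353506414, page=2))`. Incrementing `page` until the returned `list` comes back empty is one plausible stopping rule; it is a guess, not documented behavior of this endpoint.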