Python Beautiful Soup web crawling with JSON

Time: 2017-04-15 18:13:53

Tags: python json beautifulsoup request web-crawler

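I am new to BeautifulSoup in Python and I am trying to extract certain information from a website: specifically, the URL and the title.

I am using BeautifulSoup to fetch the JSON, which works, but I am not sure about the next step: how to get the URL and the title out of it.

I have not managed to extract the required information yet. I hope you can help me.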

This is what I get so far:

b'{"searchResults":{"customer":null,"signupUrl":"\\/signup\\/?pos=activityCard","isMobile":false,"tours":[{"tourId":5459,"title":"Ticket f\\u00fcr Coca-Cola London Eye 4D-Erlebnis","url":"https:\\/\\/www.getyourguide.de\\/london-l57\\/ohne-anstehen-edf-london-eye-4d-erlebnis-t5459\\/","price":{"original":"27,10\\u00a0\\u20ac","min":"27,10\\u00a0\\u20ac","type":"individual"},"horizontalImageUrl":"https:\\/\\/cdn.getyourguide.com\\/img\\/tour_img-412120-70.jpg","horizontalAlternativeImageUrl":"https:\\/\\/cdn.getyourguide.com\\/img\\/tour_img-412120-85.jpg","verticalImageUrl":"https:\\/\\/cdn.getyourguide.com\\/img\\/tour_img-412120-92.jpg","mobileImageUrl":"https:\\/\\/cdn.getyourguide.com\\/img\\/tour_img-412120-53.jpg","horizontalSlimImageUrl":"https:\\/\\/cdn.getyourguide.com\\/img\\/tour_img-412120-67.jpg","highlightedDetailedImageUrl":"https:\\/\\/cdn.getyourguide.com\\/img\\/tour_img-412120-91.jpg","smallDescription":"Sehen Sie London aus einer anderen Perspektive vom London Eye aus und genie\\u00dfen Sie beim neuen 4D-Erlebnis einen bahnbrechenden 3D-Film mit\\u2026","description":"Sehen Sie London aus einer anderen Perspektive vom London Eye aus und genie\\u00dfen Sie beim neuen 4D-Erlebnis einen bahnbrechenden 3D-Film mit spektakul\\u00e4ren Spezialeffekten, einschlie\\u00dflich Wind und Nebel. 
Genie\\u00dfen Sie au\\u00dferdem bevorzugten Einlass am Eingang.","isBestseller":false,"isFeatured":false,"languageIds":[],"hasDeal":false,"dealMaxPercentage":0,"isBoostedNewTour":false,"hasBanner":false,"hasRibbon":false,"priceTag":true,"detailsLink":false,"isCertifiedPartner":true,"hasFencedDiscountDeal":false,"hasFreeCancellation":false,"hasRating":true,"averageRating":"4,5","totalRating":1633,"totalRatingTitle":"1633 Bewertungen","averageRatingClass":"45","ratingLink":"","ratingStyleModifier":"","ratingStarsClasses":"","ratingTitle":"Bewertung: 4,5 von 5","hasDuration":true,"duration":"40 Minuten","displayAbstract":true,"displayDuration":true,"displayDate":false,"displayWishlist":false,"displayRemoveButton":false,"hasDiscountedRecommendation":false,"hideImage":false,"isSkipTheLine":false,"likelyToSellOutBadge":true,"isPromoted":false,"isSpecialOffer":false,"experiments":{"hasRatingsExperiment":false,"numericRatingLabel":"Basierend auf 1633 Bewertungen","verticalImageForPriceSegmentation":"https:\\/\\/cdn.getyourguide.com\\/img\\/tour_img-412120-150.jpg"},"id":"searchResults","activityCardVersion":"horizontal","limit":false,"likelyToSellOutExperiment":{"deviceDetector":{}},"hasNumericReviews":true,"resultSetPosition":0,"activityCardStyle":"plain","highlightedOrientation":"horizontal"},{"tourId":51268,"title":"Bustransfer: Flughafen Stansted - Stadtzentrum 
London","url":"https:\\/\\/www.getyourguide.de\\/london-l57\\/bustransfer-flughafen-stansted-stadtzentrum-london-t51268\\/","price":{"original":"9,43\\u00a0\\u20ac","min":"9,43\\u00a0\\u20ac","type":"individual"},"horizontalImageUrl":"https:\\/\\/cdn.getyourguide.com\\/img\\/tour_img-451822-70.jpg","horizontalAlternativeImageUrl":"https:\\/\\/cdn.getyourguide.com\\/img\\/tour_img-451822-85.jpg","verticalImageUrl":"https:\\/\\/cdn.getyourguide.com\\/img\\/tour_img-451822-92.jpg","mobileImageUrl":"https:\\/\\/cdn.getyourguide.com\\/img\\/tour_img-451822-53.jpg","horizontalSlimImageUrl":"https:\\/\\/cdn.getyourguide.com\\/img\\/tour_img-451822-67.jpg","highlightedDetailedImageUrl":"https:\\/\\/cdn.getyourguide.com\\/img\\/tour_img-451822-91.jpg","smallDescription":"Beginnen oder beenden Sie Ihren Aufenthalt in London mit dem praktischen Bustransfer zwischen dem Flughafen Stansted und dem Stadtzentrum London.\\u2026","description":"Beginnen oder beenden Sie Ihren Aufenthalt in London mit dem praktischen Bustransfer zwischen dem Flughafen Stansted und dem Stadtzentrum London. 
Sparen Sie sich die Fahrt mit \\u00f6ffentlichen Verkehrsmitteln und erreichen Sie London schnell und bequem.","isBestseller":false,"isFeatured":false,"languageIds":[],"hasDeal":false,"dealMaxPercentage":0,"isBoostedNewTour":false,"hasBanner":false,"hasRibbon":false,"priceTag":true,"detailsLink":false,"isCertifiedPartner":false,"hasFencedDiscountDeal":false,"hasFreeCancellation":true,"hasRating":true,"averageRating":"4,4","totalRating":541,"totalRatingTitle":"541 Bewertungen","averageRatingClass":"45","ratingLink":"","ratingStyleModifier":"","ratingStarsClasses":"","ratingTitle":"Bewertung: 4,4 von 5","hasDuration":true,"duration":"60 Minuten \\u2013 90 Minuten","displayAbstract":true,"displayDuration":true,"displayDate":false,"displayWishlist":false,"displayRemoveButton":false,"hasDiscountedRecommendation":false,"hideImage":false,"isSkipTheLine":false,"likelyToSellOutBadge":true,"isPromoted":false,"isSpecialOffer":false,"experiments":{"hasRatingsExperiment":false,"numericRatingLabel":"Basierend auf 541 Bewertungen","verticalImageForPriceSegmentation":"https:\\/\\/cdn.getyourguide.com\\/img\\/tour_img-451822-150.jpg"}

This is the relevant part of the output:

title":"Ticket f\\u00fcr Coca-Cola London Eye 4D-Erlebnis","url":"https:\\/\\/www.getyourguide.de\\/london-l57\\/ohne-anstehen-edf-london-eye-4d-erlebnis-t5459

What I want are the title and the URL. For example:

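import json

# Decode the raw response bytes and parse the JSON into a Python dict
js_dict = json.loads(response.content.decode('utf-8'))

# Fields of the first tour in the result list
url = js_dict['searchResults']['tours'][0]['url']
print(url)

title = js_dict['searchResults']['tours'][0]['title']
print(title)

# The price is a nested dict; 'original' holds the display string
price = js_dict['searchResults']['tours'][0]['price']['original']
print(price)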

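Any feedback is much appreciated.

Update

Thanks to the feedback, I was able to solve the problem.

I can now get the desired result, but the problem I have now is that I only get one result instead of all of them: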

https://www.citydis.de/london-l57/ohne-anstehen-edf-london-eye-4d-erlebnis-t5459/
Ticket für Coca-Cola London Eye 4D-Erlebnis
27,10 €

The code looks like this:

jsonUrl = "https://www.citydis.com/s/results.json?&q=London& customerSearch=1&page=0"
headers.update({'X-Csrf-Token': csrf})
response = session.get(jsonUrl, headers=headers)
js_dict = (json.loads(response.content.decode('utf-8')))

for item in js_dict:
    headers = js_dict['searchResults']["tours"]
    prices = js_dict['searchResults']["tours"]
    urls = js_dict['searchResults']["tours"]


for title, price, url in zip(headers, prices, urls):

    title_final = title.get("title")
    url_final = url.get("url")
    price_final = price.get("price")["original"]
    print("Header: " + title_final + " | " + "Deeplink: " + url_final + " | " + "Price: " + price_final)

I want to get the titles, prices and URLs of all the tours in the JSON. I tried using a for loop, but somehow it does not work.
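One way such a loop could look, as a minimal sketch rather than the code that was actually used: since "tours" is a list of dictionaries, it can be iterated directly, without zipping the same list three times. SAMPLE is a shortened stand-in for the real response (only the fields used here, with values taken from the output above), and summarize_tours is a hypothetical helper name.

```python
import json

# Shortened stand-in for the real response: same nesting, only the
# fields used below, values copied from the output shown earlier.
SAMPLE = r'''
{"searchResults": {"tours": [
  {"title": "Ticket f\u00fcr Coca-Cola London Eye 4D-Erlebnis",
   "url": "https:\/\/www.getyourguide.de\/london-l57\/ohne-anstehen-edf-london-eye-4d-erlebnis-t5459\/",
   "price": {"original": "27,10\u00a0\u20ac"}},
  {"title": "Bustransfer: Flughafen Stansted - Stadtzentrum London",
   "url": "https:\/\/www.getyourguide.de\/london-l57\/bustransfer-flughafen-stansted-stadtzentrum-london-t51268\/",
   "price": {"original": "9,43\u00a0\u20ac"}}
]}}
'''


def summarize_tours(js_dict):
    """Return one formatted line per tour in the parsed response."""
    lines = []
    # "tours" is a list of dicts, so each element is one tour
    for tour in js_dict["searchResults"]["tours"]:
        lines.append(
            "Header: {} | Deeplink: {} | Price: {}".format(
                tour["title"], tour["url"], tour["price"]["original"]
            )
        )
    return lines


js_dict = json.loads(SAMPLE)
for line in summarize_tours(js_dict):
    print(line)
```

Iterating over the list itself gives one line per tour, whereas iterating over js_dict only walks its top-level keys.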

Any feedback is appreciated.

Update 2

Found a solution:

{{1}}

1 answer:

Answer 0 (score: 2)

The byte string response.content is indeed the JSON output. You can import the json module and parse the JSON with a statement like

js_dict = json.loads(response.content)

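This parses the JSON and produces a Python dictionary in js_dict. You can then use standard dictionary subscripting to access and display the fields of interest.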
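To make the subscripting concrete, here is a small sketch. The variable raw is a hypothetical stand-in for response.content, holding a shortened fragment of the response from the question. It assumes Python 3.6 or later, where json.loads accepts bytes directly; on older versions the bytes would need to be decoded first, as the question does.

```python
import json

# Hypothetical stand-in for response.content: a shortened fragment of
# the response shown in the question, kept as raw bytes.
raw = (
    rb'{"searchResults":{"tours":[{"tourId":5459,'
    rb'"title":"Ticket f\u00fcr Coca-Cola London Eye 4D-Erlebnis",'
    rb'"url":"https:\/\/www.getyourguide.de\/london-l57\/ohne-anstehen-edf-london-eye-4d-erlebnis-t5459\/",'
    rb'"price":{"original":"27,10\u00a0\u20ac"}}]}}'
)

# Parsing turns JSON escapes such as \u00fc and \/ into real characters
js_dict = json.loads(raw)

# Walk down the nesting: dict -> dict -> list -> dict
first_tour = js_dict["searchResults"]["tours"][0]
title = first_tour["title"]
url = first_tour["url"]
price = first_tour["price"]["original"]

print(title)
print(url)
print(price)
```

After parsing, the escaped sequences visible in the raw output are gone, so the title and URL can be used as ordinary strings.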

Because this is such a common requirement, the response object has a json method that does the decoding for you. So you can simply write

js_dict = response.json()
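Putting the request and the json method together might look like the sketch below. fetch_tours is a hypothetical helper name, and session is assumed to behave like a requests.Session, that is, its get() returns an object offering raise_for_status() and json().

```python
def fetch_tours(session, url, headers=None):
    """Fetch a JSON search response and return its list of tours.

    `session` is assumed to behave like requests.Session.
    """
    response = session.get(url, headers=headers)
    # Raise on HTTP errors instead of trying to parse an error page
    response.raise_for_status()
    # json() decodes the body and parses it in one step
    js_dict = response.json()
    return js_dict["searchResults"]["tours"]
```

With the requests library installed, this could be called as fetch_tours(session, jsonUrl, headers), using the session, jsonUrl and headers values from the question.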