How do I print data collected by BeautifulSoup?

Date: 2019-04-24 04:14:49

Tags: python web-scraping beautifulsoup screen-scraping

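I have no prior Python experience, so this may be very basic.

I am trying to record the names and prices of all the hockey sticks sold by the Canadian retailer SportChek.

My code so far is as follows: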

# Import libraries
import requests
from bs4 import BeautifulSoup

# Collect the page
page = requests.get('https://www.sportchek.ca/categories/shop-by-sport/hockey/hockey-sticks.html?cid=search-hockey-sticks')

# Create BeautifulSoup object
soup = BeautifulSoup(page.text, 'html.parser')

# Pull all text from product-title-text class
stick_name_list = soup.find_all(class_='product-title-text')

# Pull all text from product-price-text
stick_price_list = soup.find_all(class_='product-price-text')

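I believe this code should collect the right data, but I am not sure how to display the variables now.

Entering the variable name (i.e. "stick_name_list") returns "[]", "print stick_name_list" asks for parentheses, and "print 'stick_name_list'" is obviously not correct either.

Any guidance is appreciated.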

4 answers:

Answer 0 (score: 1)

It looks like that site,

https://www.sportchek.ca/categories/shop-by-sport/hockey/hockey-sticks.html?cid=search-hockey-sticks

uses JavaScript to load the product data, so when requests.get fetches the HTML, there are no products in it to parse.

If you disable JavaScript in your browser, you will see that no HTML tags with the class product-title-text exist.

More info here:

Using python Requests with javascript pages
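
To address the literal question of how to display what find_all returned, here is a minimal sketch. It parses a small inline HTML string (a made-up stand-in for a fully rendered page, not SportChek's real markup) and prints the text of each matching tag. The helper name extract_text is hypothetical. When no tag matches, as with the JavaScript-rendered page described above, the result is simply an empty list.

```python
from bs4 import BeautifulSoup

# Made-up HTML standing in for a rendered product page (assumption, not real markup).
SAMPLE_HTML = """
<div class="product">
  <span class="product-title-text">Example Stick A</span>
  <span class="product-price-text">$99.99</span>
</div>
<div class="product">
  <span class="product-title-text">Example Stick B</span>
  <span class="product-price-text">$149.99</span>
</div>
"""


def extract_text(html, class_name):
    """Return the stripped text of every tag carrying the given CSS class."""
    soup = BeautifulSoup(html, 'html.parser')
    return [tag.get_text(strip=True) for tag in soup.find_all(class_=class_name)]


names = extract_text(SAMPLE_HTML, 'product-title-text')
prices = extract_text(SAMPLE_HTML, 'product-price-text')

# print() is a function in Python 3, so its arguments go in parentheses.
for name, price in zip(names, prices):
    print(name, price)

# A page with no matching tags yields an empty list, which prints as [].
print(extract_text('<html><body></body></html>', 'product-title-text'))
```

Calling get_text on each tag gives the visible text rather than the full tag markup, which is usually what you want when listing names and prices.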

Answer 1 (score: 1)

I suggest checking whether you can parse the JSON that the web page uses. More info here: Laracasts
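
As a rough illustration of what parsing JSON involves once you have the response text, the sketch below decodes a small hand-written JSON string with the standard-library json module. The payload is invented for the example; only the key names (products, title, price) mirror those used in the other answers' code, and titles_and_prices is a hypothetical helper.

```python
import json

# Hand-written sample payload (assumption); a real response will be much larger.
SAMPLE_JSON = '''
{"products": [
  {"title": "Example Stick A", "price": 99.99},
  {"title": "Example Stick B", "price": 149.99}
]}
'''


def titles_and_prices(raw_json):
    """Decode a JSON document and return (title, price) pairs from its product list."""
    data = json.loads(raw_json)
    return [(item['title'], item['price']) for item in data['products']]


pairs = titles_and_prices(SAMPLE_JSON)
for title, price in pairs:
    print(title, price)
```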

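Answer 2 (score: 1)

You can use the same URL that the page itself uses to update its content. You can find it in the browser's Network tab. It returns JSON, which you can filter on type == 'product' to get the hockey sticks. You can change the count parameter in the URL query string to bring back more results.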

import requests
import pandas as pd

data = requests.get('https://www.sportchek.ca/services/sportchek/search-and-promote/products?x1=c.category-level-1&q1=Gear&x2=c.category-level-2&q2=Hockey&x3=c.category-level-3&q3=Hockey+Sticks&preselectedCategoriesNumber=3&preselectedBrandsNumber=0&page=1&count=100').json()

# Keep only the entries whose type is 'product', building the rows once
rows = [(item['title'], item['price']) for item in data['products'] if item['type'] == 'product']
titles, prices = zip(*rows)
df = pd.DataFrame(rows, columns=['title', 'price'])
print(df.head())
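
If you would rather change the count parameter programmatically than edit the URL by hand, the standard library can rewrite the query string. The sketch below is one possible way to do it; with_query_param is a hypothetical helper, and the URL is shortened to two of the answer's query parameters for readability.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_query_param(url, name, value):
    """Return url with the query parameter `name` set to `value`."""
    parts = urlsplit(url)
    # Drop any existing occurrence of the parameter, then append the new value.
    query = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True) if k != name]
    query.append((name, str(value)))
    return urlunsplit(parts._replace(query=urlencode(query)))


# Shortened version of the endpoint used above (other parameters omitted here).
base = 'https://www.sportchek.ca/services/sportchek/search-and-promote/products?page=1&count=100'
bigger = with_query_param(base, 'count', 200)
print(bigger)
```

Alternatively, requests can build the query string for you if you pass the parameters as a dict through its params argument.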


Answer 3 (score: 1)

As others have said, you can fetch the JSON directly (rather than parsing the page's HTML).

import requests
import math
from pandas import json_normalize  # top-level import; pandas.io.json.json_normalize is deprecated since pandas 1.0


url = 'https://www.sportchek.ca/services/sportchek/search-and-promote/products'

headers = {'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.103 Safari/537.36'}

payload = {
    'x1': 'c.category-level-1',
    'q1': 'Gear',
    'x2': 'c.category-level-2',
    'q2': 'Hockey',
    'x3': 'c.category-level-3',
    'q3': 'Hockey Sticks',
    'preselectedCategoriesNumber': '3',
    'preselectedBrandsNumber': '0',
    'page': '1',
    'count': '200'}

jsonData = requests.get(url, headers=headers, params=payload).json()
total_products = jsonData['resultCount']['total']
total_pages = math.ceil(total_products / 200)

# Fetch the remaining pages, reusing the same query with a new page number
for page in range(2, total_pages + 1):
    payload['page'] = page
    products = requests.get(url, headers=headers, params=payload).json()['products']
    jsonData['products'] = jsonData['products'] + products
    print('Processed page: %s' % page)

df = json_normalize(jsonData['products'])
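
The page-count arithmetic used in the loop above can be checked in isolation. The sketch below uses a hypothetical helper, remaining_pages, and takes 438 as an example total (the number of rows in this answer's output; whether the site reports exactly that total is an assumption).

```python
import math


def remaining_pages(total_products, page_size):
    """Return the page numbers still to fetch after page 1 has been downloaded."""
    total_pages = math.ceil(total_products / page_size)
    return list(range(2, total_pages + 1))


# 438 products at 200 per page need 3 pages in total, so pages 2 and 3 remain.
print(remaining_pages(438, 200))
```

When everything fits on the first page, the function returns an empty list and the loop body never runs.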

You can manipulate the table however you like, or just work with the JSON directly. I simply converted it to a table.

Output:

print (df[['title', 'price']])
                                                 title   price
0    Bauer Supreme 1S Griptac Senior Hockey Stick -...  339.99
1       Warrior Covert QRL SE Grip Senior Hockey Stick  329.99
2    Bauer Vapor X600 Lite Griptac Senior Hockey Stick   69.99
3                                           Gift Cards     NaN
4    Bauer Supreme 1S Clear Senior Hockey Stick - G...  339.99
5      Bauer Vapor 1X Lite Griptac Senior Hockey Stick  339.99
6    Bauer NEXUS 1N Griptac Gen II Senior Hockey Stick  254.97
7                                           Flash Sale     NaN
8                           Sher-Wood Project 9 Sticks     NaN
9    Bauer Supreme 2S Team Griptac Senior Hockey Stick  159.99
10   Bauer Supreme S160 Griptac Junior Hockey Stick...   44.97
11              Bauer Nexus 2N Pro Senior Hockey Stick  319.99
12     Warrior Alpha QX Grip Intermediate Hockey Stick  184.88
13               TRUE XC5 ACF Grip Junior Hockey Stick   79.99
14     Warrior Covert QRE ST2 Grip Senior Hockey Stick   89.99
15                             Mother's Day Gift Guide     NaN
16   Bauer Supreme S190 Griptac Senior Hockey Stick...  156.97
17   Bauer Vapor X700 Lite Griptac Senior Hockey Stick  119.99
18    Bauer Supreme 2S Pro Griptac Senior Hockey Stick  319.99
19              Bauer Nexus 2N Pro Junior Hockey Stick  199.99
20   Bauer Vapor 1X Lite Griptac Intermediate Hocke...  319.99
21   Bauer Supreme 1S Griptac Intermediate Hockey S...  223.97
22        Bauer Nexus 2N Pro Intermediate Hockey Stick  299.99
23            TRUE XC9 ACF Grip Junior 30 Hockey Stick  119.99
24                  TRUE XC9 ACF Youth 20 Hockey Stick   99.99
25                  Bauer Nexus 2N Senior Hockey Stick  224.99
26        Bauer Supreme 1S Youth Hockey Stick - Gen II   69.97
27        TRUE XC9 ACF Grip Gen II Senior Hockey Stick  319.99
28    Bauer Supreme 2S Pro Griptac Junior Hockey Stick  199.99
29   Bauer NEXUS N7000 Griptac Gen II Intermediate ...   89.97
..                                                 ...     ...
408        Warrior Covert QRL Grip Senior Hockey Stick  159.97
409  Bauer Vapor X800 Griptac Gen II Senior Hockey ...  109.97
410  Graf G95 Revolt Grip Senior Hockey Stick - GP0...  109.88
411            CCM Ribcor 47K Grip Senior Hockey Stick   79.97
412         Sher-Wood BPM 060 Grip Senior Hockey Stick   51.97
413        CCM RBZ Revolution Grip Senior Hockey Stick  149.88
414  CCM Premier R1.5 Senior Goalie Stick - Crawfor...   89.97
415       Bauer Vapor 1X Senior Goalie Stick - P31 25"  289.99
416     Sher-Wood GS350 Senior Goalie Stick 24" - PP41   96.97
417     Sher-Wood GS350 Senior Goalie Stick - PP41 27"   96.97
418     Bauer Vapor X900 Senior Goalie Stick - P31 26"  199.99
419          Sher-Wood GS150 Senior Goalie Stick - 24"   74.97
420          Sher-Wood GS150 Senior Goalie Stick - 25"   74.97
421           CCM 1060 Senior Goalie Stick - Price 27"   89.88
422          Sher-Wood GS150 Senior Goalie Stick - 26"   74.97
423          Sher-Wood GS150 Senior Goalie Stick - 27"   74.97
424  CCM Premier R1.9 Senior Goalie Stick - Crawfor...  119.97
425   Sher-Wood BPM 090 Grip Intermediate Hockey Stick   81.97
426  Warrior Covert QRL5 Grip Intermediate Hockey S...   63.97
427  Warrior Covert DT1 LT Grip Intermediate Hockey...  111.88
428  Warrior Covert Super Dolomite Grip Intermediat...  189.88
429  Warrior Dynasty HD1 Intermediate Stick - Grip ...  123.88
430  Easton Stealth CX Grip Intermediate Hockey Sti...  159.88
431  Easton Synergy 20 Intermediate Stick - Grip - ...   34.88
432  Sherwood T120 Intermediate Grip Hockey Stick -...   99.97
433  GRAF G75 Intermediate 70 Flex Hockey Stick - GP22   99.88
434  Bauer Vapor X700 Griptac Gen II Intermediate H...   79.97
435  Easton Synergy HTX Intermediate Stick - Grip -...  115.88
436  Sherwood T120 Intermediate Grip Hockey Stick -...   99.97
437   Sher-Wood BPM 060 Grip Intermediate Hockey Stick   51.97

[438 rows x 2 columns]
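
The NaN prices in the output above belong to entries such as "Gift Cards" and "Flash Sale", which do not look like actual sticks. One way to drop them, sketched below on invented sample data, is to keep only the entries that carry a numeric price. Whether a missing price reliably identifies non-products in the real feed is an assumption, and has_price is a hypothetical helper.

```python
import math

# Invented sample mirroring the shape of the rows printed above (assumption).
products = [
    {'title': 'Example Stick A', 'price': 339.99},
    {'title': 'Gift Cards'},                  # no price key at all
    {'title': 'Example Stick B', 'price': 69.99},
    {'title': 'Flash Sale', 'price': None},   # price present but empty
]


def has_price(item):
    """True when the entry has a real numeric price (not missing, None, or NaN)."""
    price = item.get('price')
    return isinstance(price, (int, float)) and not math.isnan(price)


priced = [item for item in products if has_price(item)]
for item in priced:
    print(item['title'], item['price'])
```

With the DataFrame built in this answer, df.dropna(subset=['price']) has a similar effect.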