BeautifulSoup python ... soup.find(id="productTitle") returns None

Time: 2019-12-08 04:54:55

Tags: python beautifulsoup

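I'm new to web scraping and want to pull some information from Amazon. I've written a few basic lines, but they don't work: soup.find(id="productTitle") returns None.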


Yet the HTML of the page clearly contains an element with id="productTitle". The script is:

import requests
from bs4 import BeautifulSoup

URL = 'https://www.amazon.ca/Monkey-Biscuits-14-oz-Orange/dp/B074SYBXLG/'

headers = {'User-Agent': '...myuseragent'}

page = requests.get(URL, headers=headers)

soup = BeautifulSoup(page.content, "html.parser")
print(soup.find(id="productTitle"))

Any help would be greatly appreciated.

2 answers:

Answer 0: (score: 0)

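Some tags in the HTML document may be generated dynamically by JavaScript.
BeautifulSoup only sees the static HTML that the server returns, so it can only scrape elements present in that response. Using Selenium, which drives a real browser, should get you past this.
https://selenium-python.readthedocs.io/api.html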
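
As a rough illustration of the Selenium suggestion, the sketch below loads the page in a browser and then hands the rendered HTML to BeautifulSoup. The helper name find_title_in_rendered_page and the main function are hypothetical, and the choice of Chrome is an assumption; any Selenium WebDriver should work the same way. Note that page_source may be read before the page's scripts have finished, so a real script might also need an explicit wait.

```python
from bs4 import BeautifulSoup


def find_title_in_rendered_page(driver, url, element_id="productTitle"):
    """Load `url` in a browser-backed driver and look up `element_id`.

    `driver` is expected to behave like a Selenium WebDriver: it needs a
    `get(url)` method and a `page_source` attribute.
    """
    driver.get(url)  # the browser fetches the page and runs its JavaScript
    soup = BeautifulSoup(driver.page_source, "html.parser")
    tag = soup.find(id=element_id)
    # Return None instead of raising when the element is still missing.
    return tag.get_text(strip=True) if tag is not None else None


def main():
    # Imported here so the helper above is usable without Selenium installed.
    from selenium import webdriver

    url = "https://www.amazon.ca/Monkey-Biscuits-14-oz-Orange/dp/B074SYBXLG/"
    driver = webdriver.Chrome()  # assumption: Chrome and its driver are available
    try:
        print(find_title_in_rendered_page(driver, url))
    finally:
        driver.quit()  # always close the browser, even on errors
```

Calling main() would open a browser window, load the product page, and print the title text, or None if the element is still absent from the rendered page.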

Answer 1: (score: 0)

It seems hard to find with "html.parser", but it works fine if I use "lxml" (though this may mean you have to install the lxml module):

soup = BeautifulSoup(page.content, "lxml")
print(soup.find(id="productTitle").get_text(strip=True))

Edit: "html5lib" (if installed) also works.