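I am trying to extract data from the following website: https://www.bigschedules.com/. When I do this manually, the site works fine.
I developed a script in Python using Selenium and Chromedriver that used to work, but now it fails with the error "Error during WebSocket handshake: Unexpected response code: 200".
The script opens Chrome and tries to fetch data from the site, but it gets stuck, as shown in the image below: [click here to see the image][1]
[1]: https://i.stack.imgur.com/0JxEi.png
I am using chromedriver version 2.42 and Selenium version 3.14.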
from selenium import webdriver

def setupChrome(self):
    # Contains all Chrome settings
    self.logger.info("Setting-up Chrome")
    self.settings = webdriver.ChromeOptions()
    #self.settings.add_argument("--incognito")
    self.settings.add_argument('--ignore-ssl-errors')
    self.settings.add_argument('--ignore-certificate-errors')
    # Switches must start with ASCII hyphens, not an en dash
    self.settings.add_argument('--disable-web-security')
    self.settings.add_argument('--allow-running-insecure-content')

def loadBrowser(self):
    self.setupChrome()
    try:
        self.browser = webdriver.Chrome(chrome_options=self.settings,
                                        executable_path="D:\\chromedriver.exe")
        self.browser.maximize_window()
    except Exception:
        # The posted snippet ends after the try body; this handler is an assumed placeholder
        self.logger.exception("Failed to start Chrome")
        raise
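A side note on Chrome options in general: switches are written with ASCII hyphens, and an argument that begins with an en dash (U+2013), as can happen after copying from a rich-text source, is not recognised as that switch. A minimal sketch of a guard against this follows. `normalize_switch` is a hypothetical helper introduced here for illustration and is not part of Selenium.

```python
# Hypothetical helper (not part of Selenium): repair Chrome switches whose
# leading hyphens were replaced by typographic dashes during copy/paste.
DASH_LIKE = "-\u2010\u2011\u2012\u2013\u2014\u2212"


def normalize_switch(flag: str) -> str:
    """Replace the leading run of dash-like characters with '--'."""
    stripped = flag.lstrip(DASH_LIKE)
    if stripped == flag:
        # No leading dash at all: leave the argument untouched.
        return flag
    return "--" + stripped


flags = ["--ignore-ssl-errors", "\u2013-disable-web-security"]
print([normalize_switch(f) for f in flags])
```

Each flag could be passed through such a helper before calling `add_argument`, so that a mistyped dash does not silently change how the browser is configured.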
I get the following errors in the console stack trace:
webtrends.js:1 **A parser-blocking**, cross site (i.e. different eTLD+1) script, https://sdc.oocl.com/dcsg6upoljf1zldtivsnov48s_8o7d/wtid.js, is invoked via document.write. The network request for this script MAY be blocked by the browser in this or a future page load due to poor network connectivity. If blocked in this page load, it will be confirmed in a subsequent console message. See https://www.chromestatus.com/feature/5718547946799104 for more details.
WebTrends.dcsGetId @ webtrends.js:1
(anonymous) @ VM29:431
6[Intervention] **Slow network is detected**. See <URL> for more details. Fallback font will be used while loading: <URL>
application-c962374717.min.js:4
pascalprecht.translate.$translateSanitization: **No sanitization** strategy has been configured. This can have serious security implications. See http://angular-translate.github.io/docs/#/guide/19_security for details.
(anonymous) @ application-c962374717.min.js:4
warn @ application-c962374717.min.js:12
c @ angular-translate.min.js:6
sanitize @ angular-translate.min.js:6
a.interpolate @ angular-translate.min.js:6
q.instant @ angular-translate.min.js:6
n @ angular-translate.min.js:6
fn @ VM201:4
e @ angular.js:16658
P.exp @ angular.js:13144
pre @ angular.js:10436
(anonymous) @ angular.js:1385
wa @ angular.js:10545
q @ angular.js:9911
f @ angular.js:9174
q @ angular.js:9928
f @ angular.js:9174
q @ angular.js:9928
(anonymous) @ angular.js:10273
(anonymous) @ angular.js:17051
$digest @ angular.js:18233
$apply @ angular.js:18531
l @ angular.js:12547
s @ angular.js:12785
y.onload @ angular.js:12702
application-c962374717.min.js:4
Deprecation warning: **moment().add(period, number) is deprecated. Please use moment().add(number, period). See http://momentjs.com/guides/#/warnings/add-inverted-param/ for more info.**
(anonymous) @ application-c962374717.min.js:4
k @ moment-with-locales.min.js:1
T @ moment-with-locales.min.js:1
(anonymous) @ moment-with-locales.min.js:1
(anonymous) @ application-c962374717.min.js:44
invoke @ angular.js:5040
P.instance @ angular.js:11000
q @ angular.js:9865
f @ angular.js:9174
f @ angular.js:9177
f @ angular.js:9177
f @ angular.js:9177
(anonymous) @ angular.js:9039
(anonymous) @ angular.js:9430
d @ angular.js:9217
m @ angular.js:9984
(anonymous) @ angular.js:32398
(anonymous) @ angular.js:1385
(anonymous) @ angular.js:10539
wa @ angular.js:10545
q @ angular.js:9934
(anonymous) @ angular.js:10273
(anonymous) @ angular.js:17051
$digest @ angular.js:18233
$apply @ angular.js:18531
l @ angular.js:12547
s @ angular.js:12785
y.onload @ angular.js:12702
universalModuleDefinition:3
WebSocket connection to 'wss://www.bigschedules.com/socket.io/?EIO=3&transport=websocket&sid=yywiluhT_bdXDglEAAkc' failed: **Error during WebSocket handshake: Unexpected response code: 200**
n.doOpen @ universalModuleDefinition:3
n.open @ universalModuleDefinition:2
n.probe @ universalModuleDefinition:2
n.onOpen @ universalModuleDefinition:2
n.onHandshake @ universalModuleDefinition:2
n.onPacket @ universalModuleDefinition:2
(anonymous) @ universalModuleDefinition:2
n.emit @ universalModuleDefinition:2
n.onPacket @ universalModuleDefinition:2
r @ universalModuleDefinition:2
(anonymous) @ universalModuleDefinition:2
e.decodePayloadAsBinary @ universalModuleDefinition:2
e.decodePayload @ universalModuleDefinition:2
n.onData @ universalModuleDefinition:2
(anonymous) @ universalModuleDefinition:2
n.emit @ universalModuleDefinition:2
i.onData @ universalModuleDefinition:2
i.onLoad @ universalModuleDefinition:2
hasXDR.r.onreadystatechange @ universalModuleDefinition:2
application-c962374717.min.js:23 Uncaught TypeError: **Cannot assign to read only property 'tagName' of object '#<HTMLDivElement>'**
at Object.handler.tagNameHandler (application-c962374717.min.js:23)
at Object.handler.constructInfo (application-c962374717.min.js:23)
at application-c962374717.min.js:23
handler.tagNameHandler @ application-c962374717.min.js:23
handler.constructInfo @ application-c962374717.min.js:23
(anonymous) @ application-c962374717.min.js:23
4application-c962374717.min.js:23
Uncaught TypeError: **Cannot assign to read only property** 'tagName' of object '#<HTMLInputElement>'
at Object.handler.tagNameHandler (application-c962374717.min.js:23)
at Object.handler.constructInfo (application-c962374717.min.js:23)
at application-c962374717.min.js:23
handler.tagNameHandler @ application-c962374717.min.js:23
handler.constructInfo @ application-c962374717.min.js:23
(anonymous) @ application-c962374717.min.js:23
application-c962374717.min.js:23
Uncaught TypeError: **Cannot assign to read only property** 'tagName' of object '[object HTMLAnchorElement]'
at Object.handler.tagNameHandler (application-c962374717.min.js:23)
at Object.handler.constructInfo (application-c962374717.min.js:23)
at application-c962374717.min.js:23
handler.tagNameHandler @ application-c962374717.min.js:23
handler.constructInfo @ application-c962374717.min.js:23
(anonymous) @ application-c962374717.min.js:23
query:1 **Failed to load resource**: the server responded with a status of 401 (Unauthorized)
application-c962374717.min.js:23
Uncaught TypeError: **Cannot assign to read only property** 'tagName' of object '[object HTMLAnchorElement]'
at Object.handler.tagNameHandler (application-c962374717.min.js:23)
at Object.handler.constructInfo (application-c962374717.min.js:23)
at tracking (application-c962374717.min.js:23)
at firstThingAfterSearch (application-c962374717.min.js:23)
at monitor (application-c962374717.min.js:23)
at application-c962374717.min.js:23
handler.tagNameHandler @ application-c962374717.min.js:23
handler.constructInfo @ application-c962374717.min.js:23
tracking @ application-c962374717.min.js:23
firstThingAfterSearch @ application-c962374717.min.js:23
monitor @ application-c962374717.min.js:23
(anonymous) @ application-c962374717.min.js:23
setTimeout (async)
(anonymous) @ application-c962374717.min.js:23
wrappedFn @ application-c962374717.min.js:23
angular.js:12759 GET https://www.bigschedules.com/api/routeSearch/query?_=1537193893310&carrier=COSU&carrier=APLU&carrier=MSCU&departureFrom=2018-09-17T00:00:00.000Z&departureTo=2018-09-30T23:59:59.999Z&fndID=P1015&isOriginal=true&porID=P94&requestRefNo=432d9035-b7bb-40d9-b03f-208ffcbdafa3&socketID=yywiluhT_bdXDglEAAkc **401 (Unauthorized)**
(anonymous) @ angular.js:12759
q @ angular.js:12492
(anonymous) @ angular.js:12244
(anonymous) @ angular.js:17051
$digest @ angular.js:18233
(anonymous) @ angular.js:18462
e @ angular.js:6362
(anonymous) @ angular.js:6642
setTimeout (async)
h.defer @ angular.js:6640
$evalAsync @ angular.js:18460
(anonymous) @ angular.js:16923
k @ angular.js:17095
l @ angular.js:17122
c @ angular.js:17131
r @ bluebird.min.js:31
i._settlePromiseFromHandler @ bluebird.min.js:30
i._settlePromise @ bluebird.min.js:30
i._settlePromise0 @ bluebird.min.js:30
i._settlePromises @ bluebird.min.js:30
r._drainQueue @ bluebird.min.js:29
r._drainQueues @ bluebird.min.js:29
drainQueues @ bluebird.min.js:29
Promise.then (async)
r @ bluebird.min.js:30
r._queueTick @ bluebird.min.js:29
s @ bluebird.min.js:29
p.hasDevTools.r.settlePromises @ bluebird.min.js:29
i._fulfill @ bluebird.min.js:30
i._resolveCallback @ bluebird.min.js:30
(anonymous) @ bluebird.min.js:30
Do @ recaptcha__en.js:251
(anonymous) @ recaptcha__en.js:249
T4 @ recaptcha__en.js:71
ta @ recaptcha__en.js:71
Y @ recaptcha__en.js:68
application-c962374717.min.js:23
Uncaught TypeError: **Cannot assign to read only property** 'tagName' of object '[object HTMLAnchorElement]'
at Object.handler.tagNameHandler (application-c962374717.min.js:23)
at Object.handler.constructInfo (application-c962374717.min.js:23)
at tracking (application-c962374717.min.js:23)
at firstThingAfterSearch (application-c962374717.min.js:23)
at monitor (application-c962374717.min.js:23)
at application-c962374717.min.js:23
handler.tagNameHandler @ application-c962374717.min.js:23
handler.constructInfo @ application-c962374717.min.js:23
tracking @ application-c962374717.min.js:23
firstThingAfterSearch @ application-c962374717.min.js:23
monitor @ application-c962374717.min.js:23
(anonymous) @ application-c962374717.min.js:23
setTimeout (async)
(anonymous) @ application-c962374717.min.js:23
wrappedFn @ application-c962374717.min.js:23
angular.js:12759
GET https://www.bigschedules.com/api/routeSearch/query?_=1537193947261&carrier=COSU&carrier=APLU&carrier=MSCU&departureFrom=2018-09-17T00:00:00.000Z&departureTo=2018-09-30T23:59:59.999Z&fndID=P156&isOriginal=true&porID=P94&requestRefNo=ba8fbb09-d98a-4b44-96e0-040511775c80&socketID=yywiluhT_bdXDglEAAkc **401 (Unauthorized)**
Answer 0 (score: 0)
You could try urllib2 and BeautifulSoup in Python. The code sample below shows how to get an attribute of page elements from the page source.
from BeautifulSoup import BeautifulSoup
import urllib2

# Download the page source (Python 2 with BeautifulSoup 3)
page = urllib2.urlopen('yourUrl')
soup = BeautifulSoup(page)
# Find every element with the given tag, for instance "img"
elementsYouWantToExtract = soup.findAll('img')
for element in elementsYouWantToExtract:
    # Read the wanted attribute from each matched element
    print element['attributeYouWantToExtract']
Hope this helps.
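On Python 3, where `urllib2` no longer exists, the same idea can be sketched with the standard library alone. This is an illustration under stated assumptions, not the answer's own code: it swaps BeautifulSoup for `html.parser.HTMLParser`, the class name `AttributeCollector` and the sample markup are made up for the example, and it parses a literal string so that it runs offline. Like the snippet above, it only sees static page source, so content that a page builds with JavaScript would not show up.

```python
from html.parser import HTMLParser


class AttributeCollector(HTMLParser):
    """Collect one attribute from every occurrence of a given tag."""

    def __init__(self, tag, attribute):
        super().__init__()
        self.tag = tag
        self.attribute = attribute
        self.values = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the current start tag.
        if tag == self.tag:
            for name, value in attrs:
                if name == self.attribute:
                    self.values.append(value)


# Sample markup standing in for a downloaded page; with a real URL the text
# could come from urllib.request.urlopen(url).read().decode("utf-8").
html = '<p><img src="a.png" alt="A"><img src="b.png"><a href="/x">x</a></p>'

collector = AttributeCollector("img", "src")
collector.feed(html)
print(collector.values)
```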
Answer 1 (score: 0)
Whether you have invoked the URL https://www.bigschedules.com/tou is not evident from your code trial.
However, according to your error stack trace, your main issue is:
WebSocket connection to 'wss://www.bigschedules.com/socket.io/?EIO=3&transport=websocket&sid=yywiluhT_bdXDglEAAkc' failed: Error during WebSocket handshake: Unexpected response code: 200
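There can be several reasons behind the error, as follows:
For users on slow connections, such as 2G, the performance penalty from third-party scripts loaded via document.write is often so severe that it delays the display of the main page content for tens of seconds. This feature blocks the loading of cross-origin, parser-blocking scripts inserted via document.write when a user on a 2G connection has an HTTP cache miss. The feature applies only to such scripts in the main frame.
Another reason may be that a slow network was detected while loading and a fallback font was used, with no sanitization strategy configured, which has serious security implications. Hence you face: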
response code: 200
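For context on the message itself: under RFC 6455, a server that accepts a WebSocket upgrade replies with status 101 (Switching Protocols) and a `Sec-WebSocket-Accept` header derived from the client's key. A 200 therefore means the request was answered as ordinary HTTP rather than upgraded. The sketch below illustrates a partial version of that check. The function names are hypothetical, and the key is the sample value from the RFC, not anything captured from bigschedules.com.

```python
import base64
import hashlib

# Fixed GUID defined by RFC 6455 for computing Sec-WebSocket-Accept.
WS_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"


def expected_accept(client_key: str) -> str:
    """Return the Sec-WebSocket-Accept value a server must send for client_key."""
    digest = hashlib.sha1((client_key + WS_GUID).encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii")


def handshake_ok(status: int, headers: dict, client_key: str) -> bool:
    """True only for a 101 response carrying the correct accept header."""
    if status != 101:
        return False
    normalized = {k.lower(): v for k, v in headers.items()}
    if normalized.get("upgrade", "").lower() != "websocket":
        return False
    return normalized.get("sec-websocket-accept") == expected_accept(client_key)


key = "dGhlIHNhbXBsZSBub25jZQ=="  # sample key from RFC 6455
good = {"Upgrade": "websocket", "Sec-WebSocket-Accept": expected_accept(key)}
print(handshake_ok(101, good, key))
print(handshake_ok(200, {}, key))
```

Seen this way, the failing `wss://` request in the trace is one where the response never reached the 101 stage, which points at whatever sits between the browser and the socket.io endpoint rather than at the page's fonts or scripts.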