I am trying to scrape a web page.
I want to get data sets from two different HTML nodes: ".table-grosse-schrift" and "td.zentriert.no-border".
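The problem is that the order of the ".table-grosse-schrift" nodes on the page keeps changing, so I cannot match up the data from the two nodes.
I found that a solution might be to fetch the data from both nodes at the same time, as follows: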
library(rvest)
url <- "https://www.transfermarkt.co.uk/serie-a/spieltag/wettbewerb/IT1/saison_id/2016/spieltag/4"
page <- read_html(url)  # parse the page once and reuse it for both selectors
tt <- page %>% html_nodes(".table-grosse-schrift") %>% html_text()
temp1 <- data.frame(text = gsub("\r|\n|\t|\U00A0", "", tt), stringsAsFactors = FALSE)  # strip whitespace and non-breaking spaces
temp2 <- data.frame(text = page %>% html_nodes("td.zentriert.no-border") %>% html_text(), stringsAsFactors = FALSE)
But this code does not work.
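Answer 0 (score: 0)
If I understand correctly, you should be able to use following-sibling
to select the next corresponding sibling for each pair of nodes you need.
The following-sibling axis indicates all the nodes that have the same parent as the context node and appear after the context node in the source document. (Source: https://developer.mozilla.org/en-US/docs/Web/XPath/Axes/following-sibling)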
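
As a rough sketch of that idea, the snippet below anchors on one node type and looks up its following sibling relative to each anchor, so the pairs stay aligned regardless of the order in which the nodes appear. The HTML is a made-up stand-in, not the real transfermarkt markup, and it assumes the two node types share a parent; if they do not on the actual page, the relative XPath would need a different axis.

```r
library(xml2)

# Hypothetical markup (an assumption, NOT the real page): each
# ".table-grosse-schrift" cell is followed, in the same row, by the
# "zentriert no-border" cell that belongs to it.
html <- '<table>
  <tr><td class="table-grosse-schrift">Match B</td><td class="zentriert no-border">0:0</td></tr>
  <tr><td class="table-grosse-schrift">Match A</td><td class="zentriert no-border">2:1</td></tr>
</table>'
page <- read_html(html)

# Step 1: select the anchor nodes.
labels <- xml_find_all(
  page,
  "//td[contains(concat(' ', normalize-space(@class), ' '), ' table-grosse-schrift ')]"
)

# Step 2: for each anchor, take its first following sibling with class
# "no-border". xml_find_first() returns one result per input node
# (a missing node when nothing matches), so the two vectors line up.
values <- xml_find_first(
  labels,
  "following-sibling::td[contains(concat(' ', normalize-space(@class), ' '), ' no-border ')][1]"
)

pairs <- data.frame(
  label = xml_text(labels),
  value = xml_text(values),
  stringsAsFactors = FALSE
)
print(pairs)
```

Because the lookup is relative to each anchor node, there is no need to run two independent selections and hope that their results come back in the same order.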