Getting two different HTML nodes at the same time

Date: 2018-01-10 03:24:40

Tags: html nodes rvest

I am trying to scrape a web page.

I want to extract data from two different HTML nodes: ".table-grosse-schrift" and "td.zentriert.no-border".

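The problem is that the order of the ".table-grosse-schrift" nodes on the page keeps changing, so I cannot match up the data from the two nodes.

I found a solution that is supposed to fetch the data from both nodes at the same time, as shown below: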

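library(rvest)  # provides read_html(), html_nodes(), html_text() and the %>% pipe
url <- "https://www.transfermarkt.co.uk/serie-a/spieltag/wettbewerb/IT1/saison_id/2016/spieltag/4"
# Parse the page once and reuse it for both selectors (the original indexed url with an undefined x)
page <- read_html(url)
tt <- page %>% html_nodes(".table-grosse-schrift") %>% html_text()
# Strip carriage returns, newlines, tabs and non-breaking spaces
temp1 <- data.frame(text = gsub("\r|\n|\t|\U00A0", "", tt), stringsAsFactors = FALSE)
temp2 <- data.frame(text = page %>% html_nodes("td.zentriert.no-border") %>% html_text(), stringsAsFactors = FALSE)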

However, this code does not work.

1 Answer:

Answer 0: (score: 0)

If I understand correctly, you should be able to use following-sibling to select, for each pair of nodes you need, the next corresponding sibling.

  

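The following-sibling axis indicates all the nodes that have the same parent as the context node and appear after the context node in the source document. (Source: https://developer.mozilla.org/en-US/docs/Web/XPath/Axes/following-sibling)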
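
As a rough illustration of the following-sibling idea, here is a minimal sketch in R with rvest. The inline HTML is hypothetical: it assumes each ".table-grosse-schrift" cell and its matching "td.zentriert.no-border" cell sit in the same row, which may not be how the real page is laid out. The variable names (html, anchors, result) are also made up for the example.

```r
library(rvest)  # provides read_html(), html_nodes(), html_node(), html_text() and %>%

# Hypothetical markup standing in for the real page (an assumption, not the site's HTML)
html <- '<table>
  <tr><td class="table-grosse-schrift">Team A</td><td class="zentriert no-border">2:1</td></tr>
  <tr><td class="table-grosse-schrift">Team B</td><td class="zentriert no-border">0:0</td></tr>
</table>'
page <- read_html(html)

# Select the first kind of node; these act as the XPath context nodes
anchors <- page %>% html_nodes(".table-grosse-schrift")

# For every anchor, take the first later sibling <td> whose class mentions "zentriert".
# html_node() returns exactly one result per anchor (a missing node if nothing matches),
# so the two columns stay aligned regardless of the order of the anchors on the page.
values <- anchors %>%
  html_node(xpath = "following-sibling::td[contains(@class, 'zentriert')][1]") %>%
  html_text(trim = TRUE)

result <- data.frame(
  name  = html_text(anchors, trim = TRUE),
  value = values,
  stringsAsFactors = FALSE
)
print(result)
```

Because the sibling lookup starts from each anchor node, the pairing does not depend on scraping two separate vectors and hoping they line up. If the two cells are not siblings on the real page, the XPath would need a different axis (for example going up to the shared row first).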