Question

http://api.bart.gov/api/stn.aspx?cmd=stns&key=MW9S-E7SL-26DU-VV8V

How can I import this as an XML document? I am trying to parse it in R.

Answer 1

You can use xml2 to read and parse it:

library(xml2)
library(tidyverse)

xml <- read_xml('https://api.bart.gov/api/stn.aspx?cmd=stns&key=MW9S-E7SL-26DU-VV8V')

bart <- xml %>% xml_find_all('//station') %>%    # select all station nodes
    map_df(as_list) %>%    # coerce each node to list, collect to data.frame
    unnest()    # unnest list columns of data.frame

bart
#> # A tibble: 46 × 9
#>                            name  abbr gtfs_latitude gtfs_longitude
#>                           <chr> <chr>         <chr>          <chr>
#> 1  12th St. Oakland City Center  12TH     37.803768    -122.271450
#> 2              16th St. Mission  16TH     37.765062    -122.419694
#> 3              19th St. Oakland  19TH     37.808350    -122.268602
#> 4              24th St. Mission  24TH     37.752470    -122.418143
#> 5                         Ashby  ASHB     37.852803    -122.270062
#> 6                   Balboa Park  BALB     37.721585    -122.447506
#> 7                      Bay Fair  BAYF     37.696924    -122.126514
#> 8                 Castro Valley  CAST     37.690746    -122.075602
#> 9         Civic Center/UN Plaza  CIVC     37.779732    -122.414123
#> 10                     Coliseum  COLS     37.753661    -122.196869
#> # ... with 36 more rows, and 5 more variables: address <chr>, city <chr>,
#> #   county <chr>, state <chr>, zipcode <chr>
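
Every column in the output above is of type <chr>, including the coordinates. As a minimal sketch of pulling out named fields and converting the coordinates to numbers, the following uses only xml2 and base R. The inline sample_xml is a made-up stand-in for the feed (its layout is assumed from the column names and XPath shown in this thread), and field() is a hypothetical helper name.

```r
library(xml2)

# Assumption: a small inline sample standing in for the live feed,
# laid out as /root/stations/station like the real response appears to be.
sample_xml <- '<root><stations>
  <station><name>12th St. Oakland City Center</name><abbr>12TH</abbr>
    <gtfs_latitude>37.803768</gtfs_latitude><gtfs_longitude>-122.271450</gtfs_longitude></station>
  <station><name>16th St. Mission</name><abbr>16TH</abbr>
    <gtfs_latitude>37.765062</gtfs_latitude><gtfs_longitude>-122.419694</gtfs_longitude></station>
</stations></root>'

doc <- read_xml(sample_xml)
stations <- xml_find_all(doc, "//station")

# Hypothetical helper: text of one child element, one value per station
field <- function(tag) xml_text(xml_find_first(stations, tag))

bart_small <- data.frame(
  name = field("name"),
  abbr = field("abbr"),
  lat  = as.numeric(field("gtfs_latitude")),   # convert from character
  lon  = as.numeric(field("gtfs_longitude")),
  stringsAsFactors = FALSE
)
```

Naming the fields explicitly avoids depending on how as_list() nests text nodes, at the cost of listing each column by hand.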

Answer 2

Use the rvest library. The basic idea is to find the nodes of interest with an XPath selector (xml_nodes) and then extract their values with xml_text.

library(rvest)
library(xml2)    # read_xml() and xml_text() come from xml2

doc <- read_xml("http://api.bart.gov/api/stn.aspx?cmd=stns&key=MW9S-E7SL-26DU-VV8V")
names <- doc %>% 
  xml_nodes(xpath = "/root/stations/station/name") %>%
  xml_text()

names[1:5]

# [1] "12th St. Oakland City Center" "16th St. Mission"             "19th St. Oakland"             "24th St. Mission"            
# [5] "Ashby"

Answer 3

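I ran into some problems using the URL directly inside read_html, so I read it with readLines first. Then I find the set of all <station> nodes, convert it to a list, and pass that to data.table::rbindlist. The idea of using rbindlist was borrowed from elsewhere.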

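library(xml2)
library(data.table)
library(magrittr)    # provides %>%

# readLines() returns one string per line; collapse them into a single string before parsing
txt <- paste(readLines("http://api.bart.gov/api/stn.aspx?cmd=stns&key=MW9S-E7SL-26DU-VV8V"), collapse = "\n")
nodesets <- read_html(txt) %>%
    xml_find_all(".//station")    # all <station> nodes
data.table::rbindlist(as_list(nodesets))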
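
The read-as-text-then-parse idea can be tried without network access. The sketch below assumes a character vector shaped like what readLines would return (the lines vector is invented sample data) and builds one row per station with base R instead of rbindlist.

```r
library(xml2)

# Assumption: stand-in for readLines(url), one element per line of the response
lines <- c(
  "<root><stations>",
  "<station><name>Ashby</name><abbr>ASHB</abbr></station>",
  "<station><name>Balboa Park</name><abbr>BALB</abbr></station>",
  "</stations></root>"
)

# Collapse to a single string so the parser sees one document
doc <- read_xml(paste(lines, collapse = "\n"))
stations <- xml_find_all(doc, ".//station")

# One named list per station: child element names become column names
rows <- lapply(stations, function(s) {
  kids <- xml_children(s)
  setNames(as.list(xml_text(kids)), xml_name(kids))
})

# Bind the rows into a data.frame
station_df <- do.call(rbind, lapply(rows, as.data.frame, stringsAsFactors = FALSE))
```

This assumes every station has the same child elements in the same order; rows with differing fields would need to be aligned first.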
