Submitting an XML request form to an API using R

Date: 2016-10-04 16:18:08

Tags: r xml web-scraping rcurl rvest

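I previously had (and still have) trouble navigating from a starting link to the next link in order to fill out a data request form and then extract information from the results table. My coding attempt is here.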

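I was told that CDC WONDER does have an API that accepts XML request forms, so the form can be submitted from R. The link with all the details on how to submit the form is here.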

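However, I don't know how to send an XML request form with R, and my searches for a solution have not turned anything up. If someone could get me started, based on the API instructions and one of the example XML forms they list, I think I can figure out the rest.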

The code below shows my attempt with the first example XML request form:

request_xml <-
"<?xml version='1.0' encoding='UTF-8'?>
 <request-parameters>
 <parameter>
  <name>accept_datause_restrictions</name>
  <value>true</value>
 </parameter>
 <parameter>
  <name>B_1</name>
  <value>D76.V1-level1</value>
 </parameter>
 <parameter>
  <name>B_2</name>
  <value>D76.V8</value>
 </parameter>
 <parameter>
  <name>B_3</name>
  <value>*None*</value>
 </parameter>
 <parameter>
  <name>B_4</name>
  <value>*None*</value>
 </parameter>
 <parameter>
  <name>B_5</name>
  <value>*None*</value>
 </parameter>
 <parameter>
  <name>F_D76.V1</name>
  <value>2009</value>
  <value>2010</value>
  <value>2011</value>
  <value>2012</value>
  <value>2013</value>
 </parameter>
 <parameter>
  <name>F_D76.V10</name>
  <value>*All*</value>
 </parameter>
 <parameter>
  <name>F_D76.V2</name>
  <value>C00-D48</value>
 </parameter>
 <parameter>
  <name>F_D76.V27</name>
  <value>*All*</value>
 </parameter>
 <parameter>
  <name>F_D76.V9</name>
  <value>*All*</value>
 </parameter>
 <parameter>
  <name>I_D76.V1</name>
  <value>
   2009 (2009) 2010 (2010) 2011 (2011) 2012 (2012) 2013 (2013)
  </value>
 </parameter>
 <parameter>
  <name>I_D76.V10</name>
  <value>*All* (The United States)</value>
 </parameter>
 <parameter>
  <name>I_D76.V2</name>
  <value>C00-D48 (Neoplasms)</value>
 </parameter>
 <parameter>
  <name>I_D76.V27</name>
  <value>*All* (The United States)</value>
 </parameter>
 <parameter>
  <name>I_D76.V9</name>
  <value>*All* (The United States)</value>
 </parameter>
 <parameter>
  <name>M_1</name>
  <value>D76.M1</value>
 </parameter>
 <parameter>
  <name>M_2</name>
  <value>D76.M2</value>
 </parameter>
 <parameter>
  <name>M_3</name>
  <value>D76.M3</value>
 </parameter>
 <parameter>
  <name>M_41</name>
  <value>D76.M41</value>
 </parameter>
 <parameter>
  <name>M_42</name>
  <value>D76.M42</value>
 </parameter>
 <parameter>
  <name>O_V10_fmode</name>
  <value>freg</value>
 </parameter>
 <parameter>
  <name>O_V1_fmode</name>
  <value>freg</value>
 </parameter>
 <parameter>
  <name>O_V27_fmode</name>
  <value>freg</value>
 </parameter>
 <parameter>
  <name>O_V2_fmode</name>
  <value>freg</value>
 </parameter>
 <parameter>
  <name>O_V9_fmode</name>
  <value>freg</value>
 </parameter>
 <parameter>
  <name>O_aar</name>
  <value>aar_std</value>
 </parameter>
 <parameter>
  <name>O_aar_pop</name>
  <value>0000</value>
 </parameter>
 <parameter>
  <name>O_age</name>
  <value>D76.V5</value>
 </parameter>
 <parameter>
  <name>O_javascript</name>
  <value>on</value>
 </parameter>
 <parameter>
  <name>O_location</name>
  <value>D76.V9</value>
 </parameter>
 <parameter>
  <name>O_precision</name>
  <value>1</value>
 </parameter>
 <parameter>
  <name>O_rate_per</name>
  <value>100000</value>
 </parameter>
 <parameter>
  <name>O_show_totals</name>
  <value>true</value>
 </parameter>
 <parameter>
  <name>O_timeout</name>
  <value>300</value>
 </parameter>
 <parameter>
  <name>O_title</name>
  <value>Example1</value>
 </parameter>
 <parameter>
  <name>O_ucd</name>
  <value>D76.V2</value>
 </parameter>
 <parameter>
  <name>O_urban</name>
  <value>D76.V19</value>
 </parameter>
 <parameter>
  <name>VM_D76.M6_D76.V10</name>
  <value/>
 </parameter>
 <parameter>
  <name>VM_D76.M6_D76.V17</name>
  <value>*All*</value>
 </parameter>
 <parameter>
  <name>VM_D76.M6_D76.V1_S</name>
  <value>*All*</value>
 </parameter>
 <parameter>
  <name>VM_D76.M6_D76.V7</name>
  <value>*All*</value>
 </parameter>
 <parameter>
  <name>VM_D76.M6_D76.V8</name>
  <value>*All*</value>
 </parameter>
 <parameter>
  <name>V_D76.V1</name>
  <value/>
 </parameter>
 <parameter>
  <name>V_D76.V10</name>
  <value/>
 </parameter>
 <parameter>
  <name>V_D76.V11</name>
  <value>*All*</value>
 </parameter>
 <parameter>
  <name>V_D76.V12</name>
  <value>*All*</value>
 </parameter>
 <parameter>
  <name>V_D76.V17</name>
  <value>*All*</value>
 </parameter>
 <parameter>
  <name>V_D76.V19</name>
  <value>*All*</value>
 </parameter>
 <parameter>
  <name>V_D76.V2</name>
  <value/>
 </parameter>
 <parameter>
  <name>V_D76.V20</name>
  <value>*All*</value>
 </parameter>
 <parameter>
  <name>V_D76.V21</name>
  <value>*All*</value>
 </parameter>
 <parameter>
  <name>V_D76.V22</name>
  <value>*All*</value>
 </parameter>
 <parameter>
  <name>V_D76.V23</name>
  <value>*All*</value>
 </parameter>
 <parameter>
  <name>V_D76.V24</name>
  <value>*All*</value>
 </parameter>
 <parameter>
  <name>V_D76.V25</name>
  <value>*All*</value>
 </parameter>
 <parameter>
  <name>V_D76.V27</name>
  <value/>
 </parameter>
 <parameter>
  <name>V_D76.V4</name>
  <value>*All*</value>
 </parameter>
 <parameter>
  <name>V_D76.V5</name>
  <value>1</value>
  <value>1-4</value>
  <value>5-14</value>
  <value>15-24</value>
  <value>25-34</value>
  <value>35-44</value>
  <value>45-54</value>
  <value>55-64</value>
  <value>65-74</value>
  <value>75-84</value>
  <value>85+</value>
 </parameter>
 <parameter>
  <name>V_D76.V51</name>
  <value>*All*</value>
 </parameter>
 <parameter>
  <name>V_D76.V52</name>
  <value>*All*</value>
 </parameter>
 <parameter>
  <name>V_D76.V6</name>
  <value>00</value>
 </parameter>
 <parameter>
  <name>V_D76.V7</name>
  <value>*All*</value>
 </parameter>
 <parameter>
  <name>V_D76.V8</name>
  <value>*All*</value>
 </parameter>
 <parameter>
  <name>V_D76.V9</name>
  <value/>
 </parameter>
 <parameter>
  <name>action-Send</name>
  <value>Send</value>
 </parameter>
 <parameter>
  <name>finder-stage-D76.V1</name>
  <value>codeset</value>
 </parameter>
 <parameter>
  <name>finder-stage-D76.V10</name>
  <value>codeset</value>
 </parameter>
 <parameter>
  <name>finder-stage-D76.V2</name>
  <value>codeset</value>
 </parameter>
 <parameter>
  <name>finder-stage-D76.V27</name>
  <value>codeset</value>
 </parameter>
 <parameter>
  <name>finder-stage-D76.V9</name>
  <value>codeset</value>
 </parameter>
 <parameter>
  <name>stage</name>
  <value>request</value>
 </parameter>
</request-parameters>"  

library(RCurl)
url <- "http://wonder.cdc.gov/controller/datarequest/D76"
data <- getURL(
 url = url,
 postfields = request_xml,
 verbose = TRUE
)

Thanks for your time!

ACE

1 answer:

Answer 0 (score: 3)

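Despite some claims that the CDC WONDER API is great, it looks like a pretty awful API interface to me. That said, given how many attack methods exist against SQL, I can see why they might hesitate to expose a SQL interface (which is what's really needed).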

Having to hand-craft XML to make these queries is just horrible. So you should use wondr, an R package for working with the API.
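If you would rather see what such a package has to do under the hood, here is a minimal base-R sketch that generates the request XML from a named list instead of typing it by hand. The helper names (xml_escape, build_request_xml, send_request) are made up for illustration and are not part of wondr. The sender assumes the httr package is installed and that the endpoint reads the XML from a form field named request_xml; check the API documentation before relying on that.

```r
# Escape the characters that are special in XML text content.
xml_escape <- function(x) {
  x <- gsub("&", "&amp;", x, fixed = TRUE)
  x <- gsub("<", "&lt;", x, fixed = TRUE)
  gsub(">", "&gt;", x, fixed = TRUE)
}

# Build a <request-parameters> document from a named list.
# Each element is a character vector; every entry becomes one <value>,
# and an empty string becomes an empty <value/> element.
build_request_xml <- function(params) {
  one_param <- function(name, values) {
    value_tags <- ifelse(
      nzchar(values),
      sprintf("<value>%s</value>", xml_escape(values)),
      "<value/>"
    )
    sprintf("<parameter><name>%s</name>%s</parameter>",
            xml_escape(name), paste(value_tags, collapse = ""))
  }
  body <- mapply(one_param, names(params), params, USE.NAMES = FALSE)
  paste0("<?xml version='1.0' encoding='UTF-8'?>",
         "<request-parameters>", paste(body, collapse = ""),
         "</request-parameters>")
}

# A few parameters taken from the question's example (not a complete query).
params <- list(
  accept_datause_restrictions = "true",
  B_1 = "D76.V1-level1",
  "F_D76.V1" = c("2009", "2010"),
  "V_D76.V1" = ""
)
request_xml <- build_request_xml(params)

# Hypothetical sender (assumption: the XML goes in a form field called
# request_xml). Defined here but not called, so nothing is sent.
send_request <- function(xml,
                         url = "http://wonder.cdc.gov/controller/datarequest/D76") {
  httr::POST(url, body = list(request_xml = xml), encode = "form")
}
```

Keeping the parameters in a list means a query can be tweaked by changing one element rather than editing a long XML string.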

Using wondr is still ugly (this is one of their example queries):

library(wondr) ## devtools::install_github("hrbrmstr/wondr")

wondr() %>%
  add_param("B_1", "D76.V22") %>% 
  add_param("B_2", "D76.V23") %>% 
  add_param("B_3", "*None*") %>% 
  add_param("B_4", "*None*") %>% 
  add_param("B_5", "*None*") %>% 
  add_param("F_D76.V1", "*All*") %>% 
  add_param("F_D76.V10", "*All*") %>% 
  add_param("F_D76.V2", "*All*") %>% 
  add_param("F_D76.V27", "*All*") %>% 
  add_param("F_D76.V9", "*All*") %>% 
  add_param("I_D76.V1", "*All* (All Dates)") %>% 
  add_param("I_D76.V10", "*All* (The United States)") %>% 
  add_param("I_D76.V2", "*All* (All Causes of Death)") %>% 
  add_param("I_D76.V27", "*All* (The United States)") %>% 
  add_param("I_D76.V9", "*All* (The United States)") %>% 
  add_param("M_1", "D76.M1") %>% 
  add_param("M_2", "D76.M2") %>% 
  add_param("M_3", "D76.M3") %>% 
  add_param("O_V10_fmode", "freg") %>% 
  add_param("O_V1_fmode", "freg") %>% 
  add_param("O_V27_fmode", "freg") %>% 
  add_param("O_V2_fmode", "freg") %>% 
  add_param("O_V9_fmode", "freg") %>% 
  add_param("O_aar", "aar_none") %>% 
  add_param("O_aar_pop", "0000") %>% 
  add_param("O_age", "D76.V52") %>% 
  add_param("O_javascript", "on") %>% 
  add_param("O_location", "D76.V9") %>% 
  add_param("O_precision", "1") %>% 
  add_param("O_rate_per", "100000") %>% 
  add_param("O_show_totals", "true") %>% 
  add_param("O_timeout", "300") %>% 
  add_param("O_title", "Example2") %>% 
  add_param("O_ucd", "D76.V22") %>% 
  add_param("O_urban", "D76.V19") %>% 
  add_param("VM_D76.M6_D76.V10", "") %>% 
  add_param("VM_D76.M6_D76.V17", "*All*") %>% 
  add_param("VM_D76.M6_D76.V1_S", "*All*") %>% 
  add_param("VM_D76.M6_D76.V7", "*All*") %>% 
  add_param("VM_D76.M6_D76.V8", "*All*") %>% 
  add_param("V_D76.V1", "") %>% 
  add_param("V_D76.V10", "") %>% 
  add_param("V_D76.V11", "*All*") %>% 
  add_param("V_D76.V12", "*All*") %>% 
  add_param("V_D76.V17", "*All*") %>% 
  add_param("V_D76.V19", "*All*") %>% 
  add_param("V_D76.V2", "") %>% 
  add_param("V_D76.V20", "*All*") %>% 
  add_param("V_D76.V21", "*All*") %>% 
  add_param("V_D76.V22", "1", "2", "3", "4", "5") %>% 
  add_param("V_D76.V23", "*All*") %>% 
  add_param("V_D76.V24", "*All*") %>% 
  add_param("V_D76.V25", "*All*") %>% 
  add_param("V_D76.V27", "") %>% 
  add_param("V_D76.V4", "*All*") %>% 
  add_param("V_D76.V5", "*All*") %>% 
  add_param("V_D76.V51", "*All*") %>% 
  add_param("V_D76.V52", "0", "1", "2", "3", "4", "5", "6", "7", "8", "9", "10", "11", "12", "13", "14", "15", "16", "17", "18") %>% 
  add_param("V_D76.V6", "00") %>% 
  add_param("V_D76.V7", "*All*") %>% 
  add_param("V_D76.V8", "*All*") %>% 
  add_param("V_D76.V9", "") %>% 
  add_param("action-Send", "Send") %>% 
  add_param("finder-stage-D76.V1", "codeset") %>% 
  add_param("finder-stage-D76.V10", "codeset") %>% 
  add_param("finder-stage-D76.V2", "codeset") %>% 
  add_param("finder-stage-D76.V27", "codeset") %>% 
  add_param("finder-stage-D76.V9", "codeset") %>% 
  add_param("stage", "request") %>%  
  make_query("D76") -> query_result

You'll need to parse the result:

library(xml2)
library(purrr)

xml_find_all(query_result, ".//response/data-table/r") %>% 
  map_df(function(row) {
    xml_find_all(row, ".//c") %>% 
      xml_attrs() %>% 
      as.list() %>% 
      setNames(sprintf("V%d", 1:length(.))) %>% 
      as.data.frame(stringsAsFactors=FALSE)
  }) -> df

print(df)
##                                        V1                                               V2            V3            V4         V5
## 1                           Unintentional                                       Cut/Pierce            97 1,243,249,173        0.0
## 2                           Unintentional                                         Drowning        15,945 1,243,249,173        1.3
## 3                           Unintentional                                             Fall         2,213 1,243,249,173        0.2
## 4                           Unintentional                                       Fire/Flame         7,301 1,243,249,173        0.6
## 5                           Unintentional                             Hot object/Substance            96 1,243,249,173        0.0
## 6                           Unintentional                                          Firearm         2,042 1,243,249,173        0.2
## 7                           Unintentional                                        Machinery           484 1,243,249,173        0.0
## 8                           Unintentional                            Motor Vehicle Traffic        74,997 1,243,249,173        6.0
## 9                           Unintentional                              Other Pedal cyclist           458 1,243,249,173        0.0
## 10                          Unintentional                                 Other Pedestrian         3,221 1,243,249,173        0.3
## 11                          Unintentional                             Other land transport         3,449 1,243,249,173        0.3
## 12                          Unintentional                                  Other transport         1,344 1,243,249,173        0.1
## 13                          Unintentional                            Natural/Environmental         1,681 1,243,249,173        0.1
## 14                          Unintentional                                     Overexertion             2 1,243,249,173 Unreliable
## 15                          Unintentional                                        Poisoning         7,326 1,243,249,173        0.6
## 16                          Unintentional                             Struck by or against         1,378 1,243,249,173        0.1
## 17                          Unintentional                                      Suffocation        17,356 1,243,249,173        1.4
## 18                          Unintentional             Other specified, classifiable Injury         1,199 1,243,249,173        0.1
## 19                          Unintentional Other specified, not elsewhere classified Injury           518 1,243,249,173        0.0
## 20                          Unintentional                               Unspecified Injury         1,540 1,243,249,173        0.1
## 21                          Unintentional                                                1       142,647 1,243,249,173       11.5
## 22                                Suicide                                       Cut/Pierce            64 1,243,249,173        0.0
## 23                                Suicide                                         Drowning           111 1,243,249,173        0.0
## 24                                Suicide                                             Fall           412 1,243,249,173        0.0
## 25                                Suicide                                       Fire/Flame            50 1,243,249,173        0.0
## 26                                Suicide                                          Firearm         9,956 1,243,249,173        0.8
## 27                                Suicide                             Other land transport           139 1,243,249,173        0.0
## 28                                Suicide                                        Poisoning         1,336 1,243,249,173        0.1
## 29                                Suicide                                      Suffocation        10,559 1,243,249,173        0.8
## 30                                Suicide             Other specified, classifiable Injury           379 1,243,249,173        0.0
## 31                                Suicide Other specified, not elsewhere classified Injury           103 1,243,249,173        0.0
## 32                                Suicide                               Unspecified Injury            76 1,243,249,173        0.0
## 33                                Suicide                                                1        23,185 1,243,249,173        1.9
## 34                               Homicide                                       Cut/Pierce         2,375 1,243,249,173        0.2
## 35                               Homicide                                         Drowning           459 1,243,249,173        0.0
## 36                               Homicide                                             Fall            33 1,243,249,173        0.0
## 37                               Homicide                                       Fire/Flame           561 1,243,249,173        0.0
## 38                               Homicide                             Hot object/Substance            47 1,243,249,173        0.0
## 39                               Homicide                                          Firearm        20,897 1,243,249,173        1.7
## 40                               Homicide                             Other land transport           131 1,243,249,173        0.0
## 41                               Homicide                                  Other transport             8 1,243,249,173 Unreliable
## 42                               Homicide                                        Poisoning           494 1,243,249,173        0.0
## 43                               Homicide                             Struck by or against           338 1,243,249,173        0.0
## 44                               Homicide                                      Suffocation         1,582 1,243,249,173        0.1
## 45                               Homicide             Other specified, classifiable Injury         2,912 1,243,249,173        0.2
## 46                               Homicide Other specified, not elsewhere classified Injury         1,137 1,243,249,173        0.1
## 47                               Homicide                               Unspecified Injury         5,821 1,243,249,173        0.5
## 48                               Homicide                                                1        36,795 1,243,249,173        3.0
## 49                          Undetermined                                        Cut/Pierce            10 1,243,249,173 Unreliable
## 50                          Undetermined                                          Drowning           377 1,243,249,173        0.0
## 51                          Undetermined                                              Fall            96 1,243,249,173        0.0
## 52                          Undetermined                                        Fire/Flame           256 1,243,249,173        0.0
## 53                          Undetermined                              Hot object/Substance             4 1,243,249,173 Unreliable
## 54                          Undetermined                                           Firearm           501 1,243,249,173        0.0
## 55                          Undetermined                              Other land transport            33 1,243,249,173        0.0
## 56                          Undetermined                                         Poisoning         1,217 1,243,249,173        0.1
## 57                          Undetermined                              Struck by or against             2 1,243,249,173 Unreliable
## 58                          Undetermined                                       Suffocation         1,234 1,243,249,173        0.1
## 59                          Undetermined              Other specified, classifiable Injury            23 1,243,249,173        0.0
## 60                          Undetermined  Other specified, not elsewhere classified Injury           267 1,243,249,173        0.0
## 61                          Undetermined                                Unspecified Injury           743 1,243,249,173        0.1
## 62                          Undetermined                                                 1         4,763 1,243,249,173        0.4
## 63 Legal Intervention / Operations of War                                          Firearm           251 1,243,249,173        0.0
## 64 Legal Intervention / Operations of War             Other specified, classifiable Injury             1 1,243,249,173 Unreliable
## 65 Legal Intervention / Operations of War Other specified, not elsewhere classified Injury            10 1,243,249,173 Unreliable
## 66 Legal Intervention / Operations of War                               Unspecified Injury             2 1,243,249,173 Unreliable
## 67 Legal Intervention / Operations of War                                                1           264 1,243,249,173        0.0
## 68                                      2                                          207,654 1,243,249,173          16.7       <NA>

And you'll still need to clean it up, but at least you now have a way to build, execute, and parse queries.
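As a rough sketch of what that clean-up might look like, the snippet below turns the comma-formatted strings into numbers and maps flags such as "Unreliable" to NA. The three rows are copied from the output above. The helper to_number and the column names are my own guesses at the meaning of each column, not something the API returns, and the total rows (where the labels land in different columns) would still need separate handling.

```r
# Convert display strings to numbers:
# "1,243,249,173" becomes 1243249173; non-numeric flags become NA.
to_number <- function(x) {
  suppressWarnings(as.numeric(gsub(",", "", x, fixed = TRUE)))
}

# Small stand-in for the parsed result (rows 1, 2 and 14 of the output).
df <- data.frame(
  V1 = c("Unintentional", "Unintentional", "Unintentional"),
  V2 = c("Cut/Pierce", "Drowning", "Overexertion"),
  V3 = c("97", "15,945", "2"),
  V4 = rep("1,243,249,173", 3),
  V5 = c("0.0", "1.3", "Unreliable"),
  stringsAsFactors = FALSE
)

clean <- df
measure_cols <- c("V3", "V4", "V5")
clean[measure_cols] <- lapply(df[measure_cols], to_number)

# Hypothetical, more readable column names.
names(clean) <- c("intent", "mechanism", "deaths", "population", "crude_rate")
```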

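It might be worth asking the rOpenSci folks to adopt this pkg (I don't remember seeing it in their vast library). I'll also poke at it some more in a few weeks.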