Scraping data from an API using R

Time: 2017-08-13 00:37:58

Tags: r authentication curl

I am very new to using R to pull data from an API.

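I am trying to connect to the Malaysian Meteorological Department API from R. I got the following curl command from the reference material they provide.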

curl -H "Authorization: METToken MYTOKENID" "http://api.met.gov.my/v2/data?datasetid=FORECAST&datacategoryid=GENERAL&locationid=LOCATION:237&start_date=2017-08-13&end_date=2017-08-13"

How can I use R to extract this data? I was given an API token when I registered.

Thanks!

1 answer:

Answer 0: (score: 2)

If you are using an API, it is not web scraping.

Before doing anything else, store your token in ~/.Renviron under the name MYWX_TOKEN, then restart R / RStudio.
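As a quick check that the token is visible after the restart, here is a minimal sketch. It assumes ~/.Renviron contains a line of the form `MYWX_TOKEN=your-token-here`. The helper name `get_met_token()` is hypothetical and not part of any package.

```r
# Hypothetical helper (not part of mywx): return the token, or fail early
# with a readable message if the environment variable is missing.
get_met_token <- function(var = "MYWX_TOKEN") {
  token <- Sys.getenv(var)  # returns "" when the variable is unset
  if (!nzchar(token)) {
    stop("Environment variable ", var,
         " is not set; add it to ~/.Renviron and restart R.")
  }
  token
}
```

Calling `get_met_token()` in a fresh session should return the token string. An error here usually means R was not restarted or the line in ~/.Renviron is malformed.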

Then, do:

devtools::install_github("hrbrmstr/mywx")

That will let you do the following:

library(mywx)
library(tidyverse)

mywx_districts()
## # A tibble: 51 x 6
##             id        name locationcategoryid locationrootid latitude longitude
##  *       <chr>       <chr>              <chr>          <chr>    <dbl>     <dbl>
##  1 LOCATION:17  BATU PAHAT           DISTRICT     LOCATION:1 1.854800  102.9325
##  2 LOCATION:18 JOHOR BAHRU           DISTRICT     LOCATION:1 1.465500  103.7578
##  3 LOCATION:19      KLUANG           DISTRICT     LOCATION:1 2.025100  103.3328
##  4 LOCATION:20 KOTA TINGGI           DISTRICT     LOCATION:1 1.738100  103.8999
##  5 LOCATION:21      LEDANG           DISTRICT     LOCATION:1 2.262401  102.6498
##  6 LOCATION:22     MERSING           DISTRICT     LOCATION:1 2.431200  103.8405
##  7 LOCATION:23        MUAR           DISTRICT     LOCATION:1 2.044200  102.5689
##  8 LOCATION:24    NUSAJAYA           DISTRICT     LOCATION:1 1.413590  103.6317
##  9 LOCATION:25     SEGAMAT           DISTRICT     LOCATION:1 2.514800  102.8158
## 10 LOCATION:26     PONTIAN           DISTRICT     LOCATION:1 1.516380  103.3839
## # ... with 41 more rows

mywx_states()
## # A tibble: 16 x 5
##             id            name locationcategoryid latitude longitude
##  *       <chr>           <chr>              <chr>    <dbl>     <dbl>
##  1  LOCATION:1           JOHOR              STATE 1.465500  103.7578
##  2  LOCATION:2           KEDAH              STATE 6.121040  100.3601
##  3  LOCATION:3        KELANTAN              STATE 6.056660  102.2645
##  4  LOCATION:4    KUALA LUMPUR              STATE 3.143000  101.6948
##  5  LOCATION:5          LABUAN              STATE 4.890934  114.9428
##  6  LOCATION:6          MELAKA              STATE 2.231926  102.2943
##  7  LOCATION:7 NEGERI SEMBILAN              STATE 2.729700  101.9381
##  8  LOCATION:8          PAHANG              STATE 3.807700  103.3260
##  9  LOCATION:9    PULAU PINANG              STATE 5.411230  100.3354
## 10 LOCATION:10           PERAK              STATE 4.584100  101.0829
## 11 LOCATION:11          PERLIS              STATE 6.441400  100.1986
## 12 LOCATION:12       PUTRAJAYA              STATE 2.916670  101.7000
## 13 LOCATION:13           SABAH              STATE 5.974900  116.0724
## 14 LOCATION:14         SARAWAK              STATE 1.583330  110.3333
## 15 LOCATION:15        SELANGOR              STATE 3.085070  101.5328
## 16 LOCATION:16      TERENGGANU              STATE 5.330200  103.1408

mywx_towns()
## # A tibble: 51 x 6
##              id        name locationcategoryid locationrootid latitude longitude
##  *        <chr>       <chr>              <chr>          <chr>    <dbl>     <dbl>
##  1 LOCATION:122  AYER HITAM               TOWN     LOCATION:1   1.9150  103.1808
##  2 LOCATION:123  BATU PAHAT               TOWN     LOCATION:1   1.8548  102.9325
##  3 LOCATION:124 JOHOR BAHRU               TOWN     LOCATION:1   1.4655  103.7578
##  4 LOCATION:125       LABIS               TOWN     LOCATION:1   2.3850  103.0210
##  5 LOCATION:126     TANGKAK               TOWN     LOCATION:1   2.2673  102.5453
##  6 LOCATION:127        MUAR               TOWN     LOCATION:1   2.0442  102.5689
##  7 LOCATION:128       PAGOH               TOWN     LOCATION:1   2.1495  102.7704
##  8 LOCATION:129      KLUANG               TOWN     LOCATION:1   2.0251  103.3328
##  9 LOCATION:130 KOTA TINGGI               TOWN     LOCATION:1   1.7381  103.8999
## 10 LOCATION:131     MERSING               TOWN     LOCATION:1   2.4312  103.8405
## # ... with 41 more rows

mywx_touristdests()
## # A tibble: 30 x 6
##              id              name locationcategoryid locationrootid latitude longitude
##  *        <chr>             <chr>              <chr>          <chr>    <dbl>     <dbl>
##  1 LOCATION:310     BATU FERINGGI        TOURISTDEST     LOCATION:9  5.47090  100.2453
##  2 LOCATION:311     BUKIT BENDERA        TOURISTDEST     LOCATION:9  2.37330  102.5104
##  3 LOCATION:312      BUKIT TINGGI        TOURISTDEST     LOCATION:8  2.28720  103.6726
##  4 LOCATION:313      BUKIT FRASER        TOURISTDEST     LOCATION:8  3.71260  101.7412
##  5 LOCATION:314 CAMERON HIGHLANDS        TOURISTDEST     LOCATION:8  4.48333  101.4500
##  6 LOCATION:315         CHERATING        TOURISTDEST     LOCATION:8  4.12557  103.3939
##  7 LOCATION:316            DESARU        TOURISTDEST     LOCATION:1  1.54020  104.2680
##  8 LOCATION:317 GENTING HIGHLANDS        TOURISTDEST     LOCATION:8  3.39545  101.7792
##  9 LOCATION:318             KIJAL        TOURISTDEST    LOCATION:16  4.35000  103.4833
## 10 LOCATION:319             LUMUT        TOURISTDEST    LOCATION:10  4.23230  100.6298
## # ... with 20 more rows
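Since these are ordinary tibbles, they can be filtered like any other data frame. The sketch below rebuilds a few rows of the `mywx_states()` output by hand, so that it runs without the package or a token, and looks up an ID by name. The variable names are illustrative.

```r
library(dplyr)

# A few rows copied from the mywx_states() output above (coordinates omitted)
states <- tibble(
  id                 = c("LOCATION:1", "LOCATION:12", "LOCATION:15"),
  name               = c("JOHOR", "PUTRAJAYA", "SELANGOR"),
  locationcategoryid = "STATE"
)

# Look up the ID for one state by its name
putrajaya_id <- states %>%
  filter(name == "PUTRAJAYA") %>%
  pull(id)
```

With the package installed, the same `filter()` / `pull()` pipeline should work on the result of `mywx_states()`, `mywx_towns()` and the other lookup functions directly.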

You can use these as lookup tables, then use the ID to fetch forecast data:

glimpse(mywx_forecast("LOCATION:237", "2017-08-13", "2017-08-13"))
## Observations: 6
## Variables: 12
## $ locationid       <chr> "LOCATION:237", "LOCATION:237", "LOCATION:237", "LOCATION:237", "LOCATION:237", "LOCATION:...
## $ locationname     <chr> "PUTRAJAYA", "PUTRAJAYA", "PUTRAJAYA", "PUTRAJAYA", "PUTRAJAYA", "PUTRAJAYA"
## $ locationrootid   <chr> "LOCATION:12", "LOCATION:12", "LOCATION:12", "LOCATION:12", "LOCATION:12", "LOCATION:12"
## $ locationrootname <chr> "PUTRAJAYA", "PUTRAJAYA", "PUTRAJAYA", "PUTRAJAYA", "PUTRAJAYA", "PUTRAJAYA"
## $ date             <dttm> 2017-08-12 16:00:00, 2017-08-12 16:00:00, 2017-08-12 16:00:00, 2017-08-12 16:00:00, 2017-...
## $ datatype         <chr> "FGM", "FGA", "FGN", "FMINT", "FMAXT", "FSIGW"
## $ value            <chr> "Cloudy", "Thunderstorms", "Cloudy", "26", "33", "Thunderstorms"
## $ latitude         <dbl> 2.91667, 2.91667, 2.91667, 2.91667, 2.91667, 2.91667
## $ longitude        <dbl> 101.7, 101.7, 101.7, 101.7, 101.7, 101.7
## $ attributes.unit  <chr> NA, NA, NA, "Celcius", "Celcius", NA
## $ attributes.code  <chr> NA, NA, NA, NA, NA, "tstorm"
## $ attributes.when  <chr> NA, NA, NA, NA, NA, "Afternoon"
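For comparison, the curl command from the question can also be issued directly with httr, without the package. This is only a sketch. The function name `met_forecast_raw()` is hypothetical, the URL and query parameters are taken from the curl command, and it assumes the API responds with JSON, which the question does not state.

```r
library(httr)

# Hypothetical helper: send the same request as the curl command in the question.
# Assumes the token is stored in MYWX_TOKEN and that the response body is JSON.
met_forecast_raw <- function(location_id, start_date, end_date,
                             token = Sys.getenv("MYWX_TOKEN")) {
  resp <- GET(
    "http://api.met.gov.my/v2/data",
    # Same header as: -H "Authorization: METToken MYTOKENID"
    add_headers(Authorization = paste("METToken", token)),
    query = list(
      datasetid      = "FORECAST",
      datacategoryid = "GENERAL",
      locationid     = location_id,
      start_date     = start_date,
      end_date       = end_date
    )
  )
  stop_for_status(resp)  # turn HTTP errors (e.g. 401) into R errors
  content(resp, as = "parsed", type = "application/json")
}
```

A call such as `met_forecast_raw("LOCATION:237", "2017-08-13", "2017-08-13")` would return a nested list rather than a tibble, so it still needs flattening before analysis.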

`mywx_forecast()` also accepts a date range:

vals <- mywx_forecast("LOCATION:237", "2017-08-01", "2017-08-13")

glimpse(vals)
## Observations: 51
## Variables: 12
## $ locationid       <chr> "LOCATION:237", "LOCATION:237", "LOCATION:237", "LOCATION:237", "LOCATION:237", "LOCATION:...
## $ locationname     <chr> "PUTRAJAYA", "PUTRAJAYA", "PUTRAJAYA", "PUTRAJAYA", "PUTRAJAYA", "PUTRAJAYA", "PUTRAJAYA",...
## $ locationrootid   <chr> "LOCATION:12", "LOCATION:12", "LOCATION:12", "LOCATION:12", "LOCATION:12", "LOCATION:12", ...
## $ locationrootname <chr> "PUTRAJAYA", "PUTRAJAYA", "PUTRAJAYA", "PUTRAJAYA", "PUTRAJAYA", "PUTRAJAYA", "PUTRAJAYA",...
## $ date             <dttm> 2017-08-06 16:00:00, 2017-08-06 16:00:00, 2017-08-06 16:00:00, 2017-08-06 16:00:00, 2017-...
## $ datatype         <chr> "FGM", "FGA", "FGN", "FMINT", "FMAXT", "FSIGW", "FGM", "FGA", "FGN", "FMINT", "FMAXT", "FS...
## $ value            <chr> "No rain", "Rain", "No rain", "25", "33", "Rain", "Rain", "Rain", "No rain", "24", "34", "...
## $ latitude         <dbl> 2.91667, 2.91667, 2.91667, 2.91667, 2.91667, 2.91667, 2.91667, 2.91667, 2.91667, 2.91667, ...
## $ longitude        <dbl> 101.7, 101.7, 101.7, 101.7, 101.7, 101.7, 101.7, 101.7, 101.7, 101.7, 101.7, 101.7, 101.7,...
## $ attributes.unit  <chr> NA, NA, NA, "Celcius", "Celcius", NA, NA, NA, NA, "Celcius", "Celcius", NA, NA, NA, NA, "C...
## $ attributes.code  <chr> NA, NA, NA, NA, NA, "rain", NA, NA, NA, NA, NA, "rain", NA, NA, NA, NA, NA, "sunny", NA, N...
## $ attributes.when  <chr> NA, NA, NA, NA, NA, "Afternoon", NA, NA, NA, NA, NA, "Morning and Afternoon", NA, NA, NA, ...

The return format is not ideal for "data science" operations, but you can work around that:

dplyr::filter(vals, datatype %in% c("FMINT", "FMAXT")) %>% 
  mutate(value = as.numeric(value)) %>% 
  ggplot(aes(date, value, color = datatype)) +
  geom_line()

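Another way to work with the long format is to pivot the numeric rows into one column per data type. The sketch below uses the six rows of the single-day output shown earlier, reduced to three columns and typed in by hand so that it runs without a token. The UTC time zone is an assumption, and `pivot_wider()` requires tidyr 1.0 or later.

```r
library(dplyr)
library(tidyr)

# Rows copied from the single-day forecast output above
# (time zone assumed to be UTC)
day <- tibble(
  date     = as.POSIXct("2017-08-12 16:00:00", tz = "UTC"),
  datatype = c("FGM", "FGA", "FGN", "FMINT", "FMAXT", "FSIGW"),
  value    = c("Cloudy", "Thunderstorms", "Cloudy", "26", "33", "Thunderstorms")
)

# Keep only the temperature rows, convert to numeric, one column per datatype
temps_wide <- day %>%
  filter(datatype %in% c("FMINT", "FMAXT")) %>%
  mutate(value = as.numeric(value)) %>%
  pivot_wider(names_from = datatype, values_from = value)
```

The result has one row per date with `FMINT` and `FMAXT` as numeric columns, which is usually easier to summarise or join than the long form.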