Question

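I'd like to get the Legionellosis case data reported on the Center for Disease Control site for 1996 to 2016. Using the RSocrata package, I managed to retrieve the data for 2014 to 2016, but only those years seem to be available through Socrata.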

How can I retrieve the rest of the data, from before 2014?

Here is the code I used for 2014, 2015 and 2016:

  # Legionellosis data
  library(RSocrata)

  df.leg2014 <- read.socrata("https://data.cdc.gov/resource/cmap-p7au.json") # 2014
  df.leg2015 <- read.socrata("https://data.cdc.gov/resource/haxn-dihy.json") # 2015
  df.leg2016 <- read.socrata("https://data.cdc.gov/resource/wg57-d6dj.json") # 2016
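To put the three yearly downloads into one table, something like the sketch below could work. It assumes (I haven't checked) that the column names may differ slightly between the yearly datasets, so it keeps only the columns all of them share and records the year from the list names. `stack_years` is a hypothetical helper, not part of RSocrata.

```r
# Hypothetical helper: stack a named list of yearly data frames
# (names like "2014"), keeping only the columns every year shares
# and adding a `year` column taken from the list names.
stack_years <- function(dfs) {
  common <- Reduce(intersect, lapply(dfs, names))
  pieces <- lapply(names(dfs), function(yr) {
    d <- dfs[[yr]][, common, drop = FALSE]
    d$year <- as.integer(yr)
    d
  })
  do.call(rbind, pieces)
}
```

Usage: `legionellosis <- stack_years(list("2014" = df.leg2014, "2015" = df.leg2015, "2016" = df.leg2016))`. Dropping non-shared columns is the conservative choice; padding them with `NA` would be an alternative if you need every column.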

Any advice is greatly appreciated!

Answer 1

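You'll have to add your own column headers and use this function to loop over the years and weeks you need. Some year/week combinations return an empty table, or the data is otherwise unavailable. The tables also seem to come back with different numbers of columns: some have 12, some have 10, and I'm not sure why. The function has one line that removes all columns that are entirely NA; you may want to comment that line out. I've tried to account for these cases in the code, but I haven't tested it thoroughly.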
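The loop over years and weeks could be sketched as follows. `collect_mmwr` is a hypothetical helper: it takes the fetch function as an argument (in practice `read_MMWR_table`, defined in the code that follows), skips empty or failed requests, tags each table with its year and week, and pads missing columns with `NA` so tables with 10 and 12 columns can still be stacked.

```r
# Hypothetical helper (not part of the original answer): call `fetch`
# for every year/week pair, skip empty or failed results, tag rows with
# year and week, and stack everything into one data frame.
collect_mmwr <- function(years, weeks, fetch) {
  pieces <- list()
  for (yr in years) {
    for (wk in weeks) {
      # A failed download is treated like an empty table
      df <- tryCatch(fetch(year = yr, week = wk),
                     error = function(e) data.frame())
      if (is.null(df) || nrow(df) == 0) next
      df$year <- yr
      df$week <- wk
      pieces[[length(pieces) + 1]] <- df
    }
  }
  if (length(pieces) == 0) return(data.frame())
  # Tables may have different column counts: add any missing columns
  # as NA so that rbind() can match every table by column name
  all_cols <- unique(unlist(lapply(pieces, names)))
  pieces <- lapply(pieces, function(p) {
    for (col in setdiff(all_cols, names(p))) p[[col]] <- NA
    p[all_cols]
  })
  do.call(rbind, pieces)
}
```

Usage: `leg <- collect_mmwr(1996:2013, 1:52, read_MMWR_table)`, adjusting the week range as needed. This sends one request per year/week pair, so it may take a while.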

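read_MMWR_table <- function(year=1997, week=18){
  # Build the export URL for MMWR table 2B for the given year and week
  url <- paste0("https://wonder.cdc.gov/mmwr/mmwr_reps.asp?mmwr_year=",
                year, 
                "&mmwr_week=",
                week, 
                "&mmwr_table=2B&request=Export&mmwr_location=")
  tmp <- readLines(url, warn=FALSE)
  # An HTML page instead of the export means no table was returned
  if(grepl("DOCTYPE html", tmp[1])){
    ret <- data.frame()
    print("No table returned, no data...")
  }else{
    start <- which(tmp=="tab delimited data:")[1] + 1
    # No tab-delimited section means there are no records for this week/year
    if(is.na(start)){
      ret <- data.frame()
      print("No records found for this week / year")
    }else{
      # The data block ends at the first blank line after line 20
      # (or at the end of the file if there is none)
      blank <- which(tmp=="")
      blank <- blank[blank > 20]
      end <- if(length(blank) > 0) min(blank) else length(tmp)
      df <- read.table(textConnection(tmp[start:end]), sep="\t", skip=19, 
                       stringsAsFactors=FALSE, header=FALSE)
      # Remove all-NA columns; comment this out to keep every column
      df <- df[, colSums(is.na(df)) < nrow(df), drop=FALSE]
      ret <- df
    }
  }
  return(ret)
}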

df <- read_MMWR_table(year=2001, week=12)
> head(df)
            V1    V2    V3 V4  V5 V6 V7 V8 V9 V10
1 W.N. CENTRAL 2,534 3,688 61 102 11  5  2 11  14
2        Minn.   411   723  -   -  1  1  -  8   6
3         Iowa   202   224  -   -  2  2  -  -   -
4          Mo. 1,013 1,803 58  98  5  2  1  3   3
5      N. Dak.     9    11  -   -  -  -  -  -   -
6      S. Dak.    47    61  -   -  -  -  -  -   -
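As the output shows, the counts come back as text with thousands separators ("2,534") and dashes. A minimal sketch for converting them to numbers, assuming "-" stands for no reported cases (check the table's own legend) and that any other non-numeric code should become `NA`; `mmwr_to_numeric` is a hypothetical helper:

```r
# Hypothetical helper: convert MMWR-style count strings to numbers.
# Assumption: "-" means zero reported cases; any other non-numeric
# code becomes NA.
mmwr_to_numeric <- function(x) {
  x <- trimws(as.character(x))
  x <- gsub(",", "", x, fixed = TRUE)   # drop thousands separators
  x[x == "-"] <- "0"
  suppressWarnings(as.numeric(x))       # leftover codes become NA
}
```

Usage: `df[-1] <- lapply(df[-1], mmwr_to_numeric)` converts every column except the area names in `V1`.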
