Quickly converting an object from JSON into a data frame

Time: 2021-04-28 16:21:24

Tags: r json dplyr rjsonio

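I have an object that I can access in JSON format via a URL, and I want to put part of it into a data frame so that I can analyse the data in R. I currently do this as follows: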

# Read data in using fromJSON function
data <- RJSONIO::fromJSON('https://api-prod.footballindex.co.uk/football.allTradable24hrchanges?page=1&per_page=5000&sort=asc')

# The above is an example dataset to use for this question
# In my actual dataset some fields exist only in some elements of the list, so I add one to the example here
# I want to handle these by returning NA in the final data frame where the field does not exist
data[['items']][[1]]$newField <- 1

# I am only interested in the data in the items field
# Unlist each element to get all nested elements within the lists in flat format
dataList <- lapply(data[['items']], unlist)

# Combine all elements of the list together
dataDF <- dplyr::bind_rows(dataList)

# Convert into data.frame
dataDF <- data.frame(dataDF)

This works, but the bind_rows step takes a very long time:

> system.time(dataDF <- dplyr::bind_rows(dataList))
   user  system elapsed 
 42.195   0.000  42.216 

It feels like there must be a faster way to do this.

I have been told that data.table::rbindlist is a faster option, but using it gives me an error:

> dataDF <- data.table::rbindlist(dataList)
Error in data.table::rbindlist(dataList) : 
  Item 1 of input is not a data.frame, data.table or list
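The error message says that rbindlist wants each item to be a data.frame, data.table or list, whereas unlist() returns a named atomic vector. A minimal sketch of one possible workaround, assuming the data.table package is installed: convert each vector to a named list with as.list() and pass fill = TRUE so that fields missing from an item become NA. The small dataList below is a hypothetical stand-in for the real API data, not the actual response.

```r
library(data.table)

# Hypothetical stand-in for lapply(data[['items']], unlist):
# two named character vectors, only the first has newField
dataList <- list(
  c(name = "A", country = "Italy", newField = "1"),
  c(name = "B", country = "France")
)

# rbindlist() needs list-like items, so turn each named vector into a named list;
# fill = TRUE matches columns by name and pads missing fields with NA
dataDT <- rbindlist(lapply(dataList, as.list), fill = TRUE)

# Convert to a plain data.frame if that is what the rest of the code expects
dataDF <- as.data.frame(dataDT)
print(dataDF)
```

Because every value has passed through unlist(), all columns come back as character and may need converting afterwards. How this behaves when an item contains duplicate names after flattening has not been checked here.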

One answer suggested using do.call(rbind, ...), which runs quickly, but it does not correctly handle fields that exist in only some of the elements. For example:

dataDF2 <- data.frame(do.call(rbind, dataList))
> head(dataDF$country)
[1] "Côte d'Ivoire" "Italy"         "England"       "Scotland"      "Germany"       "France"       
> head(dataDF2$country)
[1] "Côte d'Ivoire" "1.65"          "1.62"          "FALSE"         "2.59"          "France"    
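The mismatch above is consistent with how rbind treats plain vectors: it lines them up by position rather than by name, and recycles shorter vectors, so an extra field in one element shifts the values under the wrong columns. A base R sketch of one way around this, assuming field names are unique within each element: reindex every vector by the union of all names before binding. The dataList, allNames, aligned and dataDF3 names are hypothetical and the data is a toy stand-in.

```r
# Hypothetical stand-in for lapply(data[['items']], unlist)
dataList <- list(
  c(name = "A", country = "Italy", newField = "1"),
  c(name = "B", country = "France")
)

# Union of every field name seen in any element
allNames <- unique(unlist(lapply(dataList, names)))

# Reindex each vector by the full name set; names that are absent yield NA
aligned <- lapply(dataList, function(x) {
  out <- x[allNames]
  names(out) <- allNames
  out
})

# All vectors now share the same length and order, so rbind is safe
dataDF3 <- as.data.frame(do.call(rbind, aligned), stringsAsFactors = FALSE)
print(dataDF3)
```

If an element has repeated names after unlist(), indexing by name keeps only the first match, so this sketch would silently drop the rest.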

Thanks in advance for your help.

1 answer:

Answer 0 (score: 1)

data <- RJSONIO::fromJSON('https://api-prod.footballindex.co.uk/football.allTradable24hrchanges?page=1&per_page=5000&sort=asc')

system.time(dataDF <- as.data.frame(do.call(rbind, data[['items']])))

   user  system elapsed 
  0.007   0.000   0.006