Extracting values for specific dates from JSON in an R dataframe

Time: 2018-07-08 11:02:53

Tags: r dataframe dplyr tidyverse

I have the following dataframe, where each Json_Data cell holds a JSON string like the sample shown further below:

ID      Rank    Name       Json_Data
IR-122  RE      AFG        {as below sample}
IR-122  UI      SSw        {as below sample}
IR-123  RF      HEr        {as below sample}
IR-123  RO      djf        {as below sample}
IR-124  RE      der        {as below sample}
IR-125  RF      fet        {as below sample}

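My sample JSON holds months and, for each month, a value for every date in it. From the dataframe above I want to extract:

  • the maximum month (in MMM-YY format) in one column

  • the values on the 1st, 7th, 14th, 21st and 28th,

as shown in the sample output below.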

Json_Data sample:

{"Jan-2018":{"10":50000.0,"11":50000.0,"12":15202.0,"13":10089.0,"14":10089.0,"15":9589.0,"16":9589.0,"17":18941.0,"18":15246.75,"19":5053.75,"20":44092.75,"21":36630.75,"22":9334.75,"23":5254.75,"24":4357.25,"25":3357.25,"26":44626.25,"27":49292.25,"28":48292.25,"29":43371.8,"30":38675.8,"31":37988.12},"Mar-2018":{"1":30799.02,"2":20775.42,"3":20657.42,"4":20657.42,"5":12657.42,"6":11110.22,"7":11110.22,"8":11110.22,"9":11111.22,"10":30272.22,"11":30272.22,"12":25316.22,"13":25316.22,"14":25316.22,"15":25316.22,"16":25316.22,"17":25316.22,"18":25316.22,"19":25316.22,"20":25316.22,"21":15316.22,"22":15316.22,"23":15316.22,"24":15316.22,"25":15204.12,"26":14791.12,"27":14791.12,"28":14791.12,"29":14791.12,"30":14791.12,"31":14791.12},"Feb-2018":{"1":36749.12,"2":36483.37,"3":35254.87,"4":27254.87,"5":15880.87,"6":14173.87,"7":7934.87,"8":7091.87,"9":5797.87,"10":5797.87,"11":5797.87,"12":283841.87,"13":283418.87,"14":283418.87,"15":253426.37,"16":242226.37,"17":227226.37,"18":197226.37,"19":147226.37,"20":111799.02,"21":111799.02,"22":66799.02,"23":64799.02,"24":64799.02,"25":63799.02,"26":53799.02,"27":36799.02,"28":36799.02},"Apr-2018":{"1":14791.12,"2":14791.12,"3":14791.12,"4":14791.12,"5":10791.12,"6":10791.12,"7":10791.12,"8":10791.12,"9":10755.72,"10":5799.72,"11":5799.72,"12":5799.72,"13":5799.72,"14":5799.72,"15":5799.72,"16":5799.72,"17":5799.72,"18":5799.72,"19":5728.92,"26":728.92,"27":728.92,"28":728.92,"29":728.92,"30":728.92}}

For example, A_1 would be the value on the 1st of Apr-2018, which is 14791.12, A_7 the value on the 7th of Apr-2018, which is 10791.12, and so on.

I need this for the 4 months preceding the maximum month (excluding the maximum month itself).

Here A_0 is the month immediately before the maximum month, B_0 is the month immediately before A_0, and so on. For each month I only need the columns A_1, A_7, A_14, A_21 and A_28; the same set of columns is repeated as B_1 ... B_28, C_1 ... C_28 and D_1 ... D_28 for the other three months.

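A_1 holds the value on the 1st day of the A_0 month, A_7 the reading on the 7th day of the A_0 month, and likewise for the other three months. These values must be grouped by ID, Rank and Name.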

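In my sample JSON there are only 4 months and the maximum month is Apr-2018, so in this case A_0 is Mar-2018, B_0 is Feb-2018, C_0 is Jan-2018 and D_0 is Dec-2017 (for which D_1, D_7, D_14, D_21 and D_28 are NA, because that month is not in the JSON).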
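A minimal base-R sketch of the month arithmetic described above, assuming every month key looks like "Apr-2018" (English three-letter abbreviation from month.abb, a hyphen, a four-digit year). The helper names month_index() and index_to_key() are hypothetical. Counting months as year * 12 + month turns "the month before" into a plain subtraction, so D_0 comes out as Dec-2017 even though that month is absent from the JSON.

```r
# Hypothetical helpers: map "Mon-YYYY" keys to a running month count and back.
# Assumes English abbreviations as in month.abb, e.g. "Apr-2018".
month_index <- function(key) {
  mon  <- match(substr(key, 1, 3), month.abb)   # 1..12
  year <- as.integer(substr(key, 5, 8))
  year * 12L + mon - 1L                         # Jan of year 0 counts as 0
}
index_to_key <- function(idx) {
  paste0(month.abb[idx %% 12L + 1L], "-", idx %/% 12L)
}

keys    <- c("Jan-2018", "Mar-2018", "Feb-2018", "Apr-2018")  # key order in the JSON is arbitrary
max_idx <- max(month_index(keys))
max_month   <- index_to_key(max_idx)
# A_0 = month before the max month, B_0 = two months before, ...
prev_months <- setNames(index_to_key(max_idx - 1:4), paste0(LETTERS[1:4], "_0"))
print(max_month)
print(prev_months)
```

Because the months are compared as integers, the order of the keys inside the JSON does not matter, and year boundaries (Jan-2018 back to Dec-2017) need no special case.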

In the sample output below, I expect the maximum month to be May-18.

Sample output:

1 answer:

Answer 0 (score: 3)

library(jsonlite)
library(lubridate)
library(tidyverse)

json_to_df <- function(data){
  json_as_list <- jsonlite::fromJSON(data)
  months       <- names(json_as_list)
  # Sort the "Mon-YYYY" keys chronologically; keep the max month plus up to
  # 4 months before it (months actually present in the JSON)
  last5months  <- tail(months[order(lubridate::myd(paste0(months,"-01")))],5)
  max_month    <- tail(last5months,1)
  other_months <- head(last5months,-1)
  # A_ = month right before the max month, B_ = the one before that, ...
  other_months_suffixes <- paste0(LETTERS[seq_along(other_months)],"_")
  days         <- c('1','7','14','21','28')
  get_month_list <- function(x) json_as_list[[x]][days]
  list_subset <- Map(function(x,y) setNames(get_month_list(x),paste0(y,days)),
                     rev(other_months), other_months_suffixes)
  list_subset <- unlist(list_subset, recursive = FALSE)
  # Drop the "Mon-YYYY." prefix that unlist() adds to the names
  names(list_subset) <- gsub("^.*?\\.","",names(list_subset))
  # Days missing from a month come back as NULL; turn them into NA
  list_subset <- map_if(list_subset, is.null,~NA)

  # Make sure all 20 value columns A_1 ... D_28 exist, filling absent months with NA
  only_nas <- setNames(replicate(20,NA,F),paste(sep="_",rep(LETTERS[1:4],each=5),rep(days,4)))
  missing <- names(only_nas)[! names(only_nas) %in% names(list_subset)]
  list_subset <- c(list_subset, only_nas[missing])

  # Month labels A_0 ... D_0, most recent first (same order as the values above)
  list_months <- setNames(as.list(rev(other_months)),paste0(other_months_suffixes,0))
  only_nas2 <- setNames(replicate(4,NA,F),paste(sep="_",LETTERS[1:4],0))
  missing2 <- names(only_nas2)[! names(only_nas2) %in% names(list_months)]
  list_months <- c(list_months, only_nas2[missing2])

  output_list <- c(
    Max_Month = max_month,
    list_months,
    list_subset)
  data.frame(output_list)
}

# df is the question's dataframe; Json_Data holds one JSON string per row
df %>%
  mutate(Json_Data = map(Json_Data,json_to_df)) %>%
  unnest(Json_Data)
#       ID Rank Name Max_Month      A_0      B_0      C_0 D_0      A_1      A_7     A_14     A_21     A_28      B_1
# 1 IR-122   RE  AFG  Apr-2018 Mar-2018 Feb-2018 Jan-2018  NA 30799.02 11110.22 25316.22 15316.22 14791.12 36749.12
#       B_7     B_14   B_21     B_28 C_1 C_7  C_14     C_21     C_28 D_1 D_7 D_14 D_21 D_28
# 1 7934.87 283418.9 111799 36799.02  NA  NA 10089 36630.75 48292.25  NA  NA   NA   NA   NA
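The answer above labels the most recent months that are actually present in the JSON, so a gap (for example a missing Feb between Jan and Mar) would shift the A/B/C/D labels. If the question's calendar definition is wanted instead (A_0 is always the month right before the maximum month, even when it is absent, as with Dec-2017 in the sample), a base-R sketch could look like the following. It assumes the JSON has already been parsed, e.g. with jsonlite::fromJSON(), into a named list keyed by "Mon-YYYY" whose elements are named by day number; extract_days() and its helpers are hypothetical names, not part of any package.

```r
# Hypothetical helpers: "Mon-YYYY" <-> running month count (English month.abb keys assumed).
month_index <- function(key) {
  as.integer(substr(key, 5, 8)) * 12L + match(substr(key, 1, 3), month.abb) - 1L
}
index_to_key <- function(idx) paste0(month.abb[idx %% 12L + 1L], "-", idx %/% 12L)

# month_list: named list keyed by "Mon-YYYY"; each element is named by day ("1", "7", ...).
# Returns a one-row data.frame: Max_Month, A_0..D_0, then A_1, A_7, ..., D_28.
extract_days <- function(month_list, days = c(1, 7, 14, 21, 28), n_prev = 4) {
  max_idx  <- max(month_index(names(month_list)))
  prev     <- index_to_key(max_idx - seq_len(n_prev))  # calendar months, most recent first
  prefixes <- LETTERS[seq_len(n_prev)]

  out <- list(Max_Month = index_to_key(max_idx))
  for (i in seq_len(n_prev)) out[[paste0(prefixes[i], "_0")]] <- prev[i]

  for (i in seq_len(n_prev)) {
    month_vals <- month_list[[prev[i]]]  # NULL if the month is absent from the JSON
    for (d in days) {
      key <- as.character(d)
      # Missing day or missing month -> NA
      out[[paste0(prefixes[i], "_", d)]] <-
        if (key %in% names(month_vals)) as.numeric(month_vals[[key]]) else NA_real_
    }
  }
  as.data.frame(out, stringsAsFactors = FALSE)
}
```

Applied to the question's dataframe, each row could be processed with extract_days(jsonlite::fromJSON(df$Json_Data[i])) and the resulting one-row data frames stacked with do.call(rbind, ...) next to ID, Rank and Name. The design choice here is to compute the four target months first and then look them up, rather than sorting whatever months the JSON happens to contain.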