A list of data frames: extract the date column from each DF in the list and pass all values to a single DF with columns 1985-2017

Asked: 2017-12-19 19:10:36

Tags: r purrr

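I have a list of 169 data frames (assetcount_dfs), each corresponding to a square on a geographic grid, and each square contains a set of assets. I want to fill a separate data frame that counts, for each square, how many assets have a start date in each year from 1985 to 2017.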

Here is the structure of this list of data frames:

 Square1_DF (3 rows/assets)   | x | y | dates char[1989, N/A, 1991]
 ...
 Square169_DF (1 row/asset)   | x | y | dates char[2002]

I want to convert this into a single data frame, 'dateDF', that counts these dates:

            | 1989 | 1990 | ... | 2015 | 2016 | 2017 
 Square 1      0      1            3      2      0      
 ...
 Square 169    0      0            0      1      3

Here is a toy sample of my data. Within each data frame in assetcount_dfs, the 'val' column holds the dates I want to use to fill dateDF:

  sdf1 <- data.frame(a = c("1","4","5","1"), x = c("sdf","asf","asdf","sdf"), val = c("2014","2012","#N/A", "2001"))
  sdf2 <- data.frame(a = c("1","4"), x = c("sdf","asdf"), val = c("#N/A","2011"))
  sdf3 <- data.frame(a = c("1","4","5","1","1"), x = c("sdf","asf","asdf","sdf","sdf"), val = c("2010","2015","2000","2002", "2003"))

  assetcount_dfs <- list(sdf1 = sdf1,sdf2 = sdf2,sdf3 = sdf3)

  date_range <- 1985:2017
  dateDF <- data.frame(matrix(ncol = length(date_range),nrow = 3))     # actual length is 169 rows, only using 3 for this example
  colnames(dateDF) <- paste0('X',1985:2017) # name columns 'X'DATE
  rownames(dateDF) <- names(assetcount_dfs)
  dateDF[] <- 0          # filled with zeroes     

Current attempt

Within each data frame's 'val' column, I want to check whether any date values fall in the 1985-2017 range and, if so, add them to the matching X<date> column of dateDF.

I tried using purrr (like lapply) to operate on each DF, but I am having trouble understanding where to go from here.

invisible(map(listx, function(df) {

for (i in df$val){
    if (as.integer(i) %in% 1985:2017){
    datesDF_colName <- paste0('X',i)
    dateDF[substitute(df), datesDF_colName] <- dateDF[[datesDF_colName]] + 1 
      # Attempt to set dateDF value at [grid-square DF's name / row, Column based on Year ]
    } 

}}))

# Output:    
# Error in `[<-.data.frame`(`*tmp*`, substitute(df), datesDF_colName, value = 
# c(1,  : 
#  anyNA() applied to non-(list or vector) of type 'language'
# Called from: `[<-.data.frame`(`*tmp*`, substitute(df), datesDF_colName, 
# value = c(1, 
# 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 
# 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 

# Note my sample code for 'listx' for some reason generates DFs with factors, although I am currently dealing with character arrays.

1 answer:

Answer 0 (score: 1)

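I would approach this with the tidyverse. Rather than trying to edit dateDF inside a loop, count how often each year appears together with its data frame ID, then reshape the data into the format you are looking for.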

library(tidyverse)

assets2  <- assetcount_dfs %>% 
  # combine all the small data frames into a single big df
  bind_rows(.id = 'rowdf') %>% 
  # toss out the N/A values so they don't get counted
  filter(val != "#N/A")


simpleDateDF <- assets2 %>% 
  # count each year and what data frame it's from
  count(rowdf, val) %>% 
  # spread the years out into columns, using 0 as the default
  spread(val, n, fill = 0)
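
Note that the reshaped result only has columns for years that actually occur in the data. If the full 1985-2017 grid with X-prefixed column names from the question is needed, the same count-then-reshape idea can be sketched in base R. This is a minimal sketch, not the answer's method: count_years is a hypothetical helper introduced here for illustration, and it assumes every valid 'val' entry is a four-digit year stored as a string, with anything else (such as "#N/A") ignored.

```r
# Toy data from the question; stringsAsFactors = FALSE keeps 'val' as character
sdf1 <- data.frame(a = c("1","4","5","1"), x = c("sdf","asf","asdf","sdf"),
                   val = c("2014","2012","#N/A","2001"), stringsAsFactors = FALSE)
sdf2 <- data.frame(a = c("1","4"), x = c("sdf","asdf"),
                   val = c("#N/A","2011"), stringsAsFactors = FALSE)
sdf3 <- data.frame(a = c("1","4","5","1","1"), x = c("sdf","asf","asdf","sdf","sdf"),
                   val = c("2010","2015","2000","2002","2003"), stringsAsFactors = FALSE)
assetcount_dfs <- list(sdf1 = sdf1, sdf2 = sdf2, sdf3 = sdf3)

date_range <- 1985:2017

# Hypothetical helper: one count per year in date_range for a single data frame
count_years <- function(df) {
  # as.character() guards against factor columns; "#N/A" becomes NA
  years <- suppressWarnings(as.integer(as.character(df$val)))
  # keep only valid years inside the requested range
  years <- years[!is.na(years) & years %in% date_range]
  # shift years to bin positions 1..length(date_range) and count them
  tabulate(years - min(date_range) + 1L, nbins = length(date_range))
}

# One column of counts per data frame, transposed so each data frame is a row
counts <- t(vapply(assetcount_dfs, count_years, integer(length(date_range))))

dateDF <- as.data.frame(counts)
colnames(dateDF) <- paste0("X", date_range)  # same naming as in the question
```

Because the counts are binned against date_range directly, years with no assets stay at 0 and the row names carry over from names(assetcount_dfs), so a lookup such as dateDF["sdf1", "X2014"] works as in the question's intended layout.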