How to apply a function by group over an array in R?

Date: 2015-04-25 13:09:19

Tags: arrays r function

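I have a 3D array of latitude, longitude, and date-time (Year_Month_Day_Hour). What is the best way in R to apply a function over groups along the array (in this case by year, month, or day)? The result should be an array of mean values whose third dimension is the year, month, or day.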

str(data)
 num [1:7, 1:7, 1:5] 977 994 1010 1020 1026 ...
 - attr(*, "dimnames")=List of 3
  ..$ : chr [1:7] "60" "57.5" "55" "52.5" ...
  ..$ : chr [1:7] "-30" "-27.5" "-25" "-22.5" ...
  ..$ : chr [1:5] "2014_10_01_00" "2014_10_01_06" "2014_10_01_12" "2014_10_01_18" ...

Example (truncated):

dput(data)
structure(c(977.2, 994.4, 1009.8, 1020.1, 1026.4, 1029.4, 1029.2, 
978.7, 995.7, 1010.2, 1020.5, 1026.5, 1028.8, 1028.3, 982, 997.5, 
1011.3, 1021.2, 1026.1, 1027.4, 1027.1, 986.2, 999.9, 1013, 1021.7, 
1025.1, 1025.7, 1026, 990.6, 1002.7, 1014.5, 1021.3, 1023.9, 
1024.7, 1025.6, 995.1, 1005.7, 1015.2, 1019.9, 1022.6, 1024.5, 
1025.9, 999.1, 1008, 1015.1, 1018.6, 1021.8, 1024.5, 1026.6, 
982.1, 998.9, 1011.8, 1020.1, 1025.5, 1028.4, 1028.8, 981.9, 
999.3, 1012.7, 1021.2, 1026.4, 1028.8, 1029, 983.9, 1000.2, 1013.5, 
1022.1, 1027, 1028.9, 1028.9, 987.1, 1001.8, 1014.6, 1022.7, 
1027.3, 1028.6, 1028.2, 990.9, 1004.1, 1016.1, 1023.3, 1027.2, 
1027.9, 1027.4, 995.1, 1006.9, 1017.8, 1023.8, 1026.8, 1027, 
1026.9, 999.5, 1010.1, 1019.1, 1023.8, 1025.9, 1026.1, 1026.9, 
990.3, 1002.3, 1010.9, 1018.3, 1024, 1027.6, 1028.6, 990.6, 1004.1, 
1013.2, 1020.8, 1026.2, 1029.3, 1029.8, 992.1, 1005.5, 1015.2, 
1023, 1028, 1030.4, 1030.5, 994.5, 1007, 1017.2, 1024.7, 1029.4, 
1031, 1030.3, 997.4, 1008.8, 1019, 1025.7, 1030, 1031, 1029.8, 
1000.1, 1010.9, 1020.9, 1026.5, 1030, 1030.6, 1029.5, 1002.9, 
1013.3, 1022.6, 1027.2, 1029.7, 1029.7, 1029.2, 993.6, 997.5, 
1001.3, 1007.4, 1015.5, 1022.7, 1026.4, 996.1, 1001.1, 1005.8, 
1012.7, 1020.1, 1025.6, 1027.9, 998.4, 1004.5, 1010.4, 1017.6, 
1023.8, 1027.6, 1029.1, 1000.2, 1007.3, 1014.4, 1021.5, 1026.4, 
1029, 1029.7, 1002, 1010, 1017.8, 1024.3, 1028.4, 1029.9, 1029.6, 
1004.3, 1012.9, 1020.7, 1026.3, 1029.7, 1030.2, 1029.3, 1006.9, 
1016, 1023.2, 1027.7, 1030.3, 1029.7, 1028.6, 987.9, 989.6, 995.1, 
1002.9, 1010.8, 1018.9, 1025.1, 989.8, 990, 995.1, 1004.7, 1013.9, 
1021.8, 1026.8, 993.1, 992.6, 998.1, 1008.8, 1018, 1024.6, 1028.3, 
996.9, 997.3, 1003.9, 1014, 1021.9, 1026.8, 1029.1, 1000.3, 1003.1, 
1010.5, 1019, 1025.2, 1028.5, 1029.6, 1003.6, 1008.7, 1016.4, 
1023.1, 1027.8, 1029.8, 1029.9, 1007.3, 1013.7, 1020.8, 1026.3, 
1029.8, 1030.2, 1029.6), .Dim = c(7L, 7L, 5L), .Dimnames = list(
c("60", "57.5", "55", "52.5", "50", "47.5", "45"), c("-30", 
"-27.5", "-25", "-22.5", "-20", "-17.5", "-15"), c("2014_10_01_00", 
"2014_10_01_06", "2014_10_01_12", "2014_10_01_18", "2014_10_02_00"
)))

SOLUTION:

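# one factor level per calendar day (as.Date ignores the trailing "_HH")
group <- as.factor(as.Date(dimnames(data)[[3]],format="%Y_%m_%d"))

# for each lat/lon cell, take the mean within each day, then move the day dimension last
aperm(apply(data,c(1,2), by, group, mean),c(2,3,1))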
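To see what the one-liner produces, here is a sketch on a small toy array with the same dimnames layout (the values are arbitrary, not the OP's data). It swaps `by()` for `tapply()`, which groups a plain numeric vector directly; that swap is a choice made for this illustration, not part of the posted solution.

```r
# Toy 2 x 2 x 5 array standing in for the OP's 7 x 7 x 5 data;
# values are arbitrary, only the dimnames layout mirrors the question.
data <- array(
  as.numeric(1:20), dim = c(2, 2, 5),
  dimnames = list(c("60", "57.5"), c("-30", "-27.5"),
                  c("2014_10_01_00", "2014_10_01_06", "2014_10_01_12",
                    "2014_10_01_18", "2014_10_02_00"))
)

# One factor level per calendar day; as.Date() ignores the trailing "_HH".
group <- as.factor(as.Date(dimnames(data)[[3]], format = "%Y_%m_%d"))

# For each lat/lon cell, average its time series within each day.
# apply() puts the per-cell results first: day x lat x lon.
per_cell <- apply(data, c(1, 2), function(v) tapply(v, group, mean))

# Move the day dimension to the end: lat x lon x day.
daily_mean <- aperm(per_cell, c(2, 3, 1))
dim(daily_mean)  # [1] 2 2 2
```

The second day has a single time step, so its slice simply equals `data[, , 5]`; the first day's slice is the mean of the first four slices.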

2 Answers:

Answer 0 (score: 1)

First, I suggest tidying your data. As it stands, we can't really tell what it looks like.

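To do the grouping, create columns for your dates. I'm not sure what date "2014_10_01_00" is supposed to be, but if 2014 is the year and 10 is the month (October), split these into separate columns. I also don't think storing longitude and latitude as character makes sense; numeric would probably be better.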
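A minimal sketch of that tidying step in base R, assuming the array keeps the question's dimnames layout and that the labels really are Year_Month_Day_Hour. The small array `x` and the dimension names `lat`, `lon`, `time` are illustrative assumptions, not the OP's data.

```r
# Toy array with the question's dimnames layout (placeholder values).
# Naming the dimnames makes them become column names below.
x <- array(
  as.numeric(1:8), dim = c(2, 2, 2),
  dimnames = list(lat = c("60", "57.5"), lon = c("-30", "-27.5"),
                  time = c("2014_10_01_00", "2014_10_01_06"))
)

# One row per (lat, lon, time) cell, with the cell value in "value".
long <- as.data.frame.table(x, responseName = "value",
                            stringsAsFactors = FALSE)

# Store coordinates as numbers instead of character strings.
long$lat <- as.numeric(long$lat)
long$lon <- as.numeric(long$lon)

# Split the assumed "YYYY_MM_DD_HH" labels into integer columns.
parts <- do.call(rbind, strsplit(long$time, "_", fixed = TRUE))
long$year  <- as.integer(parts[, 1])
long$month <- as.integer(parts[, 2])
long$day   <- as.integer(parts[, 3])
long$hour  <- as.integer(parts[, 4])

head(long)
```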

Second, take a look at the data.table package. It makes manipulating data (especially large data) a breeze.

To apply a function to a data.table by group, do

my_dt[ , lapply(.SD, my_func), by = c("year", "month")]

where year and month are column names in the data table.
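For comparison, here is a sketch of the same kind of by-group mean in base R with `aggregate()`, which needs no extra package. The table `long`, its column names, and its values are hypothetical placeholders mirroring the columns suggested above; this is a base-R analogue, not the data.table call itself.

```r
# Hypothetical long-format table: two grid cells, four time steps each,
# three on day 1 and one on day 2 (placeholder values).
long <- data.frame(
  lat   = rep(c(60, 57.5), each = 4),
  lon   = -30,
  year  = 2014L,
  month = 10L,
  day   = rep(c(1L, 1L, 1L, 2L), times = 2),
  value = as.numeric(1:8)
)

# Mean of `value` for every lat/lon cell within each year/month/day group.
daily <- aggregate(value ~ lat + lon + year + month + day,
                   data = long, FUN = mean)
daily
```

Dropping `day` from the formula would give monthly means instead, which is the same idea as changing `by = c("year", "month")` in the data.table call.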

Answer 1 (score: 0)

Just pass the dimension as the second argument (MARGIN) of apply. For example, to sum with the date as the margin:

> apply(array, 3, sum)
# 2014_10_01_00 2014_10_01_06 2014_10_01_12 2014_10_01_18 2014_10_02_00 
#       49691.3       49782.3       49919.6       49851.4       49639.0 

If your dimensions are named, you can also pass the dimension's name as a string for the second argument.
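A small sketch of both forms, using a hypothetical toy array `x` whose dimnames are named `lat`, `lon` and `time` (those names are what make the character form work; the values are arbitrary):

```r
# Toy 2 x 2 x 3 array with *named* dimnames.
x <- array(
  as.numeric(1:12), dim = c(2, 2, 3),
  dimnames = list(lat = c("60", "57.5"), lon = c("-30", "-27.5"),
                  time = c("2014_10_01_00", "2014_10_01_06", "2014_10_01_12"))
)

by_index <- apply(x, 3, sum)       # margin given by position
by_name  <- apply(x, "time", sum)  # same margin given by its name
by_name  # sums 10, 26, 42, named by the time labels
```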

Edit

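The OP wants the results grouped by date. The following function should point you toward the desired result; note that it pools every lat/lon cell in a group into a single value: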

myapply <- function(array, d, fun){
  # function to apply "fun" to "array"
  # which is of class array with dimension
  # 3. array is grouped by d which is a number
  # between 1 and 4
  # 1: year
  # 2: month
  # 3: day
  # 4: hour
  d.name <- strsplit(dimnames(array)[[3]], "_")

  # make groups
  names  <- lapply(d.name, function(x, d) 
                     paste(x[1:d], collapse= "_"), d = d)
  groups <- unique(names)

  # get the indices for the groups
  indices <- lapply(groups, function(x, names) 
                      which(unlist(names) %in% x), names = names)

  # compute the function on the groups
  results <- lapply(indices, function(ind, arr, fun) 
                       fun(as.vector(arr[,,ind])), arr = array, fun = fun)

  names(results) <- unlist(groups)
  return(results)
}
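As written, `myapply()` returns one number per group, whereas the question asks for an array of means per lat/lon cell. Below is a sketch of a hypothetical variant, `myapply_cells()`, that reuses the same grouping rule (keep the first `d` fields of each label) but applies `fun` per cell with `tapply()`, returning a lat x lon x group array. The toy array is illustrative only.

```r
# Hypothetical variant of myapply(): same grouping rule, but fun is
# applied per lat/lon cell, giving a lat x lon x group array.
myapply_cells <- function(arr, d, fun) {
  parts  <- strsplit(dimnames(arr)[[3]], "_", fixed = TRUE)
  labels <- vapply(parts, function(x) paste(x[seq_len(d)], collapse = "_"),
                   character(1))
  # Keep groups in order of first appearance.
  group  <- factor(labels, levels = unique(labels))

  # apply() returns group x lat x lon (or just lat x lon for one group),
  # so reshape explicitly before moving the group dimension last.
  per_cell <- apply(arr, c(1, 2), function(v) tapply(v, group, fun))
  per_cell <- array(per_cell, dim = c(nlevels(group), dim(arr)[1:2]),
                    dimnames = c(list(levels(group)), dimnames(arr)[1:2]))
  aperm(per_cell, c(2, 3, 1))
}

# Usage on a toy array (arbitrary values, question's label format).
x <- array(
  as.numeric(1:20), dim = c(2, 2, 5),
  dimnames = list(c("60", "57.5"), c("-30", "-27.5"),
                  c("2014_10_01_00", "2014_10_01_06", "2014_10_01_12",
                    "2014_10_01_18", "2014_10_02_00"))
)
by_day  <- myapply_cells(x, 3, mean)  # 2 x 2 x 2: one slice per day
by_year <- myapply_cells(x, 1, mean)  # 2 x 2 x 1: one slice for 2014
```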

Results:

# mean grouping by day
myapply(array, 3, mean)
# $`2014_10_01`
# [1] 1016.554
# 
# $`2014_10_02`
# [1] 1013.041


# mean, grouping by hour
myapply(array, 4, mean)
# $`2014_10_01_00`
# [1] 1014.108
# 
# $`2014_10_01_06`
# [1] 1015.965
# 
# $`2014_10_01_12`
# [1] 1018.767
# 
# $`2014_10_01_18`
# [1] 1017.376
# 
# $`2014_10_02_00`
# [1] 1013.041