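I have a list of 140 elements of type data.frame ('my.list'). I want to calculate 350 averages, each over a certain range of rows in a specific column of a specific data.frame (this sounds a bit cryptic), so 350 different averages in total.
I also have another data.frame ('my.dfAverage') that indicates which data.frame, column and rows are needed for each average. I want to write the 350 different averages and standard deviations into this data.frame (so the columns are: 'average_id', 'dataframe_number', 'column_name', 'row_numbers', 'average' and 'st_dev'). Some value ranges contain NAs; these can be removed when calculating the averages.
What is the best way to automatically calculate the 350 averages and standard deviations from the list of data.frames, based on the information in this data.frame? I thought about writing a for loop (or maybe using an lapply function?), but I am quite new to these functions, so I am not sure what the way to go is here.
A small reproducible example of my list of data.frames: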
my.df1 <- data.frame(ID = c(1:5),
Measure1 = c(2247,2247,1970,1964,1971),
Measure2 = c(2247,2247,NA,1964,1971))
my.df2 <- data.frame(ID = c(1:4),
Measure3 = c(2247,NA,1970,1964),
Measure5 = c(2247,2247,NA,1964))
my.df3 <- data.frame(ID = c(1:4),
Measure6 = c(2247,600,1970,1964),
Measure8 = c(2247,2247,NA,1964))
my.list <- list(list1 = my.df1, list2 = my.df2, list3 = my.df3)
The desired output table for the averages and standard deviations:
my.dfAverage <- data.frame(average_id = c(1:3),
dataframe_number = c(1,2,3),
column_name = c('Measure1','Measure3','Measure6'),
row_numbers = c('1:3','1:4','1:2'),
average = (NA),
st_dev = (NA))
Answer 0 (score: 1)
A solution using the tidyverse.
First, expand row_numbers in my.dfAverage so that each average gets one row per row number it covers.
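library(tidyverse)
my.dfAverage2 <- my.dfAverage %>%
  # split "1:3" into start = "1" and end = "3"
  separate(row_numbers, into = c("start", "end")) %>%
  # build the full sequence of row numbers for each average
  mutate(row_numbers = map2(start, end, `:`)) %>%
  # one row per (average_id, row number); naming the column avoids the
  # warning that newer tidyr versions give for a bare unnest()
  unnest(row_numbers) %>%
  select(-start, -end) %>%
  mutate(row_numbers = as.integer(row_numbers),
         dataframe_number = as.integer(dataframe_number))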
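To make the expansion step concrete, here is a minimal base R sketch of the same idea, without the tidyverse. It assumes every row_numbers entry has the form 'start:end'; the names spec, bounds, rows and expanded are made up for this illustration and are not part of the answer.

```r
# Cut-down copy of the specification table (only the columns needed here)
spec <- data.frame(average_id = 1:3,
                   row_numbers = c("1:3", "1:4", "1:2"),
                   stringsAsFactors = FALSE)

# Split each "start:end" string into its two bounds
bounds <- strsplit(spec$row_numbers, ":", fixed = TRUE)

# Turn each pair of bounds into the full integer sequence
rows <- lapply(bounds, function(b) seq(as.integer(b[1]), as.integer(b[2])))

# One output row per (average_id, row number) pair
expanded <- data.frame(average_id = rep(spec$average_id, lengths(rows)),
                       row_numbers = unlist(rows))
expanded
```

With the example data this gives 3 + 4 + 2 = 9 rows, which is the long shape that the later join relies on.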
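Second, reshape every data frame in my.list to long format and combine them into a single data frame. Note that the code renames ID to row_numbers, so it treats ID as the row number; this holds for the example data, where ID runs from 1 to the number of rows.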
my.list.df <- my.list %>%
setNames(1:length(.)) %>%
map_dfr(function(x){
x2 <- x %>%
gather(column_name, value, -ID)
return(x2)
},.id = "dataframe_number") %>%
mutate(ID = as.integer(ID), dataframe_number = as.integer(dataframe_number)) %>%
rename(row_numbers = ID)
Third, join my.dfAverage2 with my.list.df and calculate the mean and standard deviation. my.dfAverage3 is the final output.
my.dfAverage3 <- my.dfAverage2 %>%
left_join(my.list.df, by = c("dataframe_number", "column_name", "row_numbers")) %>%
group_by(average_id, dataframe_number, column_name) %>%
summarise(row_numbers = paste(min(row_numbers), max(row_numbers), sep = ":"),
average = mean(value, na.rm = TRUE),
st_dev = sd(value, na.rm = TRUE)) %>%
ungroup()
my.dfAverage3
# A tibble: 3 x 6
# average_id dataframe_number column_name row_numbers average st_dev
# <int> <int> <chr> <chr> <dbl> <dbl>
# 1 1 1 Measure1 1:3 2155 160
# 2 2 2 Measure3 1:4 2060 162
# 3 3 3 Measure6 1:2 1424 1165
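As a quick sanity check on the table above, the first row can be recomputed by hand in base R. This is only a sketch: df1, rows, avg and std are names introduced here for illustration, and the data are copied from the question's example so that the snippet runs on its own.

```r
# First data frame from the question's example
df1 <- data.frame(ID = 1:5,
                  Measure1 = c(2247, 2247, 1970, 1964, 1971),
                  Measure2 = c(2247, 2247, NA, 1964, 1971))

rows <- 1:3  # row_numbers "1:3" for average_id 1

# na.rm = TRUE drops missing values before computing, as the question asks
avg <- mean(df1[rows, "Measure1"], na.rm = TRUE)
std <- sd(df1[rows, "Measure1"], na.rm = TRUE)

round(c(average = avg, st_dev = std), 3)
```

The two values come out at about 2154.667 and 159.926, which the tibble prints as 2155 and 160.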
Data
my.list is the same as the OP's my.list.
my.dfAverage <- data.frame(average_id = c(1:3),
dataframe_number = c(1,2,3),
column_name = c('Measure1','Measure3','Measure6'),
row_numbers = c('1:3','1:4','1:2'))
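Answer 1 (score: 1)
Here is a different approach from the one given above, using only base R functions. Note: make sure the data are created with stringsAsFactors = FALSE, so that column_name and row_numbers are character vectors rather than factors.
Write a function, making sure that my.list is indexed correctly, and then call the summary function as f(..., na.rm = T). Using mapply, the function is: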
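fun1 <- function(f) {
  with(my.dfAverage,
       # index my.list by dataframe_number so each row gets its own data.frame
       mapply(function(x, y, z) f(x[eval(parse(text = y)), z], na.rm = TRUE),
              my.list[dataframe_number], row_numbers, column_name))
}
transform(my.dfAverage, average = fun1(mean), st_dev = fun1(sd))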
average_id dataframe_number column_name row_numbers average st_dev
1 1 1 Measure1 1:3 2154.667 159.9260
2 2 2 Measure3 1:4 2060.333 161.6859
3 3 3 Measure6 1:2 1423.500 1164.6049
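Since the question mentions a for loop, the same calculation can also be sketched as an explicit loop. The version below is an illustration under assumptions, not part of the answer above: it assumes every row_numbers entry has the form 'start:end', it splits that string with strsplit instead of eval(parse()), and it looks up each data.frame through dataframe_number, so several averages may refer to the same data.frame. The helper name parse_rows is made up for this example.

```r
# Example data copied from the question so the snippet runs on its own
my.list <- list(
  list1 = data.frame(ID = 1:5, Measure1 = c(2247, 2247, 1970, 1964, 1971),
                     Measure2 = c(2247, 2247, NA, 1964, 1971)),
  list2 = data.frame(ID = 1:4, Measure3 = c(2247, NA, 1970, 1964),
                     Measure5 = c(2247, 2247, NA, 1964)),
  list3 = data.frame(ID = 1:4, Measure6 = c(2247, 600, 1970, 1964),
                     Measure8 = c(2247, 2247, NA, 1964)))
my.dfAverage <- data.frame(average_id = 1:3, dataframe_number = c(1, 2, 3),
                           column_name = c("Measure1", "Measure3", "Measure6"),
                           row_numbers = c("1:3", "1:4", "1:2"),
                           average = NA_real_, st_dev = NA_real_,
                           stringsAsFactors = FALSE)

# Hypothetical helper: turn "1:3" into c(1L, 2L, 3L) without eval(parse())
parse_rows <- function(s) {
  b <- as.integer(strsplit(s, ":", fixed = TRUE)[[1]])
  seq(b[1], b[2])
}

for (i in seq_len(nrow(my.dfAverage))) {
  # pick the data.frame named by dataframe_number, not by position in the loop
  df <- my.list[[my.dfAverage$dataframe_number[i]]]
  vals <- df[parse_rows(my.dfAverage$row_numbers[i]), my.dfAverage$column_name[i]]
  my.dfAverage$average[i] <- mean(vals, na.rm = TRUE)
  my.dfAverage$st_dev[i]  <- sd(vals, na.rm = TRUE)
}
my.dfAverage
```

A loop over 350 rows is cheap, so the choice between this and mapply is mostly about readability.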
Data used:
my.dfAverage <- data.frame(average_id = c(1:3),
dataframe_number = c(1,2,3),
column_name = c('Measure1','Measure3','Measure6'),
row_numbers = c('1:3','1:4','1:2'),
average = (NA),
st_dev = (NA),stringsAsFactors = F)