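I have a dataset with a variable ColumnStart that identifies the first column to include when computing a mean. A second variable, ColumnEnd, identifies the last column in that calculation. For the first row, I would like to compute the mean of columns 5 to 9, for the second row columns 6 to 11, and so on.
The output is:
Here is the updated dput in R: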
structure(list(ID = c("AAA", "BBB", "CCC", "DDD"), ShortID = c("452L",
"3L", "4L", "324L"), Name = c("PS1", "PS2", "PS3", "PS4"), Route =
c("Internal",
"External", "Internal", "Internal"), ColumnStart = c(7L, 7L,
9L, 8L), ColumnEnd = c(9L, 11L, 13L, 10L), Date1 = c(1L, 5L,
13L, 4L), Date2 = c(2L, 6L, 45L, 3L), Date3 = c(3L, 7L, 23L,
2L), Date4 = c(4L, 8L, 65L, 1L), Date5 = c(5L, 8L, 34L, 3L),
Date6 = c(6L, 9L, 23L, 5L), Date7 = c(7L, 6L, 54L, 6L), Date8 = c(7L,
6L, 1L, 7L), Date9 = c(8L, 9L, 3L, 8L)), .Names = c("ID",
"ShortID", "Name", "Route", "ColumnStart", "ColumnEnd", "Date1",
"Date2", "Date3", "Date4", "Date5", "Date6", "Date7", "Date8",
"Date9"), row.names = c(NA, -4L), class = c("tbl_df", "tbl",
"data.frame"), spec = structure(list(cols = structure(list(ID =
structure(list(), class = c("collector_character",
"collector")), ShortID = structure(list(), class =
c("collector_character",
"collector")), Name = structure(list(), class = c("collector_character",
"collector")), Route = structure(list(), class = c("collector_character",
"collector")), ColumnStart = structure(list(), class =
c("collector_integer",
"collector")), ColumnEnd = structure(list(), class =
c("collector_integer",
"collector")), Date1 = structure(list(), class = c("collector_integer",
"collector")), Date2 = structure(list(), class = c("collector_integer",
"collector")), Date3 = structure(list(), class = c("collector_integer",
"collector")), Date4 = structure(list(), class = c("collector_integer",
"collector")), Date5 = structure(list(), class = c("collector_integer",
"collector")), Date6 = structure(list(), class = c("collector_integer",
"collector")), Date7 = structure(list(), class = c("collector_integer",
"collector")), Date8 = structure(list(), class = c("collector_integer",
"collector")), Date9 = structure(list(), class = c("collector_integer",
"collector"))), .Names = c("ID", "ShortID", "Name", "Route",
"ColumnStart", "ColumnEnd", "Date1", "Date2", "Date3", "Date4",
"Date5", "Date6", "Date7", "Date8", "Date9")), default = structure(list(),
class = c("collector_guess",
"collector"))), .Names = c("cols", "default"), class = "col_spec"))
Answer 0 (score: 3)
Here is a base R solution that drops any non-numeric values before computing the mean:
# apply() passes each row to the function as a named character vector
df$Average <- apply(df, 1, function(x) {
  y <- as.numeric(x[seq.int(x['ColumnStart'], x['ColumnEnd'])])
  mean(y[!is.na(y)])
})
df
# A tibble: 4 x 16
  ID    ShortID Name  Route    ColumnStart ColumnEnd Date1 Date2 Date3 Date4 Date5 Date6 Date7 Date8 Date9 Average
  <chr> <chr>   <chr> <chr>          <int>     <int> <int> <int> <int> <int> <int> <int> <int> <int> <int>   <dbl>
1 AAA   452L    PS1   Internal           7         9     1     2     3     4     5     6     7     7     8     2
2 BBB   3L      PS2   External           7        11     5     6     7     8     8     9     6     6     9     6.8
3 CCC   4L      PS3   Internal           9        13    13    45    23    65    34    23    54     1     3    39.8
4 DDD   324L    PS4   Internal           8        10     4     3     2     1     3     5     6     7     8     2
as.numeric tries to convert each value to numeric. If it cannot, it returns NA. We then drop the NA values and compute the mean.
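To illustrate the coercion step in isolation, here is a minimal sketch. The vector x is made up for the example; it only mimics the kind of character vector that apply() passes along when a data frame mixes text and numbers.

```r
# A character vector like the ones apply() hands to the function
# when the data frame mixes text and numbers (values are illustrative).
x <- c("AAA", "Internal", "7", " 9", "12")

# Text that is not a number becomes NA (R warns about the coercion,
# silenced here); padded numbers such as " 9" still convert.
y <- suppressWarnings(as.numeric(x))
y
# [1] NA NA  7  9 12

# Dropping the NA values before averaging
mean(y[!is.na(y)])
# [1] 9.333333
```

Without the NA filter, mean(y) would itself be NA, which is why the answer removes them first.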
Here is a one-line version that works the same way but uses na.omit to drop the NA values before computing the mean:
df$Average <- apply(df, 1, function(x) mean(na.omit(as.numeric(x[seq.int(x['ColumnStart'], x['ColumnEnd'])]))))
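As a possible variation (not part of the answer above), the same per-row means can be computed by looping over row numbers instead of rows, which avoids the round trip through character values. This is only a sketch: it rebuilds the question's data as a plain data.frame, and the name avg is chosen here for illustration.

```r
# Same values as the question's dput, rebuilt as a plain data.frame
# (assumption: the tibble class and readr spec attribute are not needed).
df <- data.frame(
  ID = c("AAA", "BBB", "CCC", "DDD"), ShortID = c("452L", "3L", "4L", "324L"),
  Name = c("PS1", "PS2", "PS3", "PS4"),
  Route = c("Internal", "External", "Internal", "Internal"),
  ColumnStart = c(7L, 7L, 9L, 8L), ColumnEnd = c(9L, 11L, 13L, 10L),
  Date1 = c(1L, 5L, 13L, 4L), Date2 = c(2L, 6L, 45L, 3L),
  Date3 = c(3L, 7L, 23L, 2L), Date4 = c(4L, 8L, 65L, 1L),
  Date5 = c(5L, 8L, 34L, 3L), Date6 = c(6L, 9L, 23L, 5L),
  Date7 = c(7L, 6L, 54L, 6L), Date8 = c(7L, 6L, 1L, 7L),
  Date9 = c(8L, 9L, 3L, 8L), stringsAsFactors = FALSE
)

# Loop over row numbers, so the selected values keep their integer type
# and never pass through a character representation.
avg <- vapply(seq_len(nrow(df)), function(i) {
  cols <- seq.int(df$ColumnStart[i], df$ColumnEnd[i])
  mean(unlist(df[i, cols]), na.rm = TRUE)
}, numeric(1))

avg
# [1]  2.0  6.8 39.8  2.0
```

The values agree with the Average column printed above. vapply with numeric(1) also makes the function fail loudly if a row ever yields something other than a single number.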
Answer 1 (score: 1)
Another approach, not necessarily recommended:
rowMeans(df * NA^!(col(df) >= df$ColumnStart & col(df) <= df$ColumnEnd),
         na.rm = TRUE)
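# [1] 3.000000 7.142857 5.000000 3.333333 6.500000
# (five values, so this output evidently comes from a five-row version of the data, not the four-row dput above)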
Explanation:
col(df) >= df$ColumnStart & col(df) <= df$ColumnEnd is a matrix that is TRUE at the (i,j) indices that match the ColumnStart, ColumnEnd specification.
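NA^!(col(df) >= df$ColumnStart & col(df) <= df$ColumnEnd) is a matrix that is 1 where that condition is TRUE and NA elsewhere. Multiplying df by it gives a matrix identical to df, except that every element whose index falls outside the ColumnStart and ColumnEnd specification is NA.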
Now we can call rowMeans on it with na.rm = TRUE to get the desired result.
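The masking steps can be traced on a small all-numeric matrix. This is a sketch with made-up values; m, start, end, inside and mask are names chosen for illustration. Because the trick relies on multiplication, it presumably needs every column involved to be numeric, so data with character columns (like the dput in the question) would have to be restricted to its numeric part first, with the column positions adjusted to match.

```r
# Small numeric example (values are illustrative, not from the question).
m     <- matrix(1:12, nrow = 3)   # 3 rows, 4 columns
start <- c(1, 2, 3)               # first column to average, per row
end   <- c(2, 4, 4)               # last column to average, per row

# TRUE where the column index lies inside the row's [start, end] range.
# start and end have one entry per row, so they recycle down each column.
inside <- col(m) >= start & col(m) <= end

# NA^0 is 1 and NA^1 is NA, so the mask is 1 inside the range, NA outside.
mask <- NA^!inside

# Values outside the range become NA and are skipped by na.rm = TRUE.
rowMeans(m * mask, na.rm = TRUE)
# [1]  2.5  8.0 10.5
```

Row 1 averages columns 1 to 2 (1 and 4), row 2 averages columns 2 to 4 (5, 8, 11), and row 3 averages columns 3 to 4 (9 and 12).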