Question

I have a data frame with up to 5 measurements (x) and their corresponding times:

df = structure(list(x1 = c(92.9595722286402, 54.2085219673818, 
46.3227062573019, 
NA, 65.1501442134141, 49.736451235317), time1 = c(43.2715277777778, 
336.625, 483.975694444444, NA, 988.10625, 510.072916666667), 
x2 = c(82.8368681534474, 53.7981639701784, 12.9993531230419, 
NA, 64.5678816290574, 55.331442940348), time2 = c(47.8166666666667, 
732, 506.747222222222, NA, 1455.25486111111, 958.976388888889
), x3 = c(83.5433119686794, 65.723072881366, 19.0147593408309, 
NA, 65.1989838202356, 36.7000828457705), time3 = c(86.5888888888889, 
1069.02083333333, 510.275, NA, 1644.21527777778, 1154.95694444444
), x4 = c(NA, 66.008102917677, 40.6243513885846, NA, 62.1694420909955, 
29.0078249523063), time4 = c(NA, 1379.22986111111, 520.726388888889, 
NA, 2057.20833333333, 1179.86805555556), x5 = c(NA, 61.0047472617535, 
45.324715258421, NA, 59.862110645527, 45.883161439362), time5 = c(NA, 
1825.33055555556, 523.163888888889, NA, 3352.26944444444, 
1364.99513888889)), class = c("tbl_df", "tbl", "data.frame"
), row.names = c(NA, -6L))

NA means that the person (row) has no measurement.

I want to calculate the difference between the last existing measurement and the first one.

So for the first row that would be x3 minus x1 (-9.4), for the second row x5 minus x1 (6.8), and so on.

I tried something like this, but it did not work:

df$diff = apply(df %>% select(., contains("x")), 1, function(x) head(x, 
na.rm = T) - tail(x, na.rm=T))

Any suggestions? Also, is apply/rowwise the most efficient approach, or is there a vectorized function that can do this?

Answer 1

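A vectorized approach uses max.col: applied to !is.na() of the "x" columns, it returns the column position of the first or last non-NA value in each row, depending on whether ties.method is "first" or "last".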

# Positions of the "x" columns
x_cols <- grep("^x", names(df))

# Column number of the first and last non-NA value in each row
first_col <- max.col(!is.na(df[x_cols]), ties.method = "first")
last_col <- max.col(!is.na(df[x_cols]), ties.method = "last")

# Subset the data frame to include only the "x" columns
new_df <- as.data.frame(df[x_cols])

# Subtract the first non-NA value from the last one
df$new_calc <- new_df[cbind(1:nrow(df), last_col)] - 
               new_df[cbind(1:nrow(df), first_col)]
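To see why this works, here is a minimal sketch on a toy matrix (the values are made up for illustration and are not taken from the question). A logical matrix is treated as 0/1, so every TRUE in a row ties for the maximum, and ties.method decides which tied column is reported. A row with no measurements is all FALSE, so it ties across every column, and the subtraction then yields NA - NA = NA.

```r
# Toy data: rows are persons, columns are measurements (illustrative values)
vals <- rbind(
  c(10, 12, 15, NA, NA),
  c(NA,  7, NA,  3, NA),
  rep(NA_real_, 5)          # a person with no measurements at all
)

present <- !is.na(vals)     # TRUE where a measurement exists

# Among the tied TRUEs, pick the leftmost or the rightmost column
first_idx <- max.col(present, ties.method = "first")
last_idx  <- max.col(present, ties.method = "last")

# Matrix indexing: each (row, column) pair picks one cell
rows <- seq_len(nrow(vals))
diffs <- vals[cbind(rows, last_idx)] - vals[cbind(rows, first_idx)]

print(first_idx)
print(last_idx)
print(diffs)
```

Because the all-NA row resolves to columns 1 and 5, both of which hold NA, no special case is needed for it.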

Alternatively, you can use apply, which processes the data row by row:

x_cols <- grep("^x", names(df))

df$new_calc <- apply(df[x_cols], 1, function(x) {
    new_x <- x[!is.na(x)]
    if (length(new_x) > 0)
      new_x[length(new_x)] - new_x[1L]
    else NA
})

Answer 2

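We can use tidyverse methods on the tbl_df. Create a row-name column (rownames_to_column), gather the 'x' columns into 'long' format while removing the NA elements (na.rm = TRUE), group by row name, take the difference between the last and first 'val'ue, and bind the resulting column to the original dataset 'df'.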
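The steps described above can be sketched as follows. This is not the answer's original code but a reconstruction under assumptions: the names rn, key, val and Diff are hypothetical, the small df is a made-up stand-in for the question's data, and a left_join is used for the final binding step so that rows with no measurements keep an NA.

```r
library(dplyr)
library(tidyr)
library(tibble)

# Stand-in data with the same shape as the question's "x" columns (illustrative)
df <- tibble(
  x1 = c(10, 5, NA, NA),
  x2 = c(12, NA, NA, 7),
  x3 = c(NA, 9, NA, 4)
)

# One row per (person, measurement); NA measurements are dropped
diffs <- df %>%
  rownames_to_column("rn") %>%
  select(rn, starts_with("x")) %>%
  gather(key, val, -rn, na.rm = TRUE) %>%
  group_by(rn) %>%
  summarise(Diff = last(val) - first(val))

# Attach the result to the original rows; persons without data get NA
result <- df %>%
  rownames_to_column("rn") %>%
  left_join(diffs, by = "rn") %>%
  select(-rn)

print(result)
```

gather stacks the columns in their original order, so within each row name the first and last remaining values correspond to the earliest and latest non-NA measurement.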

Difference between the first non-NA and last non-NA value in each row
