My R dataset has 1.5 million rows and 23 columns and looks like this:
ID Week col1 col2 col3 ...
A 1 2 3 1
A 2 3 4 1
...
A 69 15 2 11
B 1 5 1 2
B 2 6 10 3
...
B 69 2 1 1
Z 1 1 12 2
Z 2 4 5 3
...
Z 69 1 20 2
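For each ID, I want to change only the "Week" 69 row, using one third of the maximum value within that ID group.
For example: take the max of col1 where ID = A, divide it by 3, and write the result back into the original dataset.
My current logic does not seem to work: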
index<-unique(data$ID)
dat<-filter(data, id== index[1])
b<-sapply(dat[,3:23],max)
b<-b/3
dat[69,4:23]<-dat[69,4:23]+b
data.alt<-dat
for (i in 2:19477)
{
dat<-filter(data, id== index[i])
b<-sapply(dat[,4:23],max)
b<-b/3
dat[69,4:23]<-dat[69,4:23]+b
data.alt<-rbind(data.alt,dat)
}
Answer 0 (score: 1)
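We can use a data.table approach. Create a vector of the names from the original dataset whose column names contain col ('nm1'), and paste the prefix 'i.' onto them to create a second vector ('nm2', used to assign values during the join). Then summarise the dataset by taking the max of those columns grouped by 'ID', with .SDcols specified as 'nm1', create a 'Week' column with the value '69', join the two datasets on 'ID' and 'Week', and assign (:=) the values of the 'nm2' columns to 'nm1'.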
library(data.table)
nm1 <- grep("col", names(df1), value = TRUE)
nm2 <- paste0("i.", nm1)
df2 <- setDT(df1)[, lapply(.SD, max) , ID, .SDcols = nm1][, Week := factor(69)][]
df1[df2, (nm1) := mget(nm2), on = .(ID, Week)]
df1
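If we want to replace the 'nm1' columns where 'Week' is 69 with the max value divided by 3, first convert those columns to numeric so they can hold fractional values: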
setDT(df1)[, (nm1) := lapply(.SD, as.numeric), .SDcols = nm1]
df2 <- df1[, lapply(.SD, function(x) max(x)/3) , ID, .SDcols = nm1][, Week := factor(69)][]
df1[df2, (nm1) := mget(nm2), on = .(ID, Week)]
If we need to add the result to the original values instead, change the last line of code to
df1[df2, (nm1) := Map(`+`, mget(nm1), mget(nm2)), on = .(ID, Week)]
df1 <- structure(list(ID = c("A", "A", "A", "B", "B", "B", "Z", "Z",
"Z"), Week = structure(c(1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L), .Label = c("1",
"2", "69"), class = "factor"), col1 = c(2L, 3L, 15L, 5L, 6L,
2L, 1L, 4L, 1L), col2 = c(3L, 4L, 2L, 1L, 10L, 1L, 12L, 5L, 20L
), col3 = c(1L, 1L, 11L, 2L, 3L, 1L, 2L, 3L, 2L)), .Names = c("ID",
"Week", "col1", "col2", "col3"), row.names = c(NA, -9L),
class = "data.frame")
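The "add" variant can also be cross-checked without a join. The following is only a sketch of a different technique (a grouped in-place update), not the method used in the answer above. The toy table `dt` is hypothetical, and its value columns are already numeric so that fractional results can be stored.

```r
library(data.table)

# Hypothetical toy data: two IDs, three weeks each, numeric value columns.
dt <- data.table(
  ID   = rep(c("A", "B"), each = 3),
  Week = rep(c(1L, 2L, 69L), times = 2),
  col1 = c(2, 3, 15, 5, 6, 2),
  col2 = c(3, 4, 2, 1, 10, 1)
)

# Columns to adjust: every column whose name contains "col".
nm1 <- grep("col", names(dt), value = TRUE)

# Per ID, add max(x) / 3 to the Week 69 row; all other rows get + 0.
dt[, (nm1) := lapply(.SD, function(x) x + (Week == 69L) * max(x) / 3),
   by = ID, .SDcols = nm1]
```

Because the update is done by reference within each group, no second table is built. On the full 1.5 million rows it would still be worth comparing its speed against the join-based version.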