I have a data.frame df that contains the columns col1, col2, ..., col25 and Threshold.
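I want to create a new column A that records, for each row, how many of the columns col1 ... col25 hold a value above that row's Threshold.
I think I could do it like this: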
df$A <- (df[paste("col",1,sep="")] >= df["Threshold"]) + (df[paste("col",2,sep="")] >= df["Threshold"]) + ...
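But that is not very elegant, which makes me think there must be a better, more compact way.
(Note: I need to assemble the column names from strings; the real column names are PV1MATH, PV2MATH, PV1SCIE, etc.)
Edit: code to generate the data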
colnames <- paste("PV", rep(1:2, 5), c("MATH", "SCIE", "ENGI", "PHYS", "ARTS"), sep="")
df <- as.data.frame(matrix(rnorm(200, 60, 20), ncol=10))
names(df) <- colnames
df$Threshold <- rpois(20, 50)
Answer 0: (score: 1)
I generated some random data so that I can show an example:
> colnames <- paste("PV", rep(1:2, 5), c("MATH", "SCIE", "ENGI", "PHYS", "ARTS"), sep="")
> df <- as.data.frame(matrix(rnorm(200, 60, 20), ncol=10))
> names(df) <- colnames
> df$Threshold <- rpois(20, 50)
> head(df)
PV1MATH PV2SCIE PV1ENGI PV2PHYS PV1ARTS PV2MATH PV1SCIE PV2ENGI PV1PHYS PV2ARTS Threshold
1 65.38862 59.10253 36.58240 54.32805 9.181924 55.01604 73.377464 75.57304 60.93116 31.99255 49
2 46.58772 81.16455 70.60132 19.45667 93.797606 12.80517 47.920166 51.90083 41.72037 63.98710 50
3 67.02016 57.85148 64.67905 24.49892 48.827826 57.26432 53.117871 67.83863 57.56008 67.69975 41
4 61.36172 107.93095 70.78672 38.21072 75.752956 48.12871 40.698131 82.58197 60.66945 61.52466 51
5 19.54413 51.27288 52.15215 71.99829 64.433654 116.80112 47.297671 57.39038 97.73618 75.57284 50
6 68.37724 40.35299 74.26690 60.44868 60.037653 40.99726 6.843594 84.68163 65.08556 62.26077 45
>
> df$Above.Threshold <- rowSums(df[, -grep("Threshold", names(df))] > df$Threshold)
> head(df)
PV1MATH PV2SCIE PV1ENGI PV2PHYS PV1ARTS PV2MATH PV1SCIE PV2ENGI PV1PHYS PV2ARTS Threshold Above.Threshold
1 65.38862 59.10253 36.58240 54.32805 9.181924 55.01604 73.377464 75.57304 60.93116 31.99255 49 7
2 46.58772 81.16455 70.60132 19.45667 93.797606 12.80517 47.920166 51.90083 41.72037 63.98710 50 5
3 67.02016 57.85148 64.67905 24.49892 48.827826 57.26432 53.117871 67.83863 57.56008 67.69975 41 9
4 61.36172 107.93095 70.78672 38.21072 75.752956 48.12871 40.698131 82.58197 60.66945 61.52466 51 7
5 19.54413 51.27288 52.15215 71.99829 64.433654 116.80112 47.297671 57.39038 97.73618 75.57284 50 8
6 68.37724 40.35299 74.26690 60.44868 60.037653 40.99726 6.843594 84.68163 65.08556 62.26077 45 7
In your case, you can simply use this one-liner:
df$Above.Threshold <- rowSums(df[, -grep("Threshold", names(df))] > df$Threshold)
This assumes your data is a data.frame named df.
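To see why the one-liner counts per row, here is a small hand-checkable sketch. The data frame `toy` and its values are made up for illustration and are not from the question. Comparing a data.frame with a vector that has one element per row compares every column element-wise against that vector, so row i is always checked against its own Threshold.

```r
# Hypothetical toy data (not from the question), small enough to verify by hand
toy <- data.frame(PV1MATH   = c(10, 60, 30),
                  PV2MATH   = c(55, 40, 30),
                  PV1SCIE   = c(70, 50, 10),
                  Threshold = c(50, 45, 30))

# Every column except Threshold is a score column
score_cols <- setdiff(names(toy), "Threshold")

# Logical matrix: TRUE where a score is strictly above that row's threshold
above <- toy[, score_cols] > toy$Threshold

# Each TRUE counts as 1, so the row sums are the per-row counts
toy$A <- rowSums(above)
```

Note that `>` is strict, so the values in the third row that are exactly equal to the threshold are not counted. The attempt in the question uses `>=`; swap the operator if ties should count.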
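Alternatively, if you want to choose which columns the above-threshold count is computed over, change the grep condition. For example, you could select only the columns whose names start with PV.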
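A minimal sketch of selecting by prefix, assuming the pattern `"^PV"` is what is wanted. The example data and the extra column `Other` are hypothetical and only serve to show that non-matching columns are left out.

```r
# Hypothetical example data; Other is invented to show that it is excluded
df <- data.frame(PV1MATH   = c(65, 46),
                 PV2SCIE   = c(59, 81),
                 Other     = c(100, 100),
                 Threshold = c(49, 50))

# Names of the columns that start with "PV"
pv_cols <- grep("^PV", names(df), value = TRUE)

# drop = FALSE keeps a data.frame even if only one column matches
df$Above.Threshold <- rowSums(df[, pv_cols, drop = FALSE] > df$Threshold)
```

Selecting the wanted columns by name, rather than dropping unwanted ones with a negative index, also avoids a pitfall: if `grep` finds no match, negating its empty result selects no columns at all. If the names have to be assembled from strings, a character vector such as `paste0("PV", 1:2, "MATH")` can be used in place of `pv_cols`.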