My data has n columns. The first column contains strings; the other columns contain values.
For example, here is a df:
df <- structure(list(X = structure(c(1L, 6L, 8L, 9L, 4L, 7L, 2L, 3L,
5L), .Label = c("Ajngs ", "HAUDD;HHEYDG", "hdgdhdgh", "hdgduk;ldodjg",
"hdhzd;hftfgd", "Kuksjgd", "sjsggd;pfofjdg", "Tmlsks", "yhfkfu"
), class = "factor"), A1 = c(6197300L, 54415000L, 18671000L,
22473000L, 3922800L, 2137900L, 180210000L, 5053000L, 0L), A2 = c(3701100L,
33892000L, 11169000L, 18095000L, 2734200L, 1423600L, 113860000L,
3231300L, 0L), B1 = c(2496200L, 20523000L, 7502400L, 4378400L,
0L, 714310L, 66351000L, 1821700L, 0L), B2 = c(1124900L, 18487000L,
9858100L, 4413400L, 0L, 2137900L, 80461000L, 0L, 0L)), .Names = c("X",
"A1", "A2", "B1", "B2"), class = "data.frame", row.names = c(NA,
-9L))
What I want to do: the value columns belong together in pairs (in this example, A1 goes with A2 and B1 goes with B2).
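One way to express that pairing in R, sketched on a hypothetical two-row subset of the data and assuming the value columns always come in adjacent pairs right after the label column:

```r
# Hypothetical two-row subset with the same column layout as df
df <- data.frame(X  = c("Ajngs ", "Kuksjgd"),
                 A1 = c(6197300, 54415000), A2 = c(3701100, 33892000),
                 B1 = c(2496200, 20523000), B2 = c(1124900, 18487000),
                 stringsAsFactors = FALSE)

value_cols <- names(df)[-1]                    # drop the label column X
pair_id    <- ceiling(seq_along(value_cols) / 2)  # 1, 1, 2, 2, ...
pairs      <- split(value_cols, pair_id)       # list(c("A1","A2"), c("B1","B2"))
print(pairs)
```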
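For each pair I want to check whether A1 is almost double A2 (meaning it is more than 50% larger). If it is, A1 should get the row's string instead of its value and A2 should be set to NAN. If A2 is almost double A1, it works the other way around: A2 gets the string and A1 becomes NAN. If neither value is more than 50% larger than the other, both are set to NAN.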
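The rule for a single pair can be written as a small helper. This is a sketch: `label_pair` and its `ratio`/`missing` arguments are hypothetical names, and `ratio = 1.5` is an assumed encoding of "more than 50% larger".

```r
# Apply the "more than 50% larger" rule to one pair of values (a, b).
# ratio = 1.5 is an assumed threshold; missing is the filler for losing cells.
label_pair <- function(a, b, label, ratio = 1.5, missing = "NAN") {
  if (a > ratio * b) {
    c(label, missing)        # first value dominates
  } else if (b > ratio * a) {
    c(missing, label)        # second value dominates
  } else {
    c(missing, missing)      # neither is more than 50% larger
  }
}

print(label_pair(714310, 2137900, "sjsggd;pfofjdg"))  # returns c("NAN", "sjsggd;pfofjdg")
print(label_pair(0, 0, "hdhzd;hftfgd"))               # returns c("NAN", "NAN")
```

Both values being 0 falls through to the last branch, because `0 > 1.5 * 0` is `FALSE`.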
For example:
           A1       A2
Ajngs 6197300  3701100
A1 is almost double A2, so the output should look like this:
         A1   A2
Ajngs Ajngs  NAN
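The check for this row comes down to one comparison. A minimal sketch, assuming "almost double" means a factor of 1.5:

```r
# Values from the Ajngs row of the example
a1 <- 6197300
a2 <- 3701100

ratio    <- a1 / a2          # about 1.674
dominant <- a1 > 1.5 * a2    # TRUE, so A1 gets "Ajngs" and A2 becomes "NAN"
print(round(ratio, 3))
print(dominant)
```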
Expected output:
                A1              A2   B1        B2
Ajngs           Ajngs           NAN  Ajngs     NAN
Kuksjgd         Kuksjgd         NAN  NAN       NAN
Tmlsks          NAN             NAN  NAN       NAN
yhfkfu          NAN             NAN  NAN       NAN
hdgduk;ldodjg   NAN             NAN  NAN       NAN
sjsggd;pfofjdg  sjsggd;pfofjdg  NAN  NAN       sjsggd;pfofjdg
HAUDD;HHEYDG    HAUDD;HHEYDG    NAN  NAN       NAN
hdgdhdgh        NAN             NAN  hdgdhdgh  NAN
hdhzd;hftfgd    NAN             NAN  NAN       NAN
Answer
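Try the code below. Just change the value of almostDouble to whatever threshold you want. It also works for data frames with more than 4 value columns, as long as those columns come in adjacent pairs.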
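# apply() first converts df to a character matrix, so each row x is a
# character vector: x[1] is the label and the values need as.numeric().
new.df <- apply(df, MARGIN = 1, FUN = function(x){
  almostDouble <- 1.5   # threshold for "more than 50% larger"
  # Value columns come in adjacent pairs: (2, 3), (4, 5), ...
  for(i in seq(from = 2, to = length(x), by = 2)){
    if(as.numeric(x[i]) > (almostDouble * as.numeric(x[i+1]))){
      # first value of the pair dominates
      x[i] <- x[1]
      x[i+1] <- "NAN"
    }
    else if(as.numeric(x[i+1]) > (almostDouble * as.numeric(x[i]))){
      # second value of the pair dominates
      x[i+1] <- x[1]
      x[i] <- "NAN"
    }
    else
      x[i] <- x[i+1] <- "NAN"   # neither value is more than 50% larger
  }
  return(x)
})
# apply() returns one column per input row, so transpose back
new.df <- t(new.df)
new.df <- as.data.frame(new.df, stringsAsFactors = FALSE)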
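As an alternative sketch (not part of the original answer), the same rule can be applied one column pair at a time instead of row by row, which avoids apply()'s conversion of the whole data frame to a character matrix. The function name `label_pairs` and its `ratio`/`missing` arguments are hypothetical; it assumes the first column holds the labels and the remaining columns come in adjacent pairs. The data frame is rebuilt with `data.frame()` only to keep the block self-contained.

```r
# Same data as the question's df, rebuilt so this block runs on its own
df <- data.frame(
  X  = c("Ajngs ", "Kuksjgd", "Tmlsks", "yhfkfu", "hdgduk;ldodjg",
         "sjsggd;pfofjdg", "HAUDD;HHEYDG", "hdgdhdgh", "hdhzd;hftfgd"),
  A1 = c(6197300, 54415000, 18671000, 22473000, 3922800, 2137900, 180210000, 5053000, 0),
  A2 = c(3701100, 33892000, 11169000, 18095000, 2734200, 1423600, 113860000, 3231300, 0),
  B1 = c(2496200, 20523000, 7502400, 4378400, 0, 714310, 66351000, 1821700, 0),
  B2 = c(1124900, 18487000, 9858100, 4413400, 0, 2137900, 80461000, 0, 0),
  stringsAsFactors = FALSE
)

label_pairs <- function(df, ratio = 1.5, missing = "NAN") {
  out <- df
  labels <- as.character(df[[1]])            # works for factor or character
  value_cols <- names(df)[-1]
  stopifnot(length(value_cols) %% 2 == 0)    # columns must come in pairs
  for (k in seq(1, length(value_cols), by = 2)) {
    first  <- value_cols[k]
    second <- value_cols[k + 1]
    a <- as.numeric(df[[first]])
    b <- as.numeric(df[[second]])
    a_wins <- a > ratio * b
    b_wins <- !a_wins & b > ratio * a        # mirrors the if / else if order
    out[[first]]  <- ifelse(a_wins, labels, missing)
    out[[second]] <- ifelse(b_wins, labels, missing)
  }
  out
}

result <- label_pairs(df)
print(result)
```

With `ratio = 1.5`, the Tmlsks and hdgdhdgh rows get their label in A1 (A1/A2 ratios of about 1.67 and 1.56), whereas the expected output in the question shows NAN there, so the threshold may need adjusting.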