How to set strings inside a data frame

Time: 2016-11-17 09:47:40

Tags: r

My data contains n columns. The first column holds strings and the remaining columns hold values.

For example, a sample df is given below

df <- structure(list(X = structure(c(1L, 6L, 8L, 9L, 4L, 7L, 2L, 3L, 
5L), .Label = c("Ajngs ", "HAUDD;HHEYDG", "hdgdhdgh", "hdgduk;ldodjg", 
"hdhzd;hftfgd", "Kuksjgd", "sjsggd;pfofjdg", "Tmlsks", "yhfkfu"
), class = "factor"), A1 = c(6197300L, 54415000L, 18671000L, 
22473000L, 3922800L, 2137900L, 180210000L, 5053000L, 0L), A2 = c(3701100L, 
33892000L, 11169000L, 18095000L, 2734200L, 1423600L, 113860000L, 
3231300L, 0L), B1 = c(2496200L, 20523000L, 7502400L, 4378400L, 
0L, 714310L, 66351000L, 1821700L, 0L), B2 = c(1124900L, 18487000L, 
9858100L, 4413400L, 0L, 2137900L, 80461000L, 0L, 0L)), .Names = c("X", 
"A1", "A2", "B1", "B2"), class = "data.frame", row.names = c(NA, 
-9L))

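What I want to do: the value columns belong together in pairs (in this example A1 goes with A2, and B1 goes with B2).

So I want to check whether A1 is almost double A2 (meaning one value is 50% larger than the other) and, if so, paste the corresponding string in place of the A1 value. If A2 is almost double A1, paste the corresponding string in place of the A2 value instead. If neither value is that much larger than the other, set them to NAN.

As an example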

          A1        A2
Ajngs   6197300   3701100

A1 is almost double A2, so the output should look like this

        A1        A2
Ajngs  Ajngs     NAN
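
The rule described above can be sketched for a single pair of values as follows. This is only an illustration: compare_pair is a hypothetical helper that does not appear in the post, and the threshold of 1.5 is an assumed reading of "almost double" (one value 50% larger than the other).

```r
# Hypothetical helper (not part of the original post): classify one pair.
# threshold = 1.5 is an assumption for "almost double".
compare_pair <- function(a, b, label, threshold = 1.5) {
  if (a > threshold * b) {
    c(label, "NAN")      # first value dominates: label goes in its slot
  } else if (b > threshold * a) {
    c("NAN", label)      # second value dominates
  } else {
    c("NAN", "NAN")      # neither is large enough relative to the other
  }
}

# The Ajngs row from the example: 6197300 > 1.5 * 3701100 (= 5551650)
compare_pair(6197300, 3701100, "Ajngs")
# [1] "Ajngs" "NAN"
```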

Expected output

                  A1               A2    B1         B2
Ajngs             Ajngs            NAN   Ajngs      NAN
Kuksjgd           Kuksjgd          NAN   NAN        NAN
Tmlsks            NAN              NAN   NAN        NAN
yhfkfu            NAN              NAN   NAN        NAN
hdgduk;ldodjg     NAN              NAN   NAN        NAN
sjsggd;pfofjdg    sjsggd;pfofjdg   NAN   NAN        sjsggd;pfofjdg
HAUDD;HHEYDG      HAUDD;HHEYDG     NAN   NAN        NAN
hdgdhdgh          NAN              NAN   hdgdhdgh   NAN
hdhzd;hftfgd      NAN              NAN   NAN        NAN

1 answer:

Answer 0: (score: 1)

Try the corrected code below. Just change the value of almostDouble to your liking. It also accepts data frames with more than 4 columns.

df <- structure(list(X = structure(c(1L, 6L, 8L, 9L, 4L, 7L, 2L, 3L, 
5L), .Label = c("Ajngs ", "HAUDD;HHEYDG", "hdgdhdgh", "hdgduk;ldodjg", 
"hdhzd;hftfgd", "Kuksjgd", "sjsggd;pfofjdg", "Tmlsks", "yhfkfu"
), class = "factor"), A1 = c(6197300L, 54415000L, 18671000L, 
22473000L, 3922800L, 2137900L, 180210000L, 5053000L, 0L), A2 = c(3701100L, 
33892000L, 11169000L, 18095000L, 2734200L, 1423600L, 113860000L, 
3231300L, 0L), B1 = c(2496200L, 20523000L, 7502400L, 4378400L, 
0L, 714310L, 66351000L, 1821700L, 0L), B2 = c(1124900L, 18487000L, 
9858100L, 4413400L, 0L, 2137900L, 80461000L, 0L, 0L)), .Names = c("X", 
"A1", "A2", "B1", "B2"), class = "data.frame", row.names = c(NA, 
-9L))

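new.df <- apply(df, MARGIN = 1, FUN = function(x){
    # apply() coerces df to a character matrix here (X is not numeric),
    # so each row x arrives as character and needs as.numeric() to compare
    almostDouble <- 1.5
    # x[1] is the label; value columns are paired as (2,3), (4,5), ...
    for(i in seq(from = 2, to = length(x), by = 2)){
        if(as.numeric(x[i]) > (almostDouble * as.numeric(x[i+1]))){
            # first value of the pair is large enough: it gets the label
            x[i] <- x[1]
            x[i+1] <- "NAN"
        }
        else if(as.numeric(x[i+1]) > (almostDouble * as.numeric(x[i]))){
            # second value of the pair is large enough: it gets the label
            x[i+1] <- x[1]
            x[i] <- "NAN"
        }
        else
            x[i] <- x[i+1] <- "NAN"
    }
    return(x)
})

# apply() returns one column per input row, so transpose back
new.df <- t(new.df)
new.df <- as.data.frame(new.df)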
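
The same rule can probably also be written without a row-wise loop. The sketch below is one possible vectorized variant, not the code from the answer above: label_pairs and demo are hypothetical names introduced for illustration, and it assumes the first column holds the labels and the remaining columns form consecutive pairs. It compares the numeric columns directly, which avoids the round trip through character that apply() causes.

```r
# Hypothetical vectorized variant (names label_pairs and demo are illustrative).
# Assumes: column 1 = labels, remaining columns = consecutive pairs.
label_pairs <- function(df, almostDouble = 1.5) {
  labels <- as.character(df[[1]])
  vals   <- as.matrix(df[-1])                 # numeric matrix of value columns
  stopifnot(ncol(vals) >= 2, ncol(vals) %% 2 == 0)

  first  <- seq(1, ncol(vals), by = 2)        # positions of A1, B1, ...
  second <- first + 1                         # positions of A2, B2, ...
  a <- vals[, first,  drop = FALSE]
  b <- vals[, second, drop = FALSE]

  # one copy of the label column per pair, so it lines up with a and b
  lab <- matrix(labels, nrow(vals), length(first))

  out <- matrix("NAN", nrow(vals), ncol(vals),
                dimnames = list(NULL, colnames(vals)))
  out[, first]  <- ifelse(a > almostDouble * b, lab, "NAN")
  out[, second] <- ifelse(b > almostDouble * a, lab, "NAN")

  res <- data.frame(labels, out, stringsAsFactors = FALSE)
  names(res)[1] <- names(df)[1]
  res
}

# Small demo frame built from three rows of the question's data
demo <- data.frame(
  X  = c("Ajngs", "yhfkfu", "sjsggd;pfofjdg"),
  A1 = c(6197300, 22473000, 2137900),
  A2 = c(3701100, 18095000, 1423600),
  B1 = c(2496200, 4378400, 714310),
  B2 = c(1124900, 4413400, 2137900),
  stringsAsFactors = FALSE
)
label_pairs(demo)
```

Calling label_pairs(df) on the full data frame should work the same way, and a different threshold can be passed as the second argument, for example label_pairs(df, almostDouble = 2).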