Pattern matching with grep on an R data frame column and transforming other column values based on the pattern

Time: 2016-12-11 00:19:56

Tags: r

I am new to R. I have a data frame like this:

    Col1                 Col2   Col3   Col4   Col5   Col6
    city:Dallas          #N/A   #N/A   #N/A   #N/A   #N/A
    region:richardson    #N/A   #N/A   #N/A   #N/A   #N/A
    1.A.school1          0.0    0.0    0.0    0.0    0.0
    1.B.school2          0.0    0.0    0.0    0.0    0.1
    1.C.school3          n.a    n.a    0.0    n.a    n.a
    4.B.school5          0.0    n.a    0.0    0.0    0.0
    6.A.uni7             n.a    n.a    0.0    0.0    n.a
    4.D.uni9             n.a    0.0    0.0    0.0    0.0
    8.A.uni1             n.a    n.a    0.0    0.0    0.0
    8.b.8                0.0    0.0    0.0    0.0    n.a
    8.c.univ6            0.6    0.1    0.0    0.0    0.0

I need to find a matching pattern in Col1 and multiply all the values in the remaining columns (Col2 to Col6) of that row by 1000.

For example, I need to find the pattern 8.c.univ6 in Col1 and multiply its values 0.6, 0.1, 0.0, 0.0 and 0.0 by 1000.

Likewise, I need to find several more patterns and convert all of their values in the same way.

Any help would be appreciated.

1 Answer:

Answer 0 (score: 1)

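Here is one way to achieve your goal. First, I don't know the class of each column, so I assumed all columns are character. Given that, I wrote the following code; your data is called mydf here. I replaced n.a and #N/A with NA and converted Col2:Col6 to numeric. Then I used rowwise() to process one row at a time. For each row, if Col1 is 8.c.univ6, the value is computed as . * 1000, where . stands for the current column, i.e. each of Col2:Col6 in turn. So whenever the condition is TRUE, every one of those columns is multiplied by 1000.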

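library(dplyr)

mydf %>%
  # Replace the "n.a" and "#N/A" markers with NA, then convert Col2:Col6 to numeric
  mutate_at(vars(-Col1),
            ~ as.numeric(gsub(pattern = "^(n\\.a|#N/A)$", replacement = NA, x = .))) %>%
  # Process one row at a time
  rowwise() %>%
  # Multiply every value column by 1000 in the row whose Col1 is "8.c.univ6"
  mutate_at(vars(-Col1),
            ~ if (Col1 == "8.c.univ6") {. * 1000} else {.})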
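Since dplyr 1.0.0, mutate_at() has been superseded by across(). A minimal sketch of the same two steps, assuming dplyr >= 1.0.0: because if_else() works on whole vectors, rowwise() can be dropped entirely. The data are rebuilt inline so the block runs on its own.

```r
library(dplyr)  # assumes dplyr >= 1.0.0 for across()

# Example data from the question, with every column read as character
mydf <- data.frame(
  Col1 = c("city:Dallas", "region:richardson", "1.A.school1", "1.B.school2",
           "1.C.school3", "4.B.school5", "6.A.uni7", "4.D.uni9",
           "8.A.uni1", "8.b.8", "8.c.univ6"),
  Col2 = c("#N/A", "#N/A", "0.0", "0.0", "n.a", "0.0", "n.a", "n.a", "n.a", "0.0", "0.6"),
  Col3 = c("#N/A", "#N/A", "0.0", "0.0", "n.a", "n.a", "n.a", "0.0", "n.a", "0.0", "0.1"),
  Col4 = c("#N/A", "#N/A", "0.0", "0.0", "0.0", "0.0", "0.0", "0.0", "0.0", "0.0", "0.0"),
  Col5 = c("#N/A", "#N/A", "0.0", "0.0", "n.a", "0.0", "0.0", "0.0", "0.0", "0.0", "0.0"),
  Col6 = c("#N/A", "#N/A", "0.0", "0.1", "n.a", "0.0", "n.a", "0.0", "0.0", "n.a", "0.0"),
  stringsAsFactors = FALSE
)

res <- mydf %>%
  # Turn the "n.a" / "#N/A" markers into NA, then convert to numeric
  mutate(across(-Col1, ~ as.numeric(replace(.x, .x %in% c("n.a", "#N/A"), NA)))) %>%
  # if_else() is vectorised, so all rows are handled at once without rowwise()
  mutate(across(-Col1, ~ if_else(Col1 == "8.c.univ6", .x * 1000, .x)))
```

Matching the markers exactly with %in% also avoids the regex pitfall where the "." in "n.a" would match any character.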

EDIT

In the second mutate_at(), you can use if_else() instead:

mydf %>%
  mutate_at(vars(-Col1),
            ~ as.numeric(gsub(pattern = "^(n\\.a|#N/A)$", replacement = NA, x = .))) %>%
  rowwise() %>%
  # if_else() is vectorised, so rowwise() is not strictly needed here
  mutate_at(vars(-Col1),
            ~ if_else(Col1 == "8.c.univ6", . * 1000, .))


#                Col1  Col2  Col3  Col4  Col5  Col6
#               <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
#1        city:Dallas    NA    NA    NA    NA    NA
#2  region:richardson    NA    NA    NA    NA    NA
#3        1.A.school1     0     0     0     0   0.0
#4        1.B.school2     0     0     0     0   0.1
#5        1.C.school3    NA    NA     0    NA    NA
#6        4.B.school5     0    NA     0     0   0.0
#7           6.A.uni7    NA    NA     0     0    NA
#8           4.D.uni9    NA     0     0     0   0.0
#9           8.A.uni1    NA    NA     0     0   0.0
#10             8.b.8     0     0     0     0    NA
#11         8.c.univ6   600   100     0     0   0.0
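The question also asks about handling more than one pattern, and the title mentions grep. A minimal sketch, assuming dplyr >= 1.0.0: combine the labels into one regular expression and test it with grepl(). The two labels in the pattern are picked from the example purely for illustration, and the data frame is the cleaned numeric version shown in the output above.

```r
library(dplyr)  # assumes dplyr >= 1.0.0 for across()

# Numeric version of the example data, after the NA markers were cleaned up
mydf <- data.frame(
  Col1 = c("city:Dallas", "region:richardson", "1.A.school1", "1.B.school2",
           "1.C.school3", "4.B.school5", "6.A.uni7", "4.D.uni9",
           "8.A.uni1", "8.b.8", "8.c.univ6"),
  Col2 = c(NA, NA, 0, 0, NA, 0, NA, NA, NA, 0, 0.6),
  Col3 = c(NA, NA, 0, 0, NA, NA, NA, 0, NA, 0, 0.1),
  Col4 = c(NA, NA, 0, 0, 0, 0, 0, 0, 0, 0, 0),
  Col5 = c(NA, NA, 0, 0, NA, 0, 0, 0, 0, 0, 0),
  Col6 = c(NA, NA, 0, 0.1, NA, 0, NA, 0, 0, NA, 0),
  stringsAsFactors = FALSE
)

# Hypothetical pattern: the Col1 labels whose rows should be scaled.
# Dots are escaped because "." means "any character" in a regex;
# ^ and $ force each alternative to match the whole label.
pattern <- "^(8\\.c\\.univ6|1\\.B\\.school2)$"

res <- mydf %>%
  mutate(across(-Col1, ~ if_else(grepl(pattern, Col1), .x * 1000, .x)))
```

For a plain list of exact labels, Col1 %in% c(...) works just as well and needs no escaping; grepl() becomes useful when you really want a pattern, e.g. "^8\\." for every label that starts with "8.".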

DATA

mydf <- structure(list(Col1 = c("city:Dallas", "region:richardson", "1.A.school1", 
"1.B.school2", "1.C.school3", "4.B.school5", "6.A.uni7", "4.D.uni9", 
"8.A.uni1", "8.b.8", "8.c.univ6"), Col2 = c("#N/A", "#N/A", "0.0", 
"0.0", "n.a", "0.0", "n.a", "n.a", "n.a", "0.0", "0.6"), Col3 = c("#N/A", 
"#N/A", "0.0", "0.0", "n.a", "n.a", "n.a", "0.0", "n.a", "0.0", 
"0.1"), Col4 = c("#N/A", "#N/A", "0.0", "0.0", "0.0", "0.0", 
"0.0", "0.0", "0.0", "0.0", "0.0"), Col5 = c("#N/A", "#N/A", 
"0.0", "0.0", "n.a", "0.0", "0.0", "0.0", "0.0", "0.0", "0.0"
), Col6 = c("#N/A", "#N/A", "0.0", "0.1", "n.a", "0.0", "n.a", 
"0.0", "0.0", "n.a", "0.0")), .Names = c("Col1", "Col2", "Col3", 
"Col4", "Col5", "Col6"), row.names = c(NA, -11L), class = "data.frame")
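If you would rather not depend on dplyr, here is a base R sketch of the same idea. To keep it short it uses only three rows of the data above (an assumption for illustration), and grepl() with fixed = TRUE so the dots in the label are matched literally.

```r
# Three rows taken from the example data, still as character strings
mydf <- data.frame(
  Col1 = c("city:Dallas", "1.B.school2", "8.c.univ6"),
  Col2 = c("#N/A", "0.0", "0.6"),
  Col3 = c("#N/A", "0.0", "0.1"),
  Col4 = c("#N/A", "0.0", "0.0"),
  Col5 = c("#N/A", "0.0", "0.0"),
  Col6 = c("#N/A", "0.1", "0.0"),
  stringsAsFactors = FALSE
)

value_cols <- setdiff(names(mydf), "Col1")

# Replace the "n.a" / "#N/A" markers with NA and convert to numeric
mydf[value_cols] <- lapply(mydf[value_cols], function(x) {
  as.numeric(replace(x, x %in% c("n.a", "#N/A"), NA))
})

# Logical row index: TRUE where Col1 contains the label literally
hit <- grepl("8.c.univ6", mydf$Col1, fixed = TRUE)

# Scale only the matching rows, in all value columns at once
mydf[hit, value_cols] <- mydf[hit, value_cols] * 1000
```

Note that fixed = TRUE matches substrings, so a longer label containing "8.c.univ6" would also be selected; use mydf$Col1 == "8.c.univ6" if you need an exact match.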