Split a string by the last occurrences of a pattern

Asked: 2015-02-27 17:52:27

Tags: r string design-patterns

I have some data like this:

vtab = read.table(textConnection("uid=123455,ou=usuarios,ou=gm,dc=intra,dc=planej,dc=gov,dc=de  
                                 uid=123456,ou=bsa,dc=plant,dc=gov,dc=de  
                                 uid=123457,ou=reg,ou=regfns,dc=sero,dc=gov,dc=de  
                                 uid=123458,ou=reg,ou=regbhe,dc=sero,dc=gov,dc=de    
                                 uid=123459,ou=sede,ou=regbsa,dc=sero,dc=gov,dc=de    
                                 uid=123450,ou=reg,ou=regbhe,dc=sero,dc=gov,dc=de"))   

I want to split this data into two columns: the number that follows uid=, and the third-from-last dc= value. Like this:

     [,1]       [,2]
[1,] "123455"   "planej"
[2,] "123456"   "plant"
[3,] "123457"   "sero"
[4,] "123458"   "sero"
[5,] "123459"   "sero"

Any help is appreciated :-)

3 answers:

Answer 0 (score: 3)

Try

Col1 <- gsub('uid=(\\d+).*', '\\1', vtab$V1)
Col2 <- gsub('.*dc=(.*)(,dc=.*){2}', '\\1', vtab$V1)
data.frame(Col1, Col2)
#     Col1   Col2
#1 123455 planej
#2 123456  plant
#3 123457   sero
#4 123458   sero
#5 123459   sero
#6 123450   sero
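
The second pattern relies on greedy matching: the leading .* consumes as much as it can while still leaving two ,dc=... groups at the end, so the capture lands on the third-from-last dc= value. A minimal sketch of the same idea that pulls both fields in one pass with regexec() and regmatches() is shown below. The names x, pat, m and res are illustrative, and the pattern assumes that dc values never contain commas and that every string ends with at least three dc= components.

```r
# Illustrative input: two of the strings from the question
x <- c("uid=123455,ou=usuarios,ou=gm,dc=intra,dc=planej,dc=gov,dc=de",
       "uid=123456,ou=bsa,dc=plant,dc=gov,dc=de")

# Group 1: the digits after uid=
# Group 2: the dc= value that is followed by exactly two more dc= components
pat <- "^uid=(\\d+).*,dc=([^,]+),dc=[^,]+,dc=[^,]+$"

# regexec() returns match positions including capture groups;
# regmatches() turns them into c(full match, group 1, group 2) per string
m <- regmatches(x, regexec(pat, x))

# Keep only the two groups and arrange them as one row per input string
res <- t(vapply(m, function(v) v[2:3], character(2)))
res
```

Anchoring the pattern at both ends makes a non-matching string yield NA values instead of being returned unchanged, which is what gsub() does when nothing matches.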

Answer 1 (score: 1)

Without regular expressions:

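# Split each record into its comma-separated components
dat <- strsplit(as.character(vtab[,1]), ",", fixed = TRUE)
vapply(dat, function(x) {
  # The uid is always the first component
  uid <- gsub("uid=", "", x[[1]], fixed = TRUE)
  # Keep the dc= components (fixed = TRUE so no regex is involved)
  dc <- grep("dc=", x, value = TRUE, fixed = TRUE)
  # Pick the third-from-last one and strip its prefix
  dc <- dc[length(dc) - 2]
  dc <- gsub("dc=", "", dc, fixed = TRUE)
  c(uid, dc)
}, character(2))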

#     [,1]     [,2]     [,3]     [,4]     [,5]     [,6]    
#[1,] "123455" "123456" "123457" "123458" "123459" "123450"
#[2,] "planej" "plant"  "sero"   "sero"   "sero"   "sero" 
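
The matrix above has one column per record. A small sketch of getting the one-row-per-record layout asked for in the question is to transpose it with t(). The names x, parts, m and out are illustrative, startsWith() needs R 3.3.0 or later, and the code assumes every record has at least three dc= components.

```r
# Illustrative input: two of the strings from the question
x <- c("uid=123457,ou=reg,ou=regfns,dc=sero,dc=gov,dc=de",
       "uid=123456,ou=bsa,dc=plant,dc=gov,dc=de")

parts <- strsplit(x, ",", fixed = TRUE)

m <- vapply(parts, function(p) {
  dc <- p[startsWith(p, "dc=")]   # keep only the dc= components
  c(uid = sub("uid=", "", p[1], fixed = TRUE),
    dc  = sub("dc=", "", dc[length(dc) - 2], fixed = TRUE))
}, character(2))

# m is 2 x n (rows uid and dc); transpose to get one row per record
out <- as.data.frame(t(m), stringsAsFactors = FALSE)
out
```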

Answer 2 (score: 0)

Use gsub, something like the following. Read the data with readLines. Hope it helps!

    x =   readLines(textConnection("uid=123455,ou=usuarios,ou=gm,dc=intra,dc=planej,
    ... ,dc=de" ))

    ## Create a dataframe XX
    ##1. UID
    XX <- as.data.frame (gsub("\\D","",x) )
    colnames(XX) <- c('uid')
    XX
    #      uid
    # 1 123455
    # 2 123456
    # 3 123457
    # 4 123458
    # 5 123459