I have some data like this:
vtab = read.table(textConnection("uid=123455,ou=usuarios,ou=gm,dc=intra,dc=planej,dc=gov,dc=de
uid=123456,ou=bsa,dc=plant,dc=gov,dc=de
uid=123457,ou=reg,ou=regfns,dc=sero,dc=gov,dc=de
uid=123458,ou=reg,ou=regbhe,dc=sero,dc=gov,dc=de
uid=123459,ou=sede,ou=regbsa,dc=sero,dc=gov,dc=de
uid=123450,ou=reg,ou=regbhe,dc=sero,dc=gov,dc=de"))
I want to split this data into two columns: the number after uid= and the third-from-last dc= value. Something like this:
     [,1]     [,2]
[1,] "123455" "planej"
[2,] "123456" "plant"
[3,] "123457" "sero"
[4,] "123458" "sero"
[5,] "123459" "sero"
Any help is appreciated :-)
Answer 0 (score: 3)
Try
Col1 <- gsub('uid=(\\d+).*', '\\1', vtab$V1)
Col2 <- gsub('.*dc=(.*)(,dc=.*){2}', '\\1', vtab$V1)
data.frame(Col1, Col2)
#    Col1   Col2
#1 123455 planej
#2 123456  plant
#3 123457   sero
#4 123458   sero
#5 123459   sero
#6 123450   sero
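The second pattern relies on backtracking: the captured value must be followed by exactly two more ",dc=..." components, which only the third-from-last dc= satisfies. As a sketch only, here is a stricter variant that anchors the pattern at both ends and uses [^,]+ so a captured value can never span a comma. It assumes every record ends with at least three dc= components. The vector name `recs` is hypothetical and stands in for vtab$V1.

```r
# Hypothetical input: a plain character vector standing in for vtab$V1
recs <- c("uid=123455,ou=usuarios,ou=gm,dc=intra,dc=planej,dc=gov,dc=de",
          "uid=123456,ou=bsa,dc=plant,dc=gov,dc=de",
          "uid=123457,ou=reg,ou=regfns,dc=sero,dc=gov,dc=de")

# Leading uid digits: anchor at the start, capture the digits, drop the rest
uid <- sub("^uid=(\\d+).*$", "\\1", recs)

# Third-from-last dc: capture one comma-free value that is followed by
# exactly two more ",dc=value" components and then the end of the string
dc <- sub("^.*dc=([^,]+)(,dc=[^,]+){2}$", "\\1", recs)

res <- data.frame(uid, dc)
```

Since each string is matched only once, sub() is enough here. Note that if a record does not match the pattern, sub() returns it unchanged.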
Answer 1 (score: 1)
Without regular expressions:
dat <- strsplit(as.character(vtab[,1]), ",", fixed = TRUE)
vapply(dat, function(x) {
  # The uid is always the first component
  uid <- gsub("uid=", "", x[[1]], fixed = TRUE)
  # Keep the dc components and pick the third from last
  dc <- grep("dc", x, value = TRUE)
  dc <- dc[length(dc) - 2]
  dc <- gsub("dc=", "", dc, fixed = TRUE)
  c(uid, dc)
}, c("a", "a"))
#     [,1]     [,2]     [,3]     [,4]     [,5]     [,6]
#[1,] "123455" "123456" "123457" "123458" "123459" "123450"
#[2,] "planej" "plant"  "sero"   "sero"   "sero"   "sero"
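The vapply() result has one column per record, whereas the question asks for one row per record. A minimal sketch of the extra step, transposing with t(). The names `recs`, `m` and `out` are hypothetical, and only two of the records are used to keep it short.

```r
# Hypothetical stand-in for the first column of vtab
recs <- c("uid=123455,ou=usuarios,ou=gm,dc=intra,dc=planej,dc=gov,dc=de",
          "uid=123456,ou=bsa,dc=plant,dc=gov,dc=de")
dat <- strsplit(recs, ",", fixed = TRUE)

# Same extraction as above; character(2) declares two strings per record
m <- vapply(dat, function(x) {
  uid <- gsub("uid=", "", x[[1]], fixed = TRUE)
  dc <- grep("dc", x, value = TRUE)
  dc <- gsub("dc=", "", dc[length(dc) - 2], fixed = TRUE)
  c(uid, dc)
}, character(2))

# t() swaps rows and columns, giving one row per record
out <- t(m)
colnames(out) <- c("uid", "dc")
```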
Answer 2 (score: 0)
Use gsub, something like the following. Read the data with readLines. Hope it helps!
x = readLines(textConnection("uid=123455,ou=usuarios,ou=gm,dc=intra,dc=planej,
... ,dc=de" ))
## Create a dataframe XX
##1. UID
XX <- as.data.frame (gsub("\\D","",x) )
colnames(XX) <- c('uid')
XX
     uid
1 123455
2 123456
3 123457
4 123458
5 123459
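The snippet above only produces the uid column. One possible way to add the second column to the same data frame is sketched below. This is an assumption about how the approach could be continued, not part of the original answer. The input vector is a hypothetical stand-in for the readLines() result, and every record is assumed to have at least three dc= components. Note also that gsub("\\D", "", x) keeps every digit in the line, so it only yields the uid when no other field contains digits.

```r
# Hypothetical stand-in for the character vector returned by readLines()
x <- c("uid=123455,ou=usuarios,ou=gm,dc=intra,dc=planej,dc=gov,dc=de",
       "uid=123456,ou=bsa,dc=plant,dc=gov,dc=de")

# Split each record on commas, then pick the third-from-last dc= component
parts <- strsplit(x, ",", fixed = TRUE)
dc <- vapply(parts, function(p) {
  dcs <- p[startsWith(p, "dc=")]   # keep only components that begin with dc=
  sub("dc=", "", dcs[length(dcs) - 2], fixed = TRUE)
}, character(1))

# Combine with the digits-only uid column from the answer
XX <- data.frame(uid = gsub("\\D", "", x), dc = dc)
```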