Here is a sample string:
/site/50?ret=html&limit=8&phint=eid%3D283&phint=tcat%3D53159&phint=bin%3D1.99&phint=iid%3D301468384280&phint=type%3Duser&phint=pid%3D&phint=meta%3D11450&phint=gid%3D2&phint=inid%3D3&phint=tps%3D&phint=crm%3D3&phint=css%3D6&phint=cg%3D50a8abe714b0a7e37480bbe0fe9fe01e
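Basically, I need to do a two-level split. The first level splits the string and extracts every substring between two "&phint=" separators, which I have already done. My current output is: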
[[1]]
[1] "/site/50?ret=html&limit=8"
[2] "eid%3D283"
[3] "tcat%3D53159"
[4] "bin%3D1.99"
[5] "iid%3D301468384280"
[[3]]
[1] "/site/17001?ret=html&limit=8" "eid%3D278" "tcat%3D26395" "bin%3D0.0"
[5] "iid%3D0" "type%3Duser" "pid%3D" "meta%3D26395"
[9] "gid%3D1" "inid%3D5" "tps%3D" "crm%3D6"
[13] "css%3D10"
This is a list.
Now I need to split each element wherever %3D occurs, breaking it into two parts.
Example:
"eid%3D283"
should be written into two separate data frame columns:
eid in one column
283 into other column
This should be done for all "n" entries of the one-column matrix. After the first-level split, the data becomes a one-column matrix.
Expected output:
Key Value
eid 283
tcat 53159
bin 1.99
and so on..
Any help is appreciated.
Thanks, Pravellika J
Answer 0: (score: 3)
You can try
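# Split on '&phint=', drop the leading path (x[-1]), then split each piece on '%3D'.
# Pieces that yield fewer than two parts (e.g. "pid%3D") become a row of NAs.
res <- do.call(rbind.data.frame,
   lapply(strsplit(as.character(dat1$Col), '&phint='), function(x)
      do.call(rbind, lapply(strsplit(x[-1], '%3D'), function(y)
         if(length(y) < 2) rep(NA, 2) else y))))
colnames(res) <- c('Key', 'Value')
head(res, 2)
#   Key Value
#1  eid   283
#2 tcat 53159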
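To try this without the original data, here is a minimal self-contained sketch. The `dat1` below is a hypothetical stand-in built from two shortened strings in the style of the question, and `pieces` and `rows` are illustrative names. Note one deliberate difference from the code above: this variant keeps the key when the value is empty, instead of turning the whole row into NAs.

```r
# Hypothetical stand-in for the asker's data: one character column named Col.
dat1 <- data.frame(
  Col = c("/site/50?ret=html&limit=8&phint=eid%3D283&phint=pid%3D&phint=bin%3D1.99",
          "/site/17001?ret=html&limit=8&phint=eid%3D278&phint=tps%3D"),
  stringsAsFactors = FALSE
)

# First-level split on the literal "&phint=" separator.
pieces <- strsplit(dat1$Col, "&phint=", fixed = TRUE)

# Second-level split on "%3D". x[-1] drops the leading path;
# c(y, NA)[1:2] pads a key without a value (e.g. "pid%3D") with NA.
rows <- lapply(pieces, function(x) {
  kv <- strsplit(x[-1], "%3D", fixed = TRUE)
  do.call(rbind, lapply(kv, function(y) c(y, NA)[1:2]))
})

# Stack the per-string matrices into one two-column data frame.
res <- as.data.frame(do.call(rbind, rows), stringsAsFactors = FALSE)
colnames(res) <- c("Key", "Value")
res
```

Using `fixed = TRUE` treats both separators as literal text rather than regular expressions, which is all that is needed here.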
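Based on the dput output, the dataset contains entries such as
se32%3DD%3Dc31
so it would be better to have more than two columns to accommodate these cases: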
lst1 <- lapply(strsplit(as.character(dat1$Col), '&phint='),
           function(x) strsplit(x[-1], '%3D'))
lMax <- max(rapply(lst1, length))
res <- do.call(rbind.data.frame, lapply(lst1, function(x)
           do.call(rbind, lapply(x, `length<-`, lMax))))
head(res, 3)
#    V1    V2   V3
#1  eid   283 <NA>
#2 tcat 53159 <NA>
#3  bin  1.99 <NA>
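The padding step relies on `length<-`, the function form of `length(x) <- n`, which extends a vector with NA. A small sketch of just that step, using a hypothetical list `kv` in place of the real split results:

```r
# Hypothetical pieces after splitting on "%3D": two, three, and one element.
kv <- list(c("eid", "283"), c("se32", "D", "c31"), "pid")

# The longest piece determines the number of columns.
lMax <- max(lengths(kv))

# `length<-`(x, n) pads each shorter vector with NA up to length n.
padded <- lapply(kv, `length<-`, lMax)

# All vectors now have equal length, so rbind gives a rectangular matrix.
m <- do.call(rbind, padded)
m
```

Because every row ends up with the same length, `rbind` no longer recycles or warns on entries that contain more than one %3D.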
If we also need the site number (the digits after /site/) alongside the strsplit output:
lst1 <- lapply(strsplit(as.character(dat1$Col), '&phint='), function(x) {
           x1 <- as.numeric(sub("^.*/site/([0-9]+).*", "\\1", x[1]))
           x2 <- strsplit(x[-1], '%3D')
           c(x1, x2)})
lMax <- max(rapply(lst1, length))
res <- do.call(rbind, lapply(lst1, function(x)
           setNames(data.frame(x[1], do.call(rbind, lapply(x[-1], `length<-`,
              lMax))), paste0('V', seq_len(lMax + 1)))))
head(res, 3)
#  V1   V2    V3   V4
#1 50  eid   283 <NA>
#2 50 tcat 53159 <NA>
#3 50  bin  1.99 <NA>
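The site number comes from the `sub()` call, which replaces the whole string with the digits captured after `/site/`. A quick sketch of that step in isolation, on two leading pieces taken from the question's output (the names `x` and `site` are illustrative):

```r
# Leading pieces as they appear after the first-level split.
x <- c("/site/50?ret=html&limit=8", "/site/17001?ret=html&limit=8")

# The capture group ([0-9]+) holds the digits after "/site/";
# "\\1" replaces the entire match with that group.
site <- as.numeric(sub("^.*/site/([0-9]+).*", "\\1", x))
site
```

If a string does not match the pattern, `sub()` returns it unchanged, so `as.numeric()` would then give NA with a warning.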