> dc1
V1 V2
1 20140211-0100 |Box
2 20140211-1782 |Office|Ball
3 20140211-1783 |Office
4 20140211-1784 |Office
5 20140221-0756 |Box
6 20140203-0418 |Box
> strsplit(as.character(dc1[,2]),"^\\|")
[[1]]
[1] "" "Box"
[[2]]
[1] "" "Office" "Ball"
[[3]]
[1] "" "Office"
[[4]]
[1] "" "Office"
[[5]]
[1] "" "Box"
[[6]]
[1] "" "Box"
How can I remove the empty strings ("") from the strsplit result? The result should look like this:
[[1]]
[1] "Box"
[[2]]
[1] "Office" "Ball"
Answer 0 (score: 8)
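You can use lapply to go over the list and drop the empty strings. I changed the strsplit call (the pattern is "\\|" rather than "^\\|") to match your expected output.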
dc1 <- read.table(text = 'V1 V2
1 20140211-0100 |Box
2 20140211-1782 |Office|Ball
3 20140211-1783 |Office
4 20140211-1784 |Office
5 20140221-0756 |Box
6 20140203-0418 |Box', header = TRUE)
out <- strsplit(as.character(dc1[,2]),"\\|")
> lapply(out, function(x){x[!x ==""]})
[[1]]
[1] "Box"
[[2]]
[1] "Office" "Ball"
[[3]]
[1] "Office"
[[4]]
[1] "Office"
[[5]]
[1] "Box"
[[6]]
[1] "Box"
Answer 1 (score: 3)
I don't have a general solution, but for your example you can try:
strsplit(sub("^\\|", "", as.character(dc1[,2])),"\\|")
Before splitting, this removes the leading | (that is what the regular expression "^\\|" matches), which is what causes the "".
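To make the limits of this approach concrete, here is a minimal sketch. The vector `x` is made-up sample data rather than the original `dc1`, and its last element deliberately contains a doubled separator, which stripping the leading | does not handle.

```r
# Hypothetical sample values; the last one has an empty field in the middle.
x <- c("|Box", "|Office|Ball", "|Office||Ball")

# Splitting directly leaves a leading "" because each string starts with "|".
direct <- strsplit(x, "\\|")
direct[[1]]    # "" "Box"

# Removing the leading "|" first avoids that leading "".
res <- strsplit(sub("^\\|", "", x), "\\|")
res[[2]]       # "Office" "Ball"

# An empty field in the middle still produces "", so a filter is needed there.
res[[3]]       # "Office" "" "Ball"
cleaned <- lapply(res, function(v) v[nzchar(v)])
cleaned[[3]]   # "Office" "Ball"
```

If empty fields can appear anywhere in the string, filtering after the split is probably the safer choice.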
Answer 2 (score: 3)
You can use:
library(stringr)
str_extract_all(dc1[,2], "[[:alpha:]]+")
[[1]]
[1] "Box"
[[2]]
[1] "Office" "Ball"
[[3]]
[1] "Office"
[[4]]
[1] "Office"
[[5]]
[1] "Box"
[[6]]
[1] "Box"
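If you would rather not depend on stringr, the same extraction idea can be sketched in base R with gregexpr() and regmatches(). The sample vector `x` is an assumption for illustration. Note that "[[:alpha:]]+" also breaks fields at digits and spaces, so the second pattern, "[^|]+", may be closer to what you want when the fields are not purely alphabetic.

```r
# Hypothetical sample values; the second one has a field with a space and a digit.
x <- c("|Box", "|Office 2|Ball")

# Letters only: "Office 2" is cut down to "Office".
alpha_only <- regmatches(x, gregexpr("[[:alpha:]]+", x))
alpha_only[[2]]   # "Office" "Ball"

# Everything between separators: fields are kept whole.
fields <- regmatches(x, gregexpr("[^|]+", x))
fields[[2]]       # "Office 2" "Ball"
```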
Answer 3 (score: 2)
In this case, you can simply pass "[" to sapply to remove the first element of each vector:
> sapply(strsplit(as.character(dc1[,2]), "\\|"), "[", -1)
# [[1]]
# [1] "Box"
# [[2]]
# [1] "Office" "Ball"
# [[3]]
# [1] "Office"
# [[4]]
# [1] "Office"
# [[5]]
# [1] "Box"
# [[6]]
# [1] "Box"
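One caveat worth knowing: sapply only returns a list here because the pieces have different lengths. A small sketch with made-up input (`same_len` is hypothetical) shows that it simplifies to a plain vector when every row has the same number of fields, whereas lapply always returns a list. Also note that "[" with -1 drops the first element whether or not it is empty, so it assumes every string starts with the separator.

```r
# Hypothetical input where every string has exactly one field.
same_len <- c("|Box", "|Office")

# sapply simplifies equal-length results into a character vector.
simplified <- sapply(strsplit(same_len, "\\|"), "[", -1)
simplified        # "Box" "Office"

# lapply keeps the list structure regardless of the lengths.
kept <- lapply(strsplit(same_len, "\\|"), "[", -1)
is.list(kept)     # TRUE
```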
Answer 4 (score: 2)
Another approach is to apply nzchar() after unlisting the result of strsplit():
out <- unlist(strsplit(as.character(dc1[,2]),"\\|"))
out[nzchar(x=out)] # removes the extraneous "" marks
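Keep in mind that unlist() flattens everything into a single vector, so the grouping by row is lost. If the per-row structure shown in the question matters, nzchar() can be applied inside lapply instead. A minimal sketch on made-up data (`x` is hypothetical):

```r
# Hypothetical sample values.
x <- c("|Box", "|Office|Ball")

# Flattened: one character vector, row boundaries are gone.
flat <- unlist(strsplit(x, "\\|"))
flat <- flat[nzchar(flat)]
flat              # "Box" "Office" "Ball"

# Per row: a list with one character vector per input string.
per_row <- lapply(strsplit(x, "\\|"), function(v) v[nzchar(v)])
length(per_row)   # 2
per_row[[2]]      # "Office" "Ball"
```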
Answer 5 (score: 0)
library("stringr")
lapply(str_split(dc1$V2, "\\|"), function(x) x[-1])
[[1]]
[1] "Box"
[[2]]
[1] "Office" "Ball"
[[3]]
[1] "Office"
[[4]]
[1] "Office"
[[5]]
[1] "Box"
[[6]]
[1] "Box"