I have a data frame that looks like this:
DF$Lst
[1] "Some text > in > a string"
"Another > text in > another > set of string"
"This is only > One text"
"NA"
..... and so forth
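As you can see, each row contains a string whose parts are separated by '>'.
I want to create two new columns that contain only the first and the last substring, for example: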
Text Col1 Col2
Some text > in > a string Some text a string
Another > text in > another > set of string Another set of string
I am trying to use this function:
substrRight <- function(x, n){
substr(x, nchar(x)-n+1, nchar(x))
}
substrRight(x, 6)
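But I don't think this is the right approach, because the function above takes a fixed number of characters and does not help here. Is there a better way to solve this?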
Answer 0 (score: 2)
We can use extract from tidyr:
library(tidyverse)
DF %>%
extract(Text, into = c('Col1', 'Col2'), '^([^>]+) >.* > ([^>]+)$',
remove = FALSE)
# Text Col1 Col2
#1 Some text > in > a string Some text a string
#2 Another > text in > another > set of string Another set of string
Or, in base R, split on " > " and then take the first and last elements:
DF[c('Col1', 'Col2')] <- t(sapply(strsplit(DF$Text, " > "),
function(x) c(x[1], x[length(x)])))
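The split-and-pick idea above can be sketched as a self-contained example. This is only an illustration under assumptions: the vector `txt` and the result name `res` are hypothetical, and the split uses a bare ">" with `trimws()` rather than the exact " > " separator, so that uneven spacing around the delimiter is tolerated.

```r
# Hypothetical sample data mirroring the strings in the question
txt <- c("Some text > in > a string",
         "Another > text in > another > set of string",
         "This is only > One text",
         NA)

# Split on the literal ">" and trim surrounding whitespace from each piece
parts <- lapply(strsplit(txt, ">", fixed = TRUE), trimws)

# vapply guarantees exactly two character values per row;
# a missing input yields a single NA piece, so both columns become NA
first_last <- t(vapply(parts, function(p) c(p[1], p[length(p)]), character(2)))

res <- data.frame(Text = txt,
                  Col1 = first_last[, 1],
                  Col2 = first_last[, 2],
                  stringsAsFactors = FALSE)
print(res)
```

Using `vapply` instead of `sapply` is a design choice here: it fails loudly if any row does not produce exactly two values.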
In the updated dataset 'DF3', the NAs are strings. We can convert them to real NAs:
is.na(DF3$Text) <- DF3$Text == "NA"
DF3[c('Col1', 'Col2')] <- t(sapply(strsplit(DF3$Text, " > "),
function(x) c(x[1], x[length(x)])))
DF3
# Text Col1 Col2
#1 Some text > in > a string Some text a string
#2 Another > text in > another > set of string Another set of string
#3 This > is one This is one
#4 <NA> <NA> <NA>
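To make the conversion step concrete, here is a minimal sketch of what the `is.na<-` replacement form does to a character vector. The vector `x` and the names `before` and `after` are hypothetical and used only for illustration.

```r
# Hypothetical vector: the second element is the literal string "NA"
x <- c("Some text > in > a string", "NA")

# Before conversion the string "NA" is not treated as missing
before <- is.na(x)

# Mark every element equal to the string "NA" as a real missing value
is.na(x) <- x == "NA"

after <- is.na(x)
print(before)
print(after)
```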
Or with a pattern similar to @Onyambu's:
DF3 %>%
extract(Text, into = c("Col1", "Col2"),
"^([^>]*)>(?:.*>)?([^>]*)$", remove = FALSE)
# Text Col1 Col2
#1 Some text > in > a string Some text a string
#2 Another > text in > another > set of string Another set of string
#3 This > is one This is one
#4 <NA> <NA> <NA>
Data:
DF <- structure(list(Text = c("Some text > in > a string",
"Another > text in > another > set of string"
)), .Names = "Text", row.names = c(NA, -2L), class = "data.frame")
DF3 <- structure(list(Text = c("Some text > in > a string",
"Another > text in > another > set of string", "This > is one", "NA")),
.Names = "Text", row.names = c(NA, -4L), class = "data.frame")
Answer 1 (score: 2)
Base R version:
text <- DF$Lst  # assumed to be given
read.table(text=sub(">.*>",">",text),sep=">")
V1 V2
1 Some text a string
2 Another set of string
cbind(text,read.table(text=sub(">.*>",">",text),sep=">"))
text V1 V2
1 Some text > in > a string Some text a string
2 Another > text in > another > set of string Another set of string
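Another base R approach (note that the greedy first group captures everything up to the second-to-last '>', as row 2 shows):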
data.frame(do.call(rbind,regmatches(text,regexec("(.*)>.*>(.*)",text))))
X1 X2 X3
1 Some text > in > a string Some text a string
2 Another > text in > another > set of string Another > text in set of string
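The row 2 result above comes from the first `(.*)` being greedy. The following sketch compares that pattern with a variant whose first group is restricted to `[^>]*`. The variable names are hypothetical, and the second pattern is an illustrative alternative rather than part of the original answer; like the original, it still needs at least two '>' characters to match.

```r
txt <- "Another > text in > another > set of string"

# Greedy first group: swallows everything up to the second-to-last ">"
greedy <- regmatches(txt, regexec("(.*)>.*>(.*)", txt))[[1]]

# First group cannot cross a ">", so it stops at the first delimiter
bounded <- regmatches(txt, regexec("^([^>]*)>.*>(.*)$", txt))[[1]]

# Element 1 is the full match; elements 2 and 3 are the captured groups
greedy_groups  <- trimws(greedy[2:3])
bounded_groups <- trimws(bounded[2:3])
print(greedy_groups)
print(bounded_groups)
```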
read.table(text=sub("(^.*?)>(?:.*>)*(.*$)","\\1>\\2",text),sep=">",fill = T,na.strings = "")
V1 V2
1 Some text a string
2 Another set of string
3 This is only One text
4 NA <NA>
Or you can do this:
read.table(text=sub("(^[^>]*).*?([^>]*$)","\\1>\\2",text),sep=">",fill = T,na.strings = "")
V1 V2
1 Some text a string
2 Another set of string
3 This is only One text
4 <NA> NA
Using separate from tidyr:
separate(data.frame(text),text,c("col1","col2"),"((?:>.*)>|>)",fill="right" )
col1 col2
1 Some text a string
2 Another set of string
3 This is only One text
4 NA <NA>