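I'm new to R and would like to know how to extract the distance from a string like this one: "Just completed a 0.56 mi walk with @RunKeeper". I want to store "0.56", "mi" and "walk" in three separate variables. How can I do that?
Thanks! Jeroen
I tried: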
can.be <- function(object, class="numeric")
suppressWarnings(!is.na(as(object, class)))
str.vec <- c(text)
str.vec <- strsplit(str.vec, " ")
Error in strsplit(str.vec, " ") : non-character argument
pos <- sapply(str.vec, function(x) which(sapply(x, can.be)))
[[1]]
0.56 
   4 

[[2]]
named integer(0)
...
mapply(`[[`, str.vec, pos)
mapply(`[[`, str.vec, pos + 1)
mapply(`[[`, str.vec, pos + 2)
But now I get these errors:
> mapply(`[[`, str.vec, pos)
Error in .Primitive("[[")(dots[[1L]][[2L]], dots[[2L]][[2L]]) :
attempt to select less than one element
> mapply(`[[`, str.vec, pos+1)
Error in pos + 1 : non-numeric argument to binary operator
> mapply(`[[`, str.vec, pos+2)
Error in pos + 2 : non-numeric argument to binary operator
Sample data (text):
Just completed a 0.56 mi walk with @RunKeeper. Check it out! http://t.co/lCyzzFeSwq #RunKeeper
Just completed a run in 0:00 with @RunKeeper. Check it out! http://t.co/dJB9DBwF4o #RunKeeper
Just completed a 1.83 km run with @RunKeeper. Check it out! http://t.co/f0S2aKnWXz #RunKeeper
Just completed a 6.03 km run - Gettin' it done! http://t.co/uQ7rBn6M #RunKeeper
Just completed a 1.81 mi walk with @RunKeeper. Check it out! http://t.co/R70fvkLDES #RunKeeper
Answer 0 (score: 2)
If the pieces are expected to appear in a fixed order, then:
can.be <- function(object, class="numeric")
suppressWarnings(!is.na(as(object, class)))
str <- strsplit("Just completed a 0.56 mi walk with @RunKeeper", " ")[[1]]
pos <- which(sapply(str, can.be))
> str[pos]
[1] "0.56"
> str[pos+1]
[1] "mi"
> str[pos+2]
[1] "walk"
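This requires the sequence to always be the same. You could also hard-code a set of measurement units (e.g. mi, km, etc.) and use them to recognise the sequence, although you are most likely to have a number followed by mi. As long as there are no other numbers in the string, this approach should be quite robust.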
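The hard-coded unit idea could be sketched roughly as follows. This is only an illustration under assumptions: `extract_by_unit` is a hypothetical helper name, and the unit list (`mi`, `km`) is a guess based on the sample tweets. It looks for a unit token whose left neighbour parses as a number, so other stray numbers in the string are ignored.

```r
# Hypothetical helper (not from the original answer): locate a known unit
# token that directly follows a numeric token, then read its neighbours.
extract_by_unit <- function(sentence, units = c("mi", "km")) {
  tokens <- strsplit(sentence, " ", fixed = TRUE)[[1]]
  # TRUE for tokens that convert cleanly to a number
  is_num <- suppressWarnings(!is.na(as.numeric(tokens)))
  # candidate unit positions that have a left neighbour
  idx <- which(tokens %in% units)
  idx <- idx[idx > 1]
  # keep only units preceded by a numeric token
  idx <- idx[is_num[idx - 1]]
  if (length(idx) == 0) return(NULL)  # no "<number> <unit>" pair found
  i <- idx[1]
  list(distance = as.numeric(tokens[i - 1]),
       unit     = tokens[i],
       activity = tokens[i + 1])
}

res <- extract_by_unit("Just completed a 0.56 mi walk with @RunKeeper")
```

For a string such as "Just completed a run in 0:00 with @RunKeeper." the helper returns `NULL`, because no unit token follows a number.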
Edit:
For a vector:
str.vec <- c("Just completed a 0.56 mi walk with @RunKeeper", "Just completed a 13 mi cycling with @Michele")
str.vec <- strsplit(str.vec, " ")
pos <- sapply(str.vec, function(x) which(sapply(x, can.be)))
> mapply(`[[`, str.vec, pos)
[1] "0.56" "13"
> mapply(`[[`, str.vec, pos+1)
[1] "mi" "mi"
> mapply(`[[`, str.vec, pos+2)
[1] "walk" "cycling"
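Note that the vector version assumes every string contains exactly one numeric token. With the sample data that is not the case ("run in 0:00" has none), so `sapply` returns a list and `pos + 1` fails, which matches the errors in the question. A minimal sketch of one way around that, assuming you want `NA` for strings without a distance; `is_number` and `pick` are hypothetical helper names, and `is_number` is a plain `as.numeric` stand-in for `can.be`:

```r
# Stand-in for can.be(): TRUE where a token converts cleanly to a number
is_number <- function(x) suppressWarnings(!is.na(as.numeric(x)))

tweets <- c(
  "Just completed a 0.56 mi walk with @RunKeeper. Check it out!",
  "Just completed a run in 0:00 with @RunKeeper. Check it out!",
  "Just completed a 1.83 km run with @RunKeeper. Check it out!"
)

tokens <- strsplit(tweets, " ", fixed = TRUE)

# Position of the first numeric token per string, NA when there is none
pos <- vapply(tokens, function(x) {
  hits <- which(is_number(x))
  if (length(hits) == 0) NA_integer_ else hits[[1]]
}, integer(1))

# Token at pos + offset; yields NA where pos is NA or out of range
pick <- function(offset)
  mapply(function(x, p) x[p + offset], tokens, pos, USE.NAMES = FALSE)

result <- data.frame(
  distance = as.numeric(pick(0)),
  unit     = pick(1),
  activity = pick(2),
  stringsAsFactors = FALSE
)
```

Because `vapply` always returns an integer vector here, `pos + offset` stays numeric even when some strings have no match.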
Answer 1 (score: 0)
If the format of the string is always exactly the same, you can use:
dist <- as.numeric(substr(text, 18, 21))
unit <- substr(text, 23, 24)
way <- substr(text, 26, 29)
Otherwise it will not work, for example if the length of the number changes (e.g. from 0.56 to 12.21). You have to make sure that never happens!
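If the width of the number can vary, a regular expression is one alternative to fixed character positions. The following is a sketch under assumptions, not part of the answer above: the pattern assumes the unit is either `mi` or `km` and that the activity is a single alphabetic word right after it.

```r
text <- c(
  "Just completed a 0.56 mi walk with @RunKeeper",
  "Just completed a 12.21 km run with @RunKeeper",
  "Just completed a run in 0:00 with @RunKeeper"
)

# <number> <unit> <activity>; the unit list (mi|km) is an assumption
pattern <- "([0-9]+\\.?[0-9]*) (mi|km) ([[:alpha:]]+)"
m <- regmatches(text, regexec(pattern, text))

# Each match holds the full match plus 3 capture groups; no match gives NAs
parts <- t(vapply(m, function(x) {
  if (length(x) == 4) x[2:4] else rep(NA_character_, 3)
}, character(3)))

dist <- as.numeric(parts[, 1])
unit <- parts[, 2]
way  <- parts[, 3]
```

Here the third string has no distance, so all three variables are `NA` for it, while "12.21" is picked up regardless of its length.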