Extract numbers/characters from a string in R and save them as variables

Time: 2013-06-26 08:49:32

Tags: r

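I'm new to R, and I'd like to know how to extract the distance and the activity type from a string like this: "Just completed a 0.56 mi walk with @RunKeeper". I want to store "0.56", "mi" and "walk" in three separate variables. How can I do that?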

Thanks! Jeroen.

I tried:

can.be <- function(object, class="numeric") 
  suppressWarnings(!is.na(as(object, class)))

str.vec <- c(text)

str.vec <- strsplit(str.vec, " ")

Error in strsplit(str.vec, " ") : non-character argument

pos <- sapply(str.vec, function(x) which(sapply(x, can.be)))
[[1]]
0.56 
   4 

[[2]]
named integer(0)

...
mapply(`[[`, str.vec, pos)
mapply(`[[`, str.vec, pos + 1)
mapply(`[[`, str.vec, pos + 2)

But now I get this error:

> mapply(`[[`, str.vec, pos)
Error in .Primitive("[[")(dots[[1L]][[2L]], dots[[2L]][[2L]]) : 
  attempt to select less than one element
> mapply(`[[`, str.vec, pos+1)
Error in pos + 1 : non-numeric argument to binary operator
> mapply(`[[`, str.vec, pos+2)
Error in pos + 2 : non-numeric argument to binary operator

Sample data (text):

Just completed a 0.56 mi walk with @RunKeeper. Check it out! http://t.co/lCyzzFeSwq #RunKeeper
Just completed a run in 0:00  with @RunKeeper. Check it out! http://t.co/dJB9DBwF4o #RunKeeper
Just completed a 1.83 km run with @RunKeeper. Check it out! http://t.co/f0S2aKnWXz #RunKeeper
Just completed a 6.03 km run - Gettin' it done! http://t.co/uQ7rBn6M #RunKeeper
Just completed a 1.81 mi walk with @RunKeeper. Check it out! http://t.co/R70fvkLDES #RunKeeper

2 answers:

Answer 0: (score: 2)

If the values are expected to appear in a specific order, then:

can.be <- function(object, class="numeric") 
  suppressWarnings(!is.na(as(object, class)))

str <- strsplit("Just completed a 0.56 mi walk with @RunKeeper", " ")[[1]]

pos <- which(sapply(str, can.be))

> str[pos]
[1] "0.56"
> str[pos+1]
[1] "mi"
> str[pos+2]
[1] "walk"

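This requires the sequence to always be the same. You could hard-code a list of measurement units (e.g. mi, km, etc.) and use it to identify the sequence (although you will most likely have the number followed by mi anyway). If the string contains no other numbers, this method should be quite robust.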
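The unit-list idea could be sketched roughly as follows. The `units` vector and the variable names are assumptions made for illustration, not part of the answer above; the sketch locates the first token that is a known unit and takes its neighbours.

```r
# Hypothetical list of known units; extend as needed (an assumption)
units <- c("mi", "km")

tokens <- strsplit("Just completed a 0.56 mi walk with @RunKeeper", " ")[[1]]

# Position of the first token that is a known unit (NA if none is found)
unit_pos <- which(tokens %in% units)[1]

distance <- as.numeric(tokens[unit_pos - 1])  # token just before the unit
unit     <- tokens[unit_pos]
activity <- tokens[unit_pos + 1]              # token just after the unit
```

If no unit is found, `unit_pos` is `NA` and all three variables come out as `NA` rather than raising an error.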

Edit:

For a vector:

str.vec <- c("Just completed a 0.56 mi walk with @RunKeeper", "Just completed a 13 mi cycling with @Michele")

str.vec <- strsplit(str.vec, " ")

pos <- sapply(str.vec, function(x) which(sapply(x, can.be)))

> mapply(`[[`, str.vec, pos)
[1] "0.56" "13"  
> mapply(`[[`, str.vec, pos+1)
[1] "mi" "mi"
> mapply(`[[`, str.vec, pos+2)
[1] "walk"    "cycling"
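The vectorised version assumes every string contains exactly one numeric token. In the question's sample data, the line with "0:00" has none, so `pos` holds an empty element there, which would explain the `mapply` errors. A minimal sketch of one way to guard against that, using a hypothetical helper `extract_fields` (not part of the answer) that returns `NA` when nothing numeric is found:

```r
can.be <- function(object, class = "numeric")
  suppressWarnings(!is.na(as(object, class)))

# Hypothetical helper: one string in, three named fields out (NA if no number)
extract_fields <- function(s) {
  tokens <- strsplit(s, " ")[[1]]
  pos <- which(sapply(tokens, can.be))[1]  # first numeric token, or NA
  if (is.na(pos)) {
    return(c(distance = NA_character_, unit = NA_character_,
             activity = NA_character_))
  }
  c(distance = tokens[pos], unit = tokens[pos + 1], activity = tokens[pos + 2])
}

str.vec <- c("Just completed a 0.56 mi walk with @RunKeeper. Check it out!",
             "Just completed a run in 0:00  with @RunKeeper. Check it out!")

# One row per input string, one column per field
res <- t(sapply(str.vec, extract_fields, USE.NAMES = FALSE))
```

Here `res` is a character matrix; the distance column can be converted with `as.numeric(res[, "distance"])` afterwards.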

Answer 1: (score: 0)

If the format of the string is always the same, you can use:

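# Character positions assume the text starts with "Just completed a 0.56 mi walk"
dist <- as.numeric(substr(text, 18, 21))  # "0.56"
unit <- substr(text, 23, 24)              # "mi"
way  <- substr(text, 26, 29)              # "walk"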

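But if the format varies, this will not work, for example if the length of the number changes (e.g. from 0.56 to 12.21). You have to make sure that does not happen!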
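If the number's length can vary, a regular expression avoids fixed character positions. This is a different technique from the `substr` approach above, sketched under the assumption that the text always contains "number, space, unit, space, word"; the pattern and the two-unit list are illustrative guesses, not something given in the question.

```r
text <- c("Just completed a 0.56 mi walk with @RunKeeper.",
          "Just completed a 12.21 km run with @RunKeeper.")

# Assumed pattern: a decimal number, a unit (mi or km), then one word
pattern <- "([0-9]+\\.?[0-9]*) (mi|km) ([[:alpha:]]+)"

# Each list element holds the full match followed by the three captured groups
m <- regmatches(text, regexec(pattern, text))

dist <- as.numeric(sapply(m, `[`, 2))
unit <- sapply(m, `[`, 3)
way  <- sapply(m, `[`, 4)
```

Strings that do not match the pattern yield `NA` in all three vectors.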