Extract numbers containing commas from a string and convert them to a numeric array

Time: 2013-10-17 18:57:05

Tags: regex arrays r

I am trying to extract the numbers from a sentence and then collect them into a numeric array. For example,

  string<-"  The Team:  $74,810 TOTAL RAISED SO FARJOIN THE TEAM Vik Muniz 
             Amount Raised: $70,560   71% Raised of $100,000 Goal CDI International,
             Inc.  Amount Raised: $2,070  Robert Goodwin Amount Raised: $1,500 
             30% Raised of $5,000 Goal Marcel Fukayama Amount Raised: 
             $210  Maitê Proença Amount Raised: $140  
             Thiago Nascimento Amount Raised: $120  
             Lydia Kroeger Amount Raised: $80  "          

To proceed, I first removed the commas so that the numbers would be easy to extract:

    string.nocomma <- gsub(',', '', string)

Then I tried to collect the numbers into a numeric vector:

    fund.numbers <-unique(as.numeric(gsub("[^0-9]"," ",string.nocomma),""))       

Here are the problems:

  1. R throws an error after the last command. The error is as follows:

    Warning message:
    In unique(as.numeric(gsub("[^0-9]", " ", website.fund.nocomma),  :
    NAs introduced by coercion
    
  2. Even if I fix the error above and get a numeric vector, I am not sure how to convert that vector into a numeric array.

Can anyone help me? Thanks,

2 Answers:

Answer 0: (score: 2)

You can do it like this:

    ## Extract all numbers and commas
    numbers <- unlist(regmatches(string, gregexpr("[0-9,]+", string)))
    ## Delete commas
    numbers <- gsub(",", "", numbers)
    ## Delete empty strings (left over when a match was only a comma)
    numbers <- numbers[numbers != ""]
    numbers

    # [1] "74810"  "70560"  "71"     "100000" "2070"   "1500"   "30"    
    # [8] "5000"   "210"    "140"    "120"    "80"
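
The result above is still a character vector, while the question asks for numbers. A minimal sketch of the remaining step follows, assuming a shortened sample of the question's string (`sample_string` and `fund_numbers` are names chosen here for illustration, not from the original post):

```r
# Shortened sample of the string from the question (an assumption, for brevity)
sample_string <- "The Team: $74,810 TOTAL RAISED Amount Raised: $70,560 71% Raised of $100,000 Goal CDI International, Inc. Amount Raised: $2,070"

# Same steps as in the answer: match runs of digits and commas,
# strip the commas, then drop matches that were only a comma
numbers <- unlist(regmatches(sample_string, gregexpr("[0-9,]+", sample_string)))
numbers <- gsub(",", "", numbers)
numbers <- numbers[numbers != ""]

# Convert the character vector to a numeric vector
fund_numbers <- as.numeric(numbers)

# unique() drops repeated amounts, as the question's own attempt intended
fund_numbers <- unique(fund_numbers)

# A numeric vector is usually all that is needed; as.array() gives it
# an explicit dim attribute if a 1-d array is really required
fund_array <- as.array(fund_numbers)
```

In R a numeric vector already behaves like a one-dimensional numeric array for most purposes, so the `as.array()` call is likely optional.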

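Answer 1: (score: 1)

After applying gsub(), you have a single string containing digits and spaces, so it cannot be converted to a number directly. What you need is a vector of numbers. I think it is better to use gregexpr to get it: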

    ## get a list of the strings that contain only digits
    res = regmatches(string.nocomma, gregexpr("([0-9]+)", string.nocomma))
    ## convert it to numeric
    res = as.numeric(unlist(res))
    res

    #  [1]  74810  70560     71 100000   2070   1500     30   5000    210    140    120
    # [12]     80
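
To make the cause of the warning concrete, here is a small sketch on a hypothetical short string (`s` and the other variable names are illustrative, not from the question). It shows that replacing non-digits with spaces leaves one string, which `as.numeric()` turns into `NA`, and that splitting the string first avoids the problem:

```r
# Hypothetical short input, already free of thousands separators
s <- "Raised: $2070  Goal: $5000"

# Replacing every non-digit with a space still leaves ONE string
spaced <- gsub("[^0-9]", " ", s)

# as.numeric() cannot parse a single string that holds several numbers,
# so it returns NA (with the "NAs introduced by coercion" warning,
# suppressed here so the sketch runs quietly)
bad <- suppressWarnings(as.numeric(spaced))

# Splitting on runs of spaces gives one piece per number;
# the leading empty piece is dropped before converting
pieces <- strsplit(spaced, " +")[[1]]
pieces <- pieces[pieces != ""]
good <- as.numeric(pieces)
```

The `regmatches()`/`gregexpr()` approach in the answer reaches the same result more directly, since it extracts each run of digits as its own element.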