Question

I am trying to extract the numbers from a sentence and then collect them into a numeric array. For example:

  string<-"  The Team:  $74,810 TOTAL RAISED SO FARJOIN THE TEAM Vik Muniz 
             Amount Raised: $70,560   71% Raised of $100,000 Goal CDI International,
             Inc.  Amount Raised: $2,070  Robert Goodwin Amount Raised: $1,500 
             30% Raised of $5,000 Goal Marcel Fukayama Amount Raised: 
             $210  Maitê Proença Amount Raised: $140  
             Thiago Nascimento Amount Raised: $120  
             Lydia Kroeger Amount Raised: $80  "

To proceed, I first removed the commas so that the numbers would be easy to extract:

    string.nocomma <- gsub(',', '', string)

Then I tried to collect the numbers into a numeric vector:

    fund.numbers <-unique(as.numeric(gsub("[^0-9]"," ",string.nocomma),""))

Here is the problem:

R raises a warning after the last command:

Warning message:
In unique(as.numeric(gsub("[^0-9]", " ", website.fund.nocomma),  :
NAs introduced by coercion

Even if I fix the warning above and end up with a vector of number strings, I am not sure how to convert that vector into a numeric array.

Can anyone help me? Thanks,

Answer 1

You can do it like this:

## Extract all numbers and commas
numbers <- unlist(regmatches(string, gregexpr("[0-9,]+", string)))
## Delete commas
numbers <- gsub(",", "", numbers)
## Delete empty strings (when only one comma has been extracted)
numbers <- numbers[numbers != ""]
numbers

# [1] "74810"  "70560"  "71"     "100000" "2070"   "1500"   "30"    
# [8] "5000"   "210"    "140"    "120"    "80"
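
Since the question asks for numeric values, the character vector above still needs one conversion step. The following is a minimal sketch rather than part of the original answer; `sample.text` is a hypothetical, shortened stand-in for the question's `string`, used only to keep the example self-contained.

```r
# Hypothetical shortened sample standing in for the question's string
sample.text <- "Raised: $70,560   71% Raised of $100,000 Goal CDI International, Inc."

# Extract runs of digits and commas, as in the answer above
numbers <- unlist(regmatches(sample.text, gregexpr("[0-9,]+", sample.text)))
# Remove the thousands separators
numbers <- gsub(",", "", numbers)
# Drop empty strings left behind by a lone comma (e.g. after "International")
numbers <- numbers[numbers != ""]
# Convert the character vector to a numeric vector
numbers <- as.numeric(numbers)
numbers
```

In R, a numeric vector is usually what is meant by a "numeric array", so no further conversion should be needed after `as.numeric()`.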

Answer 2

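After applying gsub(), you have a single string containing digits and spaces, so it cannot be converted to a number directly. What you need is a vector of numbers. I think the best way to get it is with gregexpr: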

## get a list of strings containing digits only
res = regmatches(string.nocomma, gregexpr("([0-9]+)", string.nocomma))
## convert it to numeric
res = as.numeric(unlist(res))
res

 [1]  74810  70560     71 100000   2070   1500     30   5000    210    140    120
[12]     80
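
To see why the original attempt produced `NA`, it may help to run both approaches on a small input. This is only an illustrative sketch; `sample.text`, `no.comma`, `one.string`, `bad`, and `good` are hypothetical names that do not appear in the question.

```r
# Hypothetical shortened sample standing in for the question's string
sample.text <- "Amount Raised: $2,070  30% Raised of $5,000 Goal"
no.comma <- gsub(",", "", sample.text)

# Replacing non-digits with spaces still leaves ONE string, so
# as.numeric() cannot parse it and returns NA (with a warning)
one.string <- gsub("[^0-9]", " ", no.comma)
bad <- suppressWarnings(as.numeric(one.string))

# Extracting each run of digits gives one element per number,
# and each element converts cleanly
good <- as.numeric(unlist(regmatches(no.comma, gregexpr("[0-9]+", no.comma))))
good
```

If duplicates should be dropped, as in the question's original attempt, the result can be wrapped in `unique()`.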

Extract numbers containing commas from a string and convert them to a numeric array
