I am trying to extract the numbers from a sentence and then collect them into a numeric array. For example,
string<-" The Team: $74,810 TOTAL RAISED SO FARJOIN THE TEAM Vik Muniz
Amount Raised: $70,560 71% Raised of $100,000 Goal CDI International,
Inc. Amount Raised: $2,070 Robert Goodwin Amount Raised: $1,500
30% Raised of $5,000 Goal Marcel Fukayama Amount Raised:
$210 Maitê Proença Amount Raised: $140
Thiago Nascimento Amount Raised: $120
Lydia Kroeger Amount Raised: $80 "
To proceed, I first removed the commas so that the numbers would be easier to extract:
string.nocomma <- gsub(',', '', string)
Then I tried to collect the numbers into a numeric vector:
fund.numbers <-unique(as.numeric(gsub("[^0-9]"," ",string.nocomma),""))
Here is the problem:
R throws the following warning after the last command:
Warning message:
In unique(as.numeric(gsub("[^0-9]", " ", website.fund.nocomma), :
NAs introduced by coercion
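Even if I fix the problem above and end up with a vector of numbers, I am not sure how to convert that vector into a numeric array.
Can anyone help me? Thanks.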
Answer 0 (score: 2)
You can do it like this:
## Extract all numbers and commas
numbers <- unlist(regmatches(string, gregexpr("[0-9,]+", string)))
## Delete commas
numbers <- gsub(",", "", numbers)
## Delete empty strings (when only one comma has been extracted)
numbers <- numbers[numbers != ""]
numbers
# [1] "74810" "70560" "71" "100000" "2070" "1500" "30"
# [8] "5000" "210" "140" "120" "80"
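The result above is still a character vector, while the question asks for numeric values. A minimal sketch of the remaining step, assuming the goal is the unique numbers in order of first appearance; the shortened sample string and the name `fund.numbers` are used here only for illustration:

```r
# Shortened sample of the question's string (an assumption, for brevity)
string <- "Amount Raised: $70,560 71% Raised of $100,000 Goal CDI International, Inc. Amount Raised: $2,070"

# Extract runs of digits and commas, then strip the commas
numbers <- unlist(regmatches(string, gregexpr("[0-9,]+", string)))
numbers <- gsub(",", "", numbers)

# Drop empty strings left behind by stand-alone commas
numbers <- numbers[numbers != ""]

# Convert the character vector to numeric and remove duplicates
fund.numbers <- unique(as.numeric(numbers))
fund.numbers
```

In R a numeric vector already behaves as a one-dimensional numeric array, so no further conversion should be needed after `as.numeric()`.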
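Answer 1 (score: 1)
After applying gsub(), you have a single string made up of digits and spaces, so it cannot be converted to a number directly. What you need is a vector with one element per number. I think the best way to get it is to use gregexpr: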
## get a list holding the digit-only substrings
> res = regmatches(string.nocomma, gregexpr("([0-9]+)", string.nocomma))
## convert it to a numeric vector
> res = as.numeric(unlist(res))
## print the result
> res
[1] 74810 70560 71 100000 2070 1500 30 5000 210 140 120
[12] 80
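To illustrate why the original attempt produced the warning, here is a small sketch under the assumption of a shortened, comma-free sample string. It also shows a possible alternative that keeps the question's gsub() idea and splits on whitespace before converting; the names `spaced`, `bad`, `tokens`, and `good` are hypothetical.

```r
# Shortened, comma-free sample (an assumption, for brevity)
string.nocomma <- "Amount Raised: $70560 71% Raised of $100000 Goal"

# Replacing every non-digit with a space still leaves ONE long string
spaced <- gsub("[^0-9]", " ", string.nocomma)

# as.numeric() cannot parse several space-separated numbers at once,
# so it returns NA (the warning is suppressed here to keep output clean)
bad <- suppressWarnings(as.numeric(spaced))

# Splitting on runs of spaces first gives one element per number
tokens <- strsplit(trimws(spaced), " +")[[1]]
good <- as.numeric(tokens)
good
```

Both routes should give the same numbers; the regmatches() version simply avoids the intermediate string of spaces.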