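I have a large dataset in which I want to identify and remove letters and symbols so that only the numeric values remain. For example, I want -£1125.91m
to become -1125.91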
dataset
Event var1 var2
<fct> <chr> <chr>
1 Labour Costs YoY 13.34m 0.026
2 Unemployment Change (000's) $16.91b -0.449
3 Unemployment Rate -£1125.91m 0.89k
4 Jobseekers Net Change ¥1012.74b 9.56m
So far I only know how to remove a single character from a column, like this:
dataset$var1 <- gsub("k", "", dataset$var1)
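Doing this by hand for each character would be a lot of work because the dataset is very large. Is there a way to identify and remove all letters at once, including the currency symbols and the m and b suffixes?
Reproducible dataset: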
dataset <- structure(list(Event = structure(2:5, .Label = c("Event", "Labour Costs YoY",
"Unemployment Change (000's)", "Unemployment Rate", "Jobseekers Net Change"),
.Names = c("", "", "", ""), class = "factor"), var1 = c("13.34m", "$16.91b", "-£1125.91m", "¥1012.74b"), var2 = c(0.026, -0.449, "0.89k", "9.56m")), row.names = c(NA,
-4L), class = c("tbl_df", "tbl", "data.frame"))
Thanks in advance!
Answer 0: (score: 1)
To remove everything except hyphens, digits, and dots, you can use
dataset$var1 <- gsub("[^-0-9.]", "", dataset$var1)
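The [^-0-9.] pattern is a negated character class: it matches any character other than those listed in the class, so gsub replaces every such character with an empty string.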
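As a minimal sketch of how the negated class behaves, the snippet below runs the pattern on a plain character vector and then converts the result to numbers. The vector `x` and the names `cleaned` and `values` are illustrative assumptions, not part of the original post.

```r
# Hypothetical sample values mirroring the question's var1 column
x <- c("13.34m", "$16.91b", "-£1125.91m", "¥1012.74b")

# Inside [...], a leading ^ negates the class; the hyphen placed right after it
# is treated as a literal character rather than as a range operator
cleaned <- gsub("[^-0-9.]", "", x)

# gsub() returns character strings, so convert explicitly to get numbers
values <- as.numeric(cleaned)
print(values)
```

Note that gsub on its own leaves the column as character, so the as.numeric step is needed before doing any arithmetic on it.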
dataset <- structure(list(Event = structure(2:5, .Label = c("Event", "Labour Costs YoY",
"Unemployment Change (000's)", "Unemployment Rate", "Jobseekers Net Change"),
.Names = c("", "", "", ""), class = "factor"), var1 = c("13.34m", "$16.91b", "-£1125.91m", "¥1012.74b"), var2 = c(0.026, -0.449, "0.89k", "9.56m")), row.names = c(NA,
-4L), class = c("tbl_df", "tbl", "data.frame"))
gsub("[^-0-9.]", "", dataset$var1)
## => [1] "13.34" "16.91" "-1125.91" "1012.74"
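Since the question asks about cleaning a large dataset in one go, here is a hedged sketch that applies the same pattern to several columns with base R's lapply and converts them to numeric. It rebuilds the data as a plain data.frame for brevity, and the `cols` vector is an assumption about which columns need cleaning.

```r
# Plain data frame holding the question's values (tibble classes dropped for brevity)
dataset <- data.frame(
  Event = c("Labour Costs YoY", "Unemployment Change (000's)",
            "Unemployment Rate", "Jobseekers Net Change"),
  var1 = c("13.34m", "$16.91b", "-£1125.91m", "¥1012.74b"),
  var2 = c("0.026", "-0.449", "0.89k", "9.56m"),
  stringsAsFactors = FALSE
)

# Assumption: these are the columns that should end up numeric
cols <- c("var1", "var2")

# Strip everything except hyphens, digits, and dots, then convert to numeric
dataset[cols] <- lapply(dataset[cols], function(col) {
  as.numeric(gsub("[^-0-9.]", "", col))
})

str(dataset)
```

Be aware that this discards the k, m, and b suffixes without rescaling, so 0.89k becomes 0.89 and 9.56m becomes 9.56. If the suffixes encode magnitudes that matter, they would need to be handled separately before stripping.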