The dataset used in this question is `Wage` from the ISLR package.
library(ISLR)
head(Wage)
year age maritl race education region jobclass health
1 2006 18 1. Never Married 1. White 1. < HS Grad 2. Middle Atlantic 1. Industrial 1. <=Good
2 2004 24 1. Never Married 1. White 4. College Grad 2. Middle Atlantic 2. Information 2. >=Very Good
3 2003 45 2. Married 1. White 3. Some College 2. Middle Atlantic 1. Industrial 1. <=Good
health_ins logwage wage
1 2. No 4.318063 75.04315
2 2. No 4.255273 70.47602
3 1. Yes 4.875061 130.98218
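Columns 3 through 9 contain an unwanted leading prefix in every value, such as "1. " or "2. ".
How can I remove these unwanted characters and digits from all of the columns mentioned?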
Answer 0 (score: 1)
Mutate columns 3 to 9 to remove the leading "[1-9]. " prefix from every value:
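library(dplyr)
temp <- Wage
# Strip the leading "<digits>. " prefix from columns 3 to 9 (maritl through health_ins).
# The pattern is anchored with ^ and the dot is escaped so only a literal "." matches.
# Note: sub() returns character vectors, so the factor columns become character.
ans <- temp %>%
  mutate(across(3:9, ~ sub("^\\d+\\.\\s*", "", .x)))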
Output
head(ans)
year age maritl race education region jobclass health
1 2006 18 Never Married White < HS Grad Middle Atlantic Industrial <=Good
2 2004 24 Never Married White College Grad Middle Atlantic Information >=Very Good
3 2003 45 Married White Some College Middle Atlantic Industrial <=Good
4 2003 43 Married Asian College Grad Middle Atlantic Information >=Very Good
5 2005 50 Divorced White HS Grad Middle Atlantic Information <=Good
6 2008 54 Married White College Grad Middle Atlantic Information >=Very Good
health_ins logwage wage
1 No 4.318063 75.04315
2 No 4.255273 70.47602
3 Yes 4.875061 130.98218
4 Yes 5.041393 154.68529
5 Yes 4.318063 75.04315
6 Yes 4.845098 127.11574
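
If the columns should remain factors (as they are in `Wage`), one possible alternative is to rewrite the factor levels instead of the values. The following is only a sketch in base R. To keep it runnable without ISLR, it uses a small hand-built data frame, `toy`, which is an assumption for illustration and copies a few values from the first three rows shown above. With the real data you would presumably start from `Wage` and use `cols <- 3:9`.

```r
# Hypothetical stand-in for a few Wage columns (values copied from the rows shown above).
toy <- data.frame(
  year      = c(2006, 2004, 2003),
  maritl    = factor(c("1. Never Married", "1. Never Married", "2. Married")),
  education = factor(c("1. < HS Grad", "4. College Grad", "3. Some College")),
  health    = factor(c("1. <=Good", "2. >=Very Good", "1. <=Good"))
)

# Columns to clean; with the real Wage data this would be 3:9.
cols <- c("maritl", "education", "health")

cleaned <- toy
cleaned[cols] <- lapply(cleaned[cols], function(x) {
  # Rename the levels only; the column stays a factor.
  levels(x) <- sub("^\\d+\\.\\s*", "", levels(x))
  x
})

print(cleaned)
```

Because only the levels are renamed, the level order is kept and any code that relies on the columns being factors should continue to work.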