How to remove the leading number or text prefix from all columns

Time: 2017-08-27 10:10:16

Tags: r tidyr data-cleaning

The dataset used in this question is `Wage` from the ISLR package.

    library(ISLR)

    head(Wage)

   year age           maritl     race       education             region       jobclass         health
1 2006  18 1. Never Married 1. White    1. < HS Grad 2. Middle Atlantic  1. Industrial      1. <=Good
2 2004  24 1. Never Married 1. White 4. College Grad 2. Middle Atlantic 2. Information 2. >=Very Good
3 2003  45       2. Married 1. White 3. Some College 2. Middle Atlantic  1. Industrial      1. <=Good
  health_ins  logwage      wage
1      2. No 4.318063  75.04315
2      2. No 4.255273  70.47602
3     1. Yes 4.875061 130.98218

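Columns 3 through 9 contain an unwanted prefix at the start of each value, such as `1. ` or `2. `.

How can I remove these unwanted characters and digits from all of the columns mentioned?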

1 Answer:

Answer 0 (score: 1)

Mutate columns 3 to 9 to strip the leading "[1-9]. " prefix:

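    library(ISLR)   # provides the Wage data set
    library(dplyr)
    temp <- Wage
    # "^" anchors the match at the start and "\\." is a literal dot;
    # sub() returns character vectors, so the factor columns become character
    ans <- temp %>% 
             mutate_at(3:9, ~ sub("^\\d+\\. ", "", .))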
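
For readers without dplyr, the same idea can be sketched in base R. The snippet below is only an illustration: `toy` is a hypothetical two-row stand-in for `Wage`, and `cols` is an assumed name for the column positions to clean. The pattern `^\\d+\\. ` matches one or more digits, a literal dot and a space, and only at the start of the string.

```r
# Hypothetical stand-in for Wage, using the same "<digit>. " prefix style.
toy <- data.frame(
  year   = c(2006, 2004),
  maritl = factor(c("1. Never Married", "2. Married")),
  health = factor(c("1. <=Good", "2. >=Very Good"))
)

# Positions of the columns to clean (assumed name, chosen by position
# in the same way as 3:9 in the answer).
cols <- 2:3

# sub() replaces only the first match; "^" restricts it to the start of
# the string and "\\." matches a literal dot rather than any character.
toy[cols] <- lapply(toy[cols], function(x) sub("^\\d+\\. ", "", x))

print(toy)
```

Anchoring and escaping the dot makes the pattern stricter than `\\d. `, where the unescaped `.` matches any character and the match may occur anywhere in the value.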

Output

head(ans)

  year age        maritl  race    education          region    jobclass      health
1 2006  18 Never Married White    < HS Grad Middle Atlantic  Industrial      <=Good
2 2004  24 Never Married White College Grad Middle Atlantic Information >=Very Good
3 2003  45       Married White Some College Middle Atlantic  Industrial      <=Good
4 2003  43       Married Asian College Grad Middle Atlantic Information >=Very Good
5 2005  50      Divorced White      HS Grad Middle Atlantic Information      <=Good
6 2008  54       Married White College Grad Middle Atlantic Information >=Very Good
  health_ins  logwage      wage
1         No 4.318063  75.04315
2         No 4.255273  70.47602
3        Yes 4.875061 130.98218
4        Yes 5.041393 154.68529
5        Yes 4.318063  75.04315
6        Yes 4.845098 127.11574
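
Note that `sub()` returns character vectors, so the cleaned columns are no longer factors. If the factor type matters, one possible variant is to rewrite the factor levels instead of the values. This is a minimal sketch on a hypothetical vector `f`, not part of the original answer:

```r
# Hypothetical factor using the same prefix style as the education column.
f <- factor(c("1. < HS Grad", "4. College Grad", "3. Some College"))

# Rewriting the levels keeps the object a factor and preserves the
# original level order (which follows the numeric prefixes).
levels(f) <- sub("^\\d+\\. ", "", levels(f))

print(f)
```

The same assignment could be applied to each of the affected columns in turn, for example inside `lapply()`.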