Replace values in a column with an existing factor level in R

Time: 2017-09-22 21:36:13

Tags: r

I have a dataset in which the 'workclass' column has the following values:

[Image: values of the 'workclass' column]

It seems to me that the value 'privat' is really the same as 'Private', so I want to change it accordingly.

If I run the following command, I get a warning because the factor level is not defined.

    > adult$workclass[adult$workclass == 'privat'] <- 'Private'
    Warning message:
    In `[<-.factor`(`*tmp*`, adult$workclass == "privat", value = c(7L,  :
    invalid factor level, NA generated

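If I "deconstruct" the column (convert it to character) and "reconstruct" it (convert it back to a factor) after the operation, I end up with two distinct levels for 'Private'.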

    > adult$workclass <- as.character(adult$workclass)
    > adult$workclass[adult$workclass=='privat']  <- 'Private'
    > adult$workclass <- as.factor(adult$workclass)
    > summary(adult$workclass)
          Federal-gov         Local-gov      Never-worked           Private 
                  960              2093                 7             22686 
         Self-emp-inc  Self-emp-not-inc         State-gov       Without-pay 
                 1116              2541              1298                14 
              Private              NA's 
                   10              1836

How can I merge 'privat' and 'Private'?

2 answers:

Answer 0: (score: 0)

What is the output of levels(adult$workclass)? It looks like your 'Private' level is not exactly identical to the string 'Private'.
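One way to check this, as a minimal sketch: the vector `workclass` below is a hypothetical stand-in for `adult$workclass`, and the leading spaces in its levels are an assumption about what might be wrong, not something confirmed by the question. `dput()` prints the levels with quotes, which makes stray whitespace visible, and `trimws()` removes it.

```r
# Hypothetical stand-in for adult$workclass; the leading spaces are an assumption
workclass <- factor(c(" Private", " Private", " State-gov", "privat"))

# dput() prints the levels unambiguously, so hidden whitespace becomes visible
dput(levels(workclass))

# Check whether the exact string "Private" is an existing level
exact_match_before <- "Private" %in% levels(workclass)

# Strip leading and trailing whitespace from the level labels
levels(workclass) <- trimws(levels(workclass))
exact_match_after <- "Private" %in% levels(workclass)

print(c(before = exact_match_before, after = exact_match_after))
```

If the levels do carry extra whitespace, the assignment `<- 'Private'` would only succeed after the labels have been trimmed.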

When I run the following code, I get the desired result:

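    # Rebuild a factor with the same counts as the summary in the question
    f <- data.frame(f = factor(c(
      rep("Federal-gov", 960),
      rep("Local-gov", 2093),
      rep("Never-worked", 7),
      rep("Private", 22686),
      rep("Self-emp-inc", 1116),
      rep("Self-emp-not-inc", 2541),
      rep("State-gov", 1298),
      rep("Without-pay", 14),
      rep("privat", 10),
      rep("NA's", 1836)  # the literal string "NA's", not a missing value
    )))

    # "Private" is already a level here, so this assignment does not produce NAs
    f$f[f$f == "privat"] <- "Private"
    # Drop the now-empty "privat" level
    f <- droplevels(f)
    table(f)
    # Output:
    #      Federal-gov        Local-gov             NA's     Never-worked
    #              960             2093             1836                7
    #          Private     Self-emp-inc Self-emp-not-inc        State-gov
    #            22696             1116             2541             1298
    #      Without-pay
    #               14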
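An alternative in base R, offered as a minimal sketch: renaming a level to a name that already exists merges the two levels, so no separate `droplevels()` call is needed. The vector `workclass` is a small hypothetical stand-in for `adult$workclass`, and the sketch assumes the target level is spelled exactly "Private".

```r
# Hypothetical stand-in for adult$workclass
workclass <- factor(c("Private", "privat", "State-gov", "Private", "privat"))

# Rename the "privat" level to "Private"; because "Private" already exists,
# the two levels are merged into one
levels(workclass)[levels(workclass) == "privat"] <- "Private"

# Only "Private" and "State-gov" remain as levels
print(table(workclass))
```

This works on the level labels rather than on the individual values, so it never assigns a string that is not a valid level.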

Answer 1: (score: 0)

You can try:

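    library(dplyr)

    # Recode the "privat" level to the existing "Private" level;
    # assign the result back, otherwise adult itself is left unchanged
    adult <- adult %>%
      mutate(workclass = recode_factor(workclass, privat = "Private"))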