Question

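I'm trying to collapse factor levels. At first, the count(a7_edu2) output shows that the collapse worked, but when I check the structure and look at the data in the RStudio viewer, the change has not affected the actual variable.

Any advice on saving to a new variable or overwriting the old one? Thanks!

I used fct_collapse to collapse the levels into three categories and tried mutate() to create a new variable with the new levels. I tried saving to a new variable, and I also tried transmute() instead of mutate(). I'd be happy with either a new variable or a replacement of the old one.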

SCI_dem %>%
  mutate(a7_edu2 = fct_collapse(a7_edu2,
    Highschool = c("Elm School", "Grade 7 or 8", "Grade 9 to 11", "High School Diploma", "G.E.D"),
    Diploma = c("Diploma or Certificate from trade tech school", "Diploma or Certificate from community college or CEGEP"),
    Bachelors = c("Bachelor degree", "Degree (Medicine, Dentistry etc)", "Masters degree", "Doctorate")
  )) %>%
  count(a7_edu2) # this is the result I want, but when I check the structure, it hasn't been saved!


str(SCI_dem$a7_edu2)

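I expected the output to be "Factor w/ 4 levels: "Highschool", "Diploma", "Bachelors", "Other"", but instead I get the original "Factor w/ 13 levels "Elm School","Grade 7 or 8",..: 8 7 6 10 7 7 8 3 7 10 ...".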

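Updated question: this saves one variable into a new data frame (SCI_collapse). However, when I try to save other newly collapsed variables to the same data frame, it overwrites the previous collapse. I tried specifying a new column, SCI_collapse$edu, but then it renamed the new collapsed variable df. How can I collapse multiple variables and add each one to the new data frame? Any advice on saving or on writing the pipe?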

SCI_collapse <- SCI_dem %>%
  mutate(a7_edu2 = fct_collapse(a7_edu2,
                                Highschool = c("Elm School",
                                               "Grade 7 or 8",
                                               "Grade 9 to 11",
                                               "High School Diploma",
                                               "G.E.D"),
                                Diploma = c("Diploma or Certificate from trade tech school",
                                            "Diploma or Certificate from community college or CEGEP"),
                                Bachelors = c("Bachelor degree",
                                              "Degree (Medicine, Dentistry etc)",
                                              "Masters degree", "Doctorate")))

Answer 1

This is what I ended up doing:

# Collapse levels (education)
SCI_dem <- SCI_dem %>%
  mutate(a7_edu2_col = fct_collapse(a7_edu2, # Save as new variable ending in _col
    Highschool = c("Elm School", "Grade 7 or 8", "Grade 9 to 11", "High School Diploma", "G.E.D"),
    Diploma = c("Diploma or Certificate from trade tech school", "Diploma or Certificate from community college or CEGEP"),
    Bachelors = c("Bachelor degree", "Degree (Medicine, Dentistry etc)", "Masters degree", "Doctorate"),
    Other = c("Other", "Prefer not to answer")
  ),
  a7_edu2_col = droplevels(a7_edu2_col)) %>% # drop empty levels of _col
  rename(a7_edu2_unc = a7_edu2)
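
For anyone who wants to check the pattern without the survey data, here is a minimal sketch on a made-up data frame. The names toy, edu, job, edu_col and job_col are hypothetical and not from SCI_dem. It illustrates two points: a pipe that ends in count() only prints a result unless it is assigned with <-, and giving each collapsed factor its own new column name inside one mutate() should keep the collapses from overwriting each other.

```r
library(dplyr)
library(forcats)

# Hypothetical toy data; `edu` and `job` are made-up columns.
toy <- tibble(
  edu = factor(c("Grade 7 or 8", "High School Diploma", "Bachelor degree", "Doctorate")),
  job = factor(c("Part time", "Full time", "Retired", "Full time"))
)

# Without assignment the collapsed result is only printed; `toy` is unchanged.
toy %>%
  mutate(edu = fct_collapse(edu,
    Highschool = c("Grade 7 or 8", "High School Diploma"),
    Bachelors  = c("Bachelor degree", "Doctorate")
  )) %>%
  count(edu)
nlevels(toy$edu) # still the original number of levels

# Assign the result back, using one new column per collapsed factor,
# so every collapse sits side by side in the same data frame.
toy <- toy %>%
  mutate(
    edu_col = fct_collapse(edu,
      Highschool = c("Grade 7 or 8", "High School Diploma"),
      Bachelors  = c("Bachelor degree", "Doctorate")
    ),
    job_col = fct_collapse(job,
      Working    = c("Part time", "Full time"),
      NotWorking = "Retired"
    )
  )

levels(toy$edu_col)
levels(toy$job_col)
```
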

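SCI_dem now has a new variable ending in _col, and the old variable has been renamed to end in _unc (uncollapsed). I then clean up by dropping the columns that end in _unc.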

SCI_dem <- select(SCI_dem, -ends_with("_unc"))

This leaves me with a tidy, collapsed data frame :)
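
A possible shortcut for the rename-then-select cleanup, sketched on made-up data: mutate() accepts a .keep argument, and .keep = "unused" drops the columns that were used to build the new ones. This assumes dplyr 1.0.0 or later; the names toy, toy_col, id, edu and edu_col are hypothetical.

```r
library(dplyr)
library(forcats)

# Hypothetical toy data; column names are made up for illustration.
toy <- tibble(
  id  = 1:4,
  edu = factor(c("Grade 7 or 8", "High School Diploma", "Bachelor degree", "Doctorate"))
)

toy_col <- toy %>%
  mutate(
    edu_col = fct_collapse(edu,
      Highschool = c("Grade 7 or 8", "High School Diploma"),
      Bachelors  = c("Bachelor degree", "Doctorate")
    ),
    .keep = "unused" # keeps `id`; drops `edu` because it was used to build `edu_col`
  )

names(toy_col)
```

The trade-off is that the uncollapsed column is gone from toy_col straight away, so this only fits if you don't need to compare old and new levels first.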
