Question

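I have a factor containing many industry names. I need to collapse them into major categories and industries. Because I allowed respondents to answer in any way they wanted, my number of levels is inflated (e.g. financial services, Financial Services, Banking, Finance). Because the case does not match, each variant shows up as an extra level, so I am trying to collapse them with forcats: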

test <- fct_collapse(PrescreenF$Industry, Finance = c("Banking",
  "Corporate Finance", "Finance", "Financial", "financial services",
  "financial services", "Financial Services", "Financial services"),
  NULL = "H")

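I get a warning saying that "Financial services" is unknown. This is very frustrating, because when I print the vector I can see that it is there. I have tried copying and pasting the exact string from the output and retyping it; it seems there are hidden characters that prevent it from being changed.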

How do I collapse these values correctly?

> test$industry
Banking
Corporate Finance 
Finance Financial 
financial services
financial services 
Financial Services 
Financial services

When I try to "revalue", say, the last level, "Financial services", it tells me it is an unknown string.

Edit: output of dput(x$industry)

structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 3L, 3L, 3L, 
4L, 3L, 3L, 3L, 5L, 7L, 8L, 9L, 10L, 11L, 12L, 12L, 13L, 14L, 
15L, 15L, 15L, 15L, 15L, 15L, 15L, 15L, 15L, 16L, 16L, 16L, 16L, 
16L, 16L, 16L, 16L, 16L, 16L, 16L, 16L, 16L, 16L, 16L, 16L, 16L, 
16L, 16L, 16L, 16L, 16L, 16L, 16L, 16L, 16L, 17L, 18L, 18L, 18L, 
18L, 19L, 19L, 20L, 21L, 22L, 23L, 24L, 25L, 25L, 26L, 27L, 28L
), .Label = c("", "{\"ImportId\":\"QID8_TEXT\"}", "Finance", 
"Financial ", "Financial services ", "Please indicate the industry you work in (e.g. technology, healthcare etc):", 
"Cleantech", "Delivery", "e-commerce/fashion", "Food", "Food & Bev", 
"Retail", "Service", "tech", "technology", "Technology", "IT, technology", 
"Software", "Technology ", "Tehcnology", "Consulting", "Digital advertising", 
"Education", "Higher education", "Technology, management consulting", 
"University professor; teaching, research and service", "Information Technology and Services", 
"mobile technology"), class = "factor")

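EDIT: Figured it out. Some entries have an extra space at the end. For example, when I print Prescreen$Industry it returns many names such as "Banking" and "Corporate Finance", but it does not show me that there is a space after the level. Banking is actually "Banking ", with an invisible trailing space that does not show up in R. How can I make this visible and make sure it does not happen again?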

Can I run a len function on the column? If so, how? PrescreenF$Industry("Banking")

Answer 1

If "x" is your data frame:

library(stringr)

# Convert to character, strip leading/trailing whitespace, then rebuild the factor
x$industry <- as.character(x$industry)
x$industry <- str_trim(x$industry)
x$industry <- as.factor(x$industry)

Then you can go back to fct_collapse() to simplify your factor.
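
As a rough sketch of the whole workflow, the snippet below first makes the stray whitespace visible and then trims and collapses the levels. The `industry` vector is made-up toy data standing in for the real column, and the names `trimmed` and `collapsed` are only illustrative. It uses base R's `nchar()` (the closest thing to a "len" function) and `trimws()`, which should behave like `str_trim()` for ordinary spaces.

```r
library(forcats)

# Toy data standing in for the real column (values are illustrative only)
industry <- factor(c("Banking ", "Finance", "Financial ", "financial services",
                     "Financial services ", "Technology", "Technology "))

# 1. Make stray whitespace visible: levels() prints each level in quotes,
#    and nchar() gives the number of characters in each level.
levels(industry)
nchar(levels(industry))

# Levels that change when trimmed carry leading or trailing whitespace
levels(industry)[levels(industry) != trimws(levels(industry))]

# 2. Trim the levels themselves; assigning duplicate level names
#    merges those levels ("Technology " is folded into "Technology").
trimmed <- industry
levels(trimmed) <- trimws(levels(trimmed))

# 3. Collapse the cleaned levels into one category
collapsed <- fct_collapse(trimmed,
  Finance = c("Banking", "Finance", "Financial",
              "financial services", "Financial services"))
levels(collapsed)
```

Trimming the levels directly avoids the round trip through character, and checking `levels()` or `nchar()` after each import is a cheap way to catch the same problem next time.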
