I have a data frame with a variable named "Control_Category". The variable contains six names; for simplicity I'll use generic ones:
df <- data.frame(Control_Category = c("Really Long Name One",
"Super Really Long Name Two",
"Another Really Flippin' Long Name Three",
",Seriously, It's a Fourth Long Name",
"Definitely a Fifth Long Name",
"Finally, This guy is done, number six"))
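I had a bit of fun with those. So although the names are long, they are tidy, in that each of the six values is spelled consistently. This particular character vector in the data.frame has hundreds of entries, each matching one of these six.
What I need to do is replace the long names with short names. That is, wherever any of the names above occurs, replace it with a shorter version, for example: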
One Two Three Four Five Six
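I tried using 'case_when' and it failed miserably. Any help would be greatly appreciated.
Additional information based on community questions
The order of the items doesn't matter. Nothing designates them 1 through 6; there just happen to be six, so I made up six silly long strings. The real strings are long as well.
So, anywhere "Super Really Long Name Two" exists, that value needs to be updated to something like "TWO", or to a "Short_Name" that approximates "TWO". In reality, the category is called "Audit, Testing and Examination Results". The short name would ideally just be "AUDIT".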
Answer 0 (score: 3)
You can use gsub() once for each replacement:
df$Control_Category <- gsub('Really Long Name One', 'One', df$Control_Category)
You can repeat the same logic for the other five long/short name pairs.
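Repeating gsub() six times works, but the pairs can also be kept in one place. Here is a minimal sketch that uses a named character vector as a lookup table. The short names and the `lookup` variable are illustrative assumptions, not something from the question:

```r
# Illustrative data: the six long names from the question, repeated three times
long_names <- c("Really Long Name One",
                "Super Really Long Name Two",
                "Another Really Flippin' Long Name Three",
                ",Seriously, It's a Fourth Long Name",
                "Definitely a Fifth Long Name",
                "Finally, This guy is done, number six")
df <- data.frame(Control_Category = rep(long_names, times = 3),
                 stringsAsFactors = FALSE)

# Hypothetical short names, keyed by the long name each one replaces
lookup <- setNames(c("One", "Two", "Three", "Four", "Five", "Six"), long_names)

# Index the lookup by the column; as.character() guards against factor columns
df$Control_Category <- unname(lookup[as.character(df$Control_Category)])
```

Unlike gsub(), which does regular-expression substring matching, this matches whole values exactly. Any value missing from `lookup` becomes NA, which makes unmatched categories easy to spot.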
Answer 1 (score: 2)
Here is a larger data frame built from those names:
set.seed(101)
long_names <- c("Really Long Name One",
"Super Really Long Name Two",
"Another Really Flippin' Long Name Three",
",Seriously, It's a Fourth Long Name",
"Definitely a Fifth Long Name",
"Finally, This guy is done, number six")
df <- data.frame(control_category=sample(long_names, 100, replace=TRUE))
head(df)
## control_category
## 1 Another Really Flippin' Long Name Three
## 2 Really Long Name One
## 3 Definitely a Fifth Long Name
## 4 ,Seriously, It's a Fourth Long Name
## 5 Super Really Long Name Two
## 6 Super Really Long Name Two
The unique() function gives you the category names:
category <- unique(df$control_category)
print(category)
## [1] Another Really Flippin' Long Name Three
## [2] Really Long Name One
## [3] Definitely a Fifth Long Name
## [4] ,Seriously, It's a Fourth Long Name
## [5] Super Really Long Name Two
## [6] Finally, This guy is done, number six
## 6 Levels: ,Seriously, It's a Fourth Long Name ...
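Note that the levels are in alphabetical order (see levels(category)). The output above assumes the column is a factor, which was the data.frame() default before R 4.0.0; in newer versions unique() returns a plain character vector with no Levels line. The simplest approach here is to reorder manually by inspecting the current order: category[c(2, 5, 1, 4, 3, 6)] gives you the correct order. Finally,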
df$control_category <- factor(
df$control_category,
levels=category[c(2, 5, 1, 4, 3, 6)],
labels=c("one", "two", "three", "four", "five", "six")
)
head(df)
## control_category
## 1 three
## 2 one
## 3 five
## 4 four
## 5 two
## 6 two
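If the long names are already stored in the desired order, as `long_names` is above, the manual reordering step can be skipped by passing that vector directly as `levels`. A sketch under that assumption (the `short_names` variable is introduced here for illustration):

```r
long_names <- c("Really Long Name One",
                "Super Really Long Name Two",
                "Another Really Flippin' Long Name Three",
                ",Seriously, It's a Fourth Long Name",
                "Definitely a Fifth Long Name",
                "Finally, This guy is done, number six")
set.seed(101)
df <- data.frame(control_category = sample(long_names, 100, replace = TRUE))

# Short labels, in the same order as long_names
short_names <- c("one", "two", "three", "four", "five", "six")

# levels and labels are paired by position: long_names[i] becomes short_names[i]
df$control_category <- factor(as.character(df$control_category),
                              levels = long_names,
                              labels = short_names)

# Quick check of how many rows fall into each short category
table(df$control_category)
```

Because the pairing is positional, this does not depend on the order in which unique() happens to return the values.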