Changing string names in a data frame based on a condition

Date: 2017-08-29 06:01:35

Tags: r replace dplyr

I have a data frame containing a variable named "Control_Category". The variable holds six distinct names; for simplicity, I'll use generic ones:

df <- data.frame(Control_Category = c("Really Long Name One",
"Super Really Long Name Two",
"Another Really Flippin' Long Name Three",
",Seriously, It's a Fourth Long Name",
"Definitely a Fifth Long Name",
"Finally, This guy is done, number six"))

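I had a little fun with these. The names are long, but they are tidy: each of the six is spelled consistently wherever it appears. In this particular character vector of the data.frame there are hundreds of entries, each matching one of these six names.

What I need to do is replace the long names with short ones. Wherever one of the names above is found, it should be replaced with a shorter version, such as:

One Two Three Four Five Six

I tried using 'case_when' and it failed miserably. Any help would be appreciated.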

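Additional information based on community questions

The order of the items is not important. There is no designated 1 - 6. There just happen to be six categories, so I made up six silly long strings. The real strings are long as well.

So, anywhere "Super Really Long Name Two" exists, that value needs to be updated to something like "TWO", or to a "Short_Name" that approximates "TWO". In reality, the category is called "Audit, Testing, and Examination Results". The short name would ideally be just "AUDIT".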

2 Answers:

Answer 0: (score: 3)

You can call gsub() once for each replacement:

df$Control_Category <- gsub('Really Long Name One', 'One',  df$Control_Category)

You can repeat the same logic for the other five long/short name pairs.
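The repetition can also be written as a loop over a lookup table. The following is a minimal sketch, not part of the original answer: the `short_names` vector and its short labels are illustrative assumptions, the example data frame is a small stand-in for the real one, and `fixed = TRUE` is added so that each long name is matched literally rather than as a regular expression.

```r
# Hypothetical lookup table: names are the long strings, values the short labels.
short_names <- c(
  "Really Long Name One"                    = "One",
  "Super Really Long Name Two"              = "Two",
  "Another Really Flippin' Long Name Three" = "Three",
  ",Seriously, It's a Fourth Long Name"     = "Four",
  "Definitely a Fifth Long Name"            = "Five",
  "Finally, This guy is done, number six"   = "Six"
)

# Small stand-in data frame: each long name appears twice.
df <- data.frame(Control_Category = rep(names(short_names), times = 2),
                 stringsAsFactors = FALSE)

# One gsub() call per long/short pair; fixed = TRUE treats the pattern literally.
for (long in names(short_names)) {
  df$Control_Category <- gsub(long, short_names[[long]], df$Control_Category,
                              fixed = TRUE)
}

print(unique(df$Control_Category))
```

Because gsub() replaces substrings, this approach relies on no long name being contained in another one; that holds for the six example names here.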

Answer 1: (score: 2)

Here is a larger data frame built from those names:

set.seed(101)
long_names <- c("Really Long Name One",
                "Super Really Long Name Two",
                "Another Really Flippin' Long Name Three",
                ",Seriously, It's a Fourth Long Name",
                "Definitely a Fifth Long Name",
                "Finally, This guy is done, number six")

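# stringsAsFactors=TRUE keeps the column a factor, as the "Levels" output below assumes
# (since R 4.0.0 data.frame() no longer converts strings to factors by default)
df <- data.frame(control_category=sample(long_names, 100, replace=TRUE),
                 stringsAsFactors=TRUE)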
head(df)

##                          control_category
## 1 Another Really Flippin' Long Name Three
## 2                    Really Long Name One
## 3            Definitely a Fifth Long Name
## 4     ,Seriously, It's a Fourth Long Name
## 5              Super Really Long Name Two
## 6              Super Really Long Name Two

The unique() function gives you the category names:

category <- unique(df$control_category)
print(category)

## [1] Another Really Flippin' Long Name Three
## [2] Really Long Name One                   
## [3] Definitely a Fifth Long Name           
## [4] ,Seriously, It's a Fourth Long Name    
## [5] Super Really Long Name Two             
## [6] Finally, This guy is done, number six  
## 6 Levels: ,Seriously, It's a Fourth Long Name ...

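Note that the levels are in alphabetical order (see levels(category)), whereas unique() returns the values in order of first appearance. In this case, the easiest approach is to look at the current order and rearrange it by hand. For the order printed above, category[c(2, 5, 1, 4, 3, 6)] gives the names in the order one through six; the index vector is specific to that output and has to be adjusted if unique() returns a different order. Finally,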

df$control_category <- factor(
    df$control_category,
    levels=category[c(2, 5, 1, 4, 3, 6)],
    labels=c("one", "two", "three", "four", "five", "six")
)
head(df)

##   control_category
## 1            three
## 2              one
## 3             five
## 4             four
## 5              two
## 6              two
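The manual index vector above depends on the order in which unique() happened to return the names. As a sketch of one alternative (not from the original answer), a named vector can pair each long name with its label, so that factor() receives the levels and labels already aligned and no positional reordering is needed. The `lookup` vector and the `original` helper variable are hypothetical names introduced for illustration.

```r
# Hypothetical mapping from long names to short labels; its order sets the level order.
lookup <- c(
  "Really Long Name One"                    = "one",
  "Super Really Long Name Two"              = "two",
  "Another Really Flippin' Long Name Three" = "three",
  ",Seriously, It's a Fourth Long Name"     = "four",
  "Definitely a Fifth Long Name"            = "five",
  "Finally, This guy is done, number six"   = "six"
)

set.seed(101)
df <- data.frame(control_category = sample(names(lookup), 100, replace = TRUE))

# Keep a copy of the long names so the relabeling can be checked afterwards.
original <- as.character(df$control_category)

# levels = long names, labels = short names, paired by position within `lookup`.
df$control_category <- factor(df$control_category,
                              levels = names(lookup),
                              labels = unname(lookup))

print(levels(df$control_category))
```

Any value that is not one of the names in `lookup` would become NA, which makes unexpected category names easy to spot with anyNA(df$control_category).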