Conditionally replace character values based on their count

Time: 2018-03-13 09:09:35

Tags: r

I have data like this:

df %>% str()
'data.frame':   50 obs. of  1 variable:
 $ Title: chr  " Mr" " Mrs" " Mr" " Mr" ...

df <- structure(list(Title = c(" Mr", " Mrs", " Mr", " Mr", " Mrs", 
                               " Mr", " Miss", " Mr", " Mrs", " Mr", " Mr", " Mr", " Mrs", " Mr", 
                               " Mrs", " Mrs", " Mr", " Mr", " Miss", " Mrs", " Mr", " Master", 
                               " Mrs", " Mr", " Mrs", " Mr", " Miss", " Mr", " Mr", " Mr", " Mr", 
                               " Mr", " Mrs", " Mrs", " Mr", " Mr", " Miss", " Miss", " Mr", 
                               " Mr", " Mr", " Mr", " Mr", " Mrs", " Mrs", " Mr", " Mr", " Mr", 
                               " Mrs", " Mrs")), .Names = "Title", row.names = c(NA, 50L), class = "data.frame")

My goal is to replace a value with "" if it occurs fewer than 6 times.

library(dplyr)
df %>% count(Title) %>%
arrange(-n)

Title   n
Mr      29          
Mrs     15          
Miss    5           
Master  1

In this case, that would be Master and Miss.

I tried it this way, but it does not work:

df$Title[df$Title %in% c("Miss", "Master")] <- ""

I would appreciate any help.

3 Answers:

Answer 0: (score: 1)

Try this:

library(dplyr)
df %>% 
  count(Title) %>%
  arrange(-n) %>% 
  mutate(Title=ifelse(n<6, "", Title))  # blank titles that occur fewer than 6 times

Output:

# A tibble: 4 x 2
#   Title      n
#   <chr>  <int>
# 1 " Mr"     29
# 2 " Mrs"    15
# 3 ""         5
# 4 ""         1

Or, if you also want the count blanked for those rows, you can add:

df %>% 
  count(Title) %>%
  arrange(-n) %>% 
  mutate(Title=ifelse(n<6, "", Title),
         n=ifelse(n<6, "", n))  # n becomes character because of the "" values

Output:

# A tibble: 4 x 2
#  Title  n     
#   <chr>  <chr> 
# 1 " Mr"  29 
# 2 " Mrs" 15
# 3 ""     ""    
# 4 ""     ""    
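
The pipelines above modify the summary table returned by count(), not the 50 rows of df. If the goal is to blank the rare titles in the data itself, one possible sketch is to attach the per-group count to every row first. The small df below is a stand-in with the same counts as the question's data (29, 15, 5, 1) but a different row order, and the column name title_count is only an illustrative choice:

```r
library(dplyr)

# Stand-in for the question's data: same counts, different row order (assumption).
df <- data.frame(
  Title = rep(c(" Mr", " Mrs", " Miss", " Master"), times = c(29, 15, 5, 1)),
  stringsAsFactors = FALSE
)

result <- df %>%
  group_by(Title) %>%
  mutate(title_count = n()) %>%   # each row gets the count of its own Title
  ungroup() %>%
  mutate(Title = ifelse(title_count < 6, "", Title)) %>%
  select(-title_count)            # drop the helper column again

# Tabulate the titles after replacement
table(result$Title)
```

Because the comparison uses the counts rather than hard-coded strings, the leading spaces in Title do not matter here.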

Answer 1: (score: 1)

You might consider using the trimws() function to remove the whitespace:

mylist <- structure(list(Title = c(" Mr", " Mrs", " Mr", " Mr", " Mrs", 
                     " Mr", " Miss", " Mr", " Mrs", " Mr", " Mr", " Mr", " Mrs", " Mr", 
                     " Mrs", " Mrs", " Mr", " Mr", " Miss", " Mrs", " Mr", " Master", 
                     " Mrs", " Mr", " Mrs", " Mr", " Miss", " Mr", " Mr", " Mr", " Mr", 
                     " Mr", " Mrs", " Mrs", " Mr", " Mr", " Miss", " Miss", " Mr", 
                     " Mr", " Mr", " Mr", " Mr", " Mrs", " Mrs", " Mr", " Mr", " Mr", 
                     " Mrs", " Mrs")), .Names = "Title", row.names = c(NA, 50L), class = "data.frame")

> df <- as.data.frame(mylist)
> table(nchar(df$Title))

 3  4  5  7 
29 15  5  1 
> df$Title <- trimws(df$Title)
> table(nchar(df$Title))

 2  3  4  6 
29 15  5  1 
> c(df$Title)
 [1] "Mr"     "Mrs"    "Mr"     "Mr"     "Mrs"    "Mr"     "Miss"   "Mr"     "Mrs"    "Mr"     "Mr"    
[12] "Mr"     "Mrs"    "Mr"     "Mrs"    "Mrs"    "Mr"     "Mr"     "Miss"   "Mrs"    "Mr"     "Master"
[23] "Mrs"    "Mr"     "Mrs"    "Mr"     "Miss"   "Mr"     "Mr"     "Mr"     "Mr"     "Mr"     "Mrs"   
[34] "Mrs"    "Mr"     "Mr"     "Miss"   "Miss"   "Mr"     "Mr"     "Mr"     "Mr"     "Mr"     "Mrs"   
[45] "Mrs"    "Mr"     "Mr"     "Mr"     "Mrs"    "Mrs"   
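
Once the whitespace is trimmed, the replacement itself can be driven by the counts instead of a hard-coded list. The following is a minimal base R sketch, not part of the original answer. It uses a stand-in df with the same counts as the question's data, and the names counts and rare are illustrative:

```r
# Stand-in for the question's data: same counts, different row order (assumption).
df <- data.frame(
  Title = rep(c(" Mr", " Mrs", " Miss", " Master"), times = c(29, 15, 5, 1)),
  stringsAsFactors = FALSE
)

df$Title <- trimws(df$Title)         # remove the leading spaces

counts <- table(df$Title)            # frequency of each title
rare <- names(counts)[counts < 6]    # titles occurring fewer than 6 times

df$Title[df$Title %in% rare] <- ""   # blank the rare titles in place

# Tabulate the titles after replacement
table(df$Title)
```

With this data, rare should contain "Master" and "Miss", which matches the two values named in the question.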

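Answer 2: (score: 0)

You are replacing Master and Miss in the right way. However, your data.frame does not contain "Master" and "Miss"; it contains " Master" and " Miss". All of your entries have a leading space.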
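
A short sketch of that point, using a stand-in df with the same counts as the question's data (the name fixed is illustrative): the untrimmed strings do not match, and trimming inside the comparison is one way to make the original attempt work without changing the stored values of the other rows.

```r
# Stand-in for the question's data: same counts, different row order (assumption).
df <- data.frame(
  Title = rep(c(" Mr", " Mrs", " Miss", " Master"), times = c(29, 15, 5, 1)),
  stringsAsFactors = FALSE
)

"Miss" %in% df$Title    # no match: the stored value has a leading space
" Miss" %in% df$Title   # matches the stored value

fixed <- df
# Compare against trimmed values, but leave the other titles untouched
fixed$Title[trimws(fixed$Title) %in% c("Miss", "Master")] <- ""

# Number of titles that were blanked
sum(fixed$Title == "")
```

Alternatively, the original line works as written if the lookup vector is c(" Miss", " Master"), with the leading spaces included.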