Cleaning names with R

Date: 2018-04-09 10:24:52

Tags: r regex

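I want to remove the titles (MR, MS, DR and so on) from the NAMEFIRST column, so that the output looks like the Clean_name column. Any suggestions?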

> df
           NAMEFIRST     Clean_name
1         BHASOTI MS        BHASOTI
2          BHABESHMR        BHABESH
3             RINAMS           RINA
4        SUSHMITAMRS       SUSHMITA
5         ARKADIY MR        ARKADIY
6  PRAMOD TRIMBAK DR PRAMOD TRIMBAK
7          ANDREW MR         ANDREW
8      MICHELLE MISS       MICHELLE
9         DINESHA MR        DINESHA
10        SREEDHARMR       SREEDHAR
11        PANKAJMSTR         PANKAJ
12   SUSHIL KUMAR MR   SUSHIL KUMAR
13          FAZLURMR         FAZLUR

3 answers:

Answer 0 (score: 0)

You did not provide any usable data. It can be solved like this:

column <- c("MICHELLE MISS","PRAMOD TRIMBAK DR")
# Strip optional whitespace plus one trailing title; longer titles are listed first
sub("\\s*(MISS|MSTR|MRS|MR|MS|DR)$", "", column)

Output:

 "MICHELLE"       "PRAMOD TRIMBAK"
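
To check the idea against the question's data, here is a minimal sketch that applies an end-anchored pattern to all 13 names. The vectors `names_raw` and `expected` are typed in by hand from the table in the question, and the variable names are assumptions made for illustration.

```r
# The 13 names from the question and the desired Clean_name values
names_raw <- c("BHASOTI MS", "BHABESHMR", "RINAMS", "SUSHMITAMRS", "ARKADIY MR",
               "PRAMOD TRIMBAK DR", "ANDREW MR", "MICHELLE MISS", "DINESHA MR",
               "SREEDHARMR", "PANKAJMSTR", "SUSHIL KUMAR MR", "FAZLURMR")
expected <- c("BHASOTI", "BHABESH", "RINA", "SUSHMITA", "ARKADIY",
              "PRAMOD TRIMBAK", "ANDREW", "MICHELLE", "DINESHA",
              "SREEDHAR", "PANKAJ", "SUSHIL KUMAR", "FAZLUR")

# Optional whitespace, then exactly one title, anchored at the end of the string.
# Longer titles come first so that MRS is tried before MR and MS.
title_pattern <- "\\s*(MISS|MSTR|MRS|MR|MS|DR)$"

# sub() replaces only the first (here: the only possible) match per element
cleaned <- sub(title_pattern, "", names_raw)

# TRUE when every cleaned name equals the Clean_name column
all_match <- identical(cleaned, expected)
print(all_match)
```

Because titles are sometimes glued to the name (BHABESHMR), the pattern cannot tell a title from a name that merely ends in the same letters: a name such as ADAMS would be shortened to ADA. Whether that is acceptable depends on the data.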

Answer 1 (score: 0)

df <- data.frame(name = c("RAMOREYDR","SAMUEL MR","MR KOOL","HANDSOMEDR","GELLER DR","SONIA MS"))
df
#         name
# 1  RAMOREYDR
# 2  SAMUEL MR
# 3    MR KOOL
# 4 HANDSOMEDR
# 5  GELLER DR
# 6   SONIA MS

# Remove a title at the start (followed by whitespace) or at the end of the name
df$Clean_Name <- gsub("^(MR|MS|DR)\\s+|\\s*(MR|MS|DR)$", "", df$name)
df
#         name Clean_Name
# 1  RAMOREYDR    RAMOREY
# 2  SAMUEL MR     SAMUEL
# 3    MR KOOL       KOOL
# 4 HANDSOMEDR   HANDSOME
# 5  GELLER DR     GELLER
# 6   SONIA MS      SONIA
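
Anchoring matters for this kind of pattern. The sketch below compares an unanchored alternation with an anchored one on three sample names; the vector `x` and both patterns are illustrative assumptions, not part of the answer above.

```r
x <- c("ANDREW MR", "MR KOOL", "HANDSOMEDR")

# Unanchored: every occurrence of MR, MS or DR is removed, wherever it occurs,
# so the DR inside ANDREW disappears and surrounding spaces are left behind
unanchored <- gsub("MR|MS|DR", "", x)
print(unanchored)
# [1] "ANEW "    " KOOL"    "HANDSOME"

# Anchored: a title is removed only at the start (followed by whitespace)
# or at the very end of the string (preceded by optional whitespace)
anchored <- gsub("^(MR|MS|DR)\\s+|\\s*(MR|MS|DR)$", "", x)
print(anchored)
# [1] "ANDREW"   "KOOL"     "HANDSOME"
```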

Answer 2 (score: -1)

This regular expression solves the problem:

df
                name     Clean_name
1         BHASOTI MS        BHASOTI
2          BHABESHMR        BHABESH
3             RINAMS           RINA
4        SUSHMITAMRS       SUSHMITA
5         ARKADIY MR        ARKADIY
6  PRAMOD TRIMBAK DR PRAMOD TRIMBAK
7          ANDREW MR         ANDREW
8      MICHELLE MISS       MICHELLE
9         DINESHA MR        DINESHA
10        SREEDHARMR       SREEDHAR
11        PANKAJMSTR         PANKAJ
12   SUSHIL KUMAR MR   SUSHIL KUMAR
13          FAZLURMR         FAZLUR

# Group the alternatives so that both " *" and "$" apply to every title
df$name_cleaned <- sub(" *(MISS|MSTR|MRS|MR|MS|DR)$", "", df$name)
df
                name     Clean_name   name_cleaned
1         BHASOTI MS        BHASOTI        BHASOTI
2          BHABESHMR        BHABESH        BHABESH
3             RINAMS           RINA           RINA
4        SUSHMITAMRS       SUSHMITA       SUSHMITA
5         ARKADIY MR        ARKADIY        ARKADIY
6  PRAMOD TRIMBAK DR PRAMOD TRIMBAK PRAMOD TRIMBAK
7          ANDREW MR         ANDREW         ANDREW
8      MICHELLE MISS       MICHELLE       MICHELLE
9         DINESHA MR        DINESHA        DINESHA
10        SREEDHARMR       SREEDHAR       SREEDHAR
11        PANKAJMSTR         PANKAJ         PANKAJ
12   SUSHIL KUMAR MR   SUSHIL KUMAR   SUSHIL KUMAR
13          FAZLURMR         FAZLUR         FAZLUR

You can add further titles to remove by separating them with | inside the regular expression.
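
One way to make the list of titles easy to extend is to build the pattern from a character vector. This is only a sketch: `strip_titles` is a hypothetical helper, PROF is a made-up extra title, and the code assumes the titles contain no regex metacharacters (a title such as "DR." would need escaping first).

```r
# Hypothetical helper: removes one trailing title from each element of x
strip_titles <- function(x, titles = c("MR", "MRS", "MS", "MISS", "MSTR", "DR")) {
  # Put longer titles first so that MRS is preferred over MR or MS
  titles <- titles[order(nchar(titles), decreasing = TRUE)]
  # Join the titles with | inside one group, anchored at the end of the string
  pattern <- paste0("\\s*(", paste(titles, collapse = "|"), ")$")
  sub(pattern, "", x)
}

# Usage with a custom title list that includes the made-up title PROF
result <- strip_titles(c("ANDREW MR", "SUSHMITAMRS", "JOHN PROF"),
                       titles = c("MR", "MRS", "PROF"))
print(result)
# [1] "ANDREW"   "SUSHMITA" "JOHN"
```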