How to replace all patterns in a column, except specific ones, with another pattern?

Posted: 2019-02-05 09:38:02

Tags: r regex gsub

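I have a data frame (x) with a "Symbol" column in which I want to replace values (every "-*" replaced with ""), but I don't want to change certain values such as 1-Mar, 1-Sep, 1-Dec...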

x<-data.frame("ID"=c("a","b","c","d","e","f","g","h","i"),"Symbol"=c("3-Mar","STON1-GTF2A1L","1-Dec","NME1-NME2","12-Mar","TNFSF12-TNFSF13","8-Mar","TMEM189-UBE2V1","10-Sep"))

I tried the following code: x$Symbol <- gsub("-*", "", x$Symbol), but it also changed values such as 1-Mar, 1-Sep and 1-Dec.
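For reference, in the pattern "-*" the * quantifies the hyphen itself ("zero or more hyphens"), so gsub() strips only the hyphens and keeps the text after them. Dropping everything from the hyphen onward needs "-.*", which then also hits the date-like values. A quick check on two values from the question (a minimal illustration, not part of the original post):

```r
s <- c("3-Mar", "STON1-GTF2A1L")

# "-*" means "zero or more hyphens": only the hyphens themselves are removed
gsub("-*", "", s)   # "3Mar" "STON1GTF2A1L"

# "-.*" means "a hyphen followed by anything": the suffix goes too, dates included
sub("-.*", "", s)   # "3" "STON1"
```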

I need the following data frame:

x<-data.frame("ID"=c("a","b","c","d","e","f","g","h","i"),"Symbol"=c("3-Mar","STON1","1-Dec","NME1","12-Mar","TNFSF12","8-Mar","TMEM189","10-Sep"))

2 Answers:

Answer 0 (score: 0)

You can use

x$Symbol <- sub("-(?!(?:Jan|Feb|Mar|Apr|May|Ju[nl]|Aug|Sep|Oct|Nov|Dec)$).*", "", x$Symbol, perl=TRUE)

See the regex demo.

Details

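  • -: a hyphen
  • (?!(?:Jan|Feb|Mar|Apr|May|Ju[nl]|Aug|Sep|Oct|Nov|Dec)$): a negative lookahead that fails the match if, immediately to the right of the current position, there is an abbreviated month name followed by the end of the string. (Note: if more text may follow the month name, use (?!(?:Jan|Feb|Mar|Apr|May|Ju[nl]|Aug|Sep|Oct|Nov|Dec)\\b) to match the month name as a whole word, or (?!Jan|Feb|Mar|Apr|May|Ju[nl]|Aug|Sep|Oct|Nov|Dec) to match the names as unbounded substrings.)
  • .*: any 0 or more characters other than line breaks, as many as possible.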
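To make the difference between the three lookahead variants concrete, here is a small comparison. It is a sketch for illustration only: the inputs "1-Mar 2019" and "1-March" are made up and do not appear in the question.

```r
months <- "Jan|Feb|Mar|Apr|May|Ju[nl]|Aug|Sep|Oct|Nov|Dec"
p_end  <- paste0("-(?!(?:", months, ")$).*")    # month must end the string
p_word <- paste0("-(?!(?:", months, ")\\b).*")  # month as a whole word
p_sub  <- paste0("-(?!", months, ").*")         # month name as a prefix, no boundary check

# The last two inputs are hypothetical, not from the question
s <- c("1-Mar", "STON1-GTF2A1L", "1-Mar 2019", "1-March")

sub(p_end,  "", s, perl = TRUE)  # "1-Mar" "STON1" "1"          "1"
sub(p_word, "", s, perl = TRUE)  # "1-Mar" "STON1" "1-Mar 2019" "1"
sub(p_sub,  "", s, perl = TRUE)  # "1-Mar" "STON1" "1-Mar 2019" "1-March"
```

With $, anything after the month makes the hyphen eligible for removal; \\b only protects a complete month word; the unbounded version also protects longer words that merely start with a month abbreviation.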

R demo

df<-data.frame("ID"=c("a","b","c","d","e","f","g","h","i"),"Symbol"=c("3-Mar","STON1-GTF2A1L","1-Dec","NME1-NME2","12-Mar","TNFSF12-TNFSF13","8-Mar","TMEM189-UBE2V1","10-Sep"))
df$Symbol <- sub("-(?!(?:Jan|Feb|Mar|Apr|May|Ju[nl]|Aug|Sep|Oct|Nov|Dec)$).*", "", df$Symbol, perl=TRUE)
df

Output:

  ID  Symbol
1  a   3-Mar
2  b   STON1
3  c   1-Dec
4  d    NME1
5  e  12-Mar
6  f TNFSF12
7  g   8-Mar
8  h TMEM189
9  i  10-Sep
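The month alternation does not have to be typed by hand: base R provides the English abbreviations in the built-in constant month.abb. The following sketch is a variation on the answer, not the answerer's code. It builds the same pattern from month.abb and checks the result against the expected data frame from the question. stringsAsFactors = FALSE is added here as an assumption so the column is character in any R version.

```r
# Build the month alternation from base R's English month abbreviations
months <- paste(month.abb, collapse = "|")   # "Jan|Feb|...|Dec"
pattern <- paste0("-(?!(?:", months, ")$).*")

x <- data.frame(
  ID = c("a", "b", "c", "d", "e", "f", "g", "h", "i"),
  Symbol = c("3-Mar", "STON1-GTF2A1L", "1-Dec", "NME1-NME2", "12-Mar",
             "TNFSF12-TNFSF13", "8-Mar", "TMEM189-UBE2V1", "10-Sep"),
  stringsAsFactors = FALSE
)

# Remove from the first hyphen that is not followed by a month abbreviation ending the string
x$Symbol <- sub(pattern, "", x$Symbol, perl = TRUE)

expected <- c("3-Mar", "STON1", "1-Dec", "NME1", "12-Mar",
              "TNFSF12", "8-Mar", "TMEM189", "10-Sep")
identical(x$Symbol, expected)   # TRUE
```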

Answer 1 (score: 0)

You can paste "-18" onto Symbol and check whether the result parses as a Date, then apply sub only to the values that are not dates.

df$Symbol <- with(df, ifelse(is.na(as.Date(paste0(Symbol, "-18"), "%d-%b-%y")), 
                   sub ("-.*", "", Symbol), Symbol))

df
#  ID  Symbol
#1  a   3-Mar
#2  b   STON1
#3  c   1-Dec
#4  d    NME1
#5  e  12-Mar
#6  f TNFSF12
#7  g   8-Mar
#8  h TMEM189
#9  i  10-Sep

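First run

df$Symbol <- as.character(df$Symbol)

to convert Symbol to character (this matters when Symbol is a factor, which data.frame() produced by default before R 4.0.0).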
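The "%d-%b-%y" format relies on %b, which matches month names in the current locale, so under a non-English locale these abbreviations may fail to parse. A locale-independent variant of the same idea swaps in a different check: instead of date parsing, it compares the part after a leading day number against base R's month.abb. This is a sketch, not the answerer's code, and it assumes date-like values always have the form "<1 or 2 digits>-<English month abbreviation>".

```r
df <- data.frame(
  ID = c("a", "b", "c", "d", "e", "f", "g", "h", "i"),
  Symbol = c("3-Mar", "STON1-GTF2A1L", "1-Dec", "NME1-NME2", "12-Mar",
             "TNFSF12-TNFSF13", "8-Mar", "TMEM189-UBE2V1", "10-Sep"),
  stringsAsFactors = FALSE
)

# TRUE for values shaped like "<1-2 digit day>-<English month abbreviation>"
day_month <- grepl("^[0-9]{1,2}-", df$Symbol) &
  sub("^[0-9]{1,2}-", "", df$Symbol) %in% month.abb

# Keep date-like values; otherwise drop the first hyphen and everything after it
df$Symbol <- ifelse(day_month, df$Symbol, sub("-.*", "", df$Symbol))
df$Symbol
```

Unlike as.Date(), this check does not validate the day number, so a value such as "45-Mar" would also be kept.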