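I have a data frame (x) with a "Symbol" column, and I want to strip everything from the hyphen onward (replace all "-*" with ""). However, I don't want to change values that look like dates, such as 1-Mar, 1-Sep, 1-Dec ...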
x<-data.frame("ID"=c("a","b","c","d","e","f","g","h","i"),"Symbol"=c("3-Mar","STON1-GTF2A1L","1-Dec","NME1-NME2","12-Mar","TNFSF12-TNFSF13","8-Mar","TMEM189-UBE2V1","10-Sep"))
I tried the following code: x$Symbol<-gsub ("-*", "", x$Symbol)
But it also changed the date-like values (1-Mar, 1-Sep, 1-Dec).
I need the following data frame:
x<-data.frame("ID"=c("a","b","c","d","e","f","g","h","i"),"Symbol"=c("3-Mar","STON1","1-Dec","NME1","12-Mar","TNFSF12","8-Mar","TMEM189","10-Sep"))
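A short sketch of why the attempt above does not do what was intended: in a regular expression, "-*" means "zero or more hyphens", not "a hyphen followed by anything" as it would in a shell glob. The vector `symbols` and the result names below are hypothetical, chosen only for illustration.

```r
# Two sample values from the question (hypothetical variable name).
symbols <- c("3-Mar", "STON1-GTF2A1L")

# "-*" matches runs of zero or more hyphens, so only the hyphens disappear.
hyphens_removed <- gsub("-*", "", symbols)
print(hyphens_removed)  # "3Mar" "STON1GTF2A1L"

# "-.*" matches a hyphen plus everything after it, but it also strips the dates.
suffix_removed <- gsub("-.*", "", symbols)
print(suffix_removed)   # "3" "STON1"
```

Neither pattern can tell a date from a gene symbol on its own, which is why the answers add an extra condition.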
Answer 0 (score: 0)
You can use
x$Symbol <- sub("-(?!(?:Jan|Feb|Mar|Apr|May|Ju[nl]|Aug|Sep|Oct|Nov|Dec)$).*", "", x$Symbol, perl=TRUE)
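See the regex demo.
Details
The pattern has three parts:
1. -
   A literal hyphen.
2. (?!(?:Jan|Feb|Mar|Apr|May|Ju[nl]|Aug|Sep|Oct|Nov|Dec)$)
   A negative lookahead that fails the match if an abbreviated month name followed by the end of the string appears immediately to the right of the current position.
   Note: if more text may follow the month name, use (?!(?:Jan|Feb|Mar|Apr|May|Ju[nl]|Aug|Sep|Oct|Nov|Dec)\\b) to match the month name as a whole word, or (?!Jan|Feb|Mar|Apr|May|Ju[nl]|Aug|Sep|Oct|Nov|Dec) to match it as an unbounded substring.
3. .*
   Any zero or more characters other than line breaks, as many as possible.
Full example:
df<-data.frame("ID"=c("a","b","c","d","e","f","g","h","i"),"Symbol"=c("3-Mar","STON1-GTF2A1L","1-Dec","NME1-NME2","12-Mar","TNFSF12-TNFSF13","8-Mar","TMEM189-UBE2V1","10-Sep"))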
df$Symbol <- sub("-(?!(?:Jan|Feb|Mar|Apr|May|Ju[nl]|Aug|Sep|Oct|Nov|Dec)$).*", "", df$Symbol, perl=TRUE)
df
Output:
ID Symbol
1 a 3-Mar
2 b STON1
3 c 1-Dec
4 d NME1
5 e 12-Mar
6 f TNFSF12
7 g 8-Mar
8 h TMEM189
9 i 10-Sep
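The three lookahead variants mentioned in the details only differ when something follows the month name. The sketch below compares them on a few made-up inputs; `inputs`, `strip`, and the values "3-Mar 2018" and "X-Marker" are hypothetical and not part of the original data.

```r
# Month alternation shared by all three variants.
months <- "Jan|Feb|Mar|Apr|May|Ju[nl]|Aug|Sep|Oct|Nov|Dec"

# Hypothetical inputs: a plain date, a date with trailing text,
# a word that merely starts with a month name, and a gene pair.
inputs <- c("3-Mar", "3-Mar 2018", "X-Marker", "NME1-NME2")

# Helper (illustrative only): build the pattern around a given lookahead body.
strip <- function(lookahead) {
  sub(paste0("-(?!", lookahead, ").*"), "", inputs, perl = TRUE)
}

at_end     <- strip(paste0("(?:", months, ")$"))    # month must end the string
whole_word <- strip(paste0("(?:", months, ")\\b"))  # month must be a whole word
substring  <- strip(months)                         # month may be a prefix

print(at_end)      # "3-Mar" "3"          "X"        "NME1"
print(whole_word)  # "3-Mar" "3-Mar 2018" "X"        "NME1"
print(substring)   # "3-Mar" "3-Mar 2018" "X-Marker" "NME1"
```

Which variant fits depends on the data: the end-of-string form is the strictest about what counts as a date, while the substring form also protects any suffix that merely begins with a month abbreviation.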
Answer 1 (score: 0)
You can paste "-18" onto Symbol to check whether the result parses as a Date, and apply sub only to the values that are not dates.
df$Symbol <- with(df, ifelse(is.na(as.Date(paste0(Symbol, "-18"), "%d-%b-%y")),
sub ("-.*", "", Symbol), Symbol))
df
# ID Symbol
#1 a 3-Mar
#2 b STON1
#3 c 1-Dec
#4 d NME1
#5 e 12-Mar
#6 f TNFSF12
#7 g 8-Mar
#8 h TMEM189
#9 i 10-Sep
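First, run
df$Symbol <- as.character(df$Symbol)
to convert Symbol to character. This is needed when the column is a factor, which is the data.frame default in R versions before 4.0.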
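To make the date-parsing approach easier to follow, here is a sketch that splits it into intermediate steps. The names `symbols`, `parsed`, `is_date`, and `result` are hypothetical. One assumption worth stating: %b matches month abbreviations in the current locale, so the sketch switches LC_TIME to "C" to get English names; this call is an addition and not part of the original answer.

```r
# Assumption: force English month abbreviations for %b.
# This changes LC_TIME for the rest of the session.
Sys.setlocale("LC_TIME", "C")

# Hypothetical sample taken from the question's data.
symbols <- c("3-Mar", "STON1-GTF2A1L", "10-Sep")

# Append a dummy two-digit year so the string has day, month, and year.
parsed <- as.Date(paste0(symbols, "-18"), format = "%d-%b-%y")

# Values that fail to parse come back as NA, so they are not dates.
is_date <- !is.na(parsed)

# Keep the dates as they are; strip the hyphen and suffix from everything else.
result <- ifelse(is_date, symbols, sub("-.*", "", symbols))
print(result)  # "3-Mar" "STON1" "10-Sep"
```

Because as.Date validates the calendar date, an impossible value such as 31-Feb would be treated as a non-date and truncated, which may or may not be what you want.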