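I am trying to "delete" specific characters from a few rows of strings.
I am able to extract the specific characters to "delete" from the column, but I cannot recursively replace them with "".
I have tried some options with mapvalues, gsub, and str_replace, but had no luck.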
# Example data
test_col <- data.frame(sequence = c("ATGCRYSW\n",
                                    "ATGCRYSW\\n",
                                    "ATGCRYSW\r\n",
                                    "ATGCRYSW\r\nATGCRYSW",
                                    "ATGCRYSW"),
                       stringsAsFactors = FALSE)
# Vector of allowed characters in strings
permitted_seq_chars <- c("A", "C", "G", "T", "R", "Y", "S", "W", "K",
                         "M", "B", "D", "H", "V", "N", "+", "-", "X")
# Get all the unique characters in the column of interest
all_unique_source_seq_chars <- unique(unlist(strsplit(test_col[["sequence"]],
                                                      split = "")))
# Subset the invalid characters
all_unique_source_seq_invalid_chars <- setdiff(all_unique_source_seq_chars,
                                               permitted_seq_chars)
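# 'Delete' invalid characters one by one. So far this is the only way I have
# been able to do it, but I would like not to depend on hard-coded values in
# case new ones arise in the future.
library(stringr)
# Names are regex patterns, so a literal backslash must be written as "\\\\"
str_replace_all(test_col$sequence, c("\n" = "",
                                     "\\\\" = "",
                                     "n" = ""))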
Is there a way to do this recursively just by looking at all_unique_source_seq_invalid_chars?
Answer 0: (score: 2)
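One option is to paste the individual characters together into a single pattern string wrapped in square brackets, so that they are evaluated literally (in case there are metacharacters), and then replace every match with blank ("") in gsub.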
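A minimal sketch of that idea, reusing the data and vectors from the question. The escaping step and perl = TRUE are assumptions added here so that a backslash, "]", "^", or "-" among the invalid characters cannot change the meaning of the character class; the names escaped_chars, pattern, and cleaned are illustrative only.

```r
# Data and vectors from the question
test_col <- data.frame(sequence = c("ATGCRYSW\n",
                                    "ATGCRYSW\\n",
                                    "ATGCRYSW\r\n",
                                    "ATGCRYSW\r\nATGCRYSW",
                                    "ATGCRYSW"),
                       stringsAsFactors = FALSE)
permitted_seq_chars <- c("A", "C", "G", "T", "R", "Y", "S", "W", "K",
                         "M", "B", "D", "H", "V", "N", "+", "-", "X")
all_unique_source_seq_chars <- unique(unlist(strsplit(test_col[["sequence"]],
                                                      split = "")))
all_unique_source_seq_invalid_chars <- setdiff(all_unique_source_seq_chars,
                                               permitted_seq_chars)

# Characters that keep a special meaning inside a PCRE character class
class_specials <- c("\\", "]", "^", "-")

# Prefix those characters with a backslash; leave all others untouched
escaped_chars <- ifelse(all_unique_source_seq_invalid_chars %in% class_specials,
                        paste0("\\", all_unique_source_seq_invalid_chars),
                        all_unique_source_seq_invalid_chars)

cleaned <- test_col$sequence
# An empty character class "[]" is not a valid pattern, so skip when nothing is invalid
if (length(escaped_chars) > 0) {
  # Build one character class from all invalid characters
  pattern <- paste0("[", paste(escaped_chars, collapse = ""), "]")
  # Remove every occurrence of any invalid character
  cleaned <- gsub(pattern, "", cleaned, perl = TRUE)
}
cleaned
```

Because the pattern is rebuilt from all_unique_source_seq_invalid_chars each time, any new invalid character that appears in the column is removed without editing the replacement code.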