Remove duplicated strings from a character vector

Time: 2015-10-16 14:17:28

Tags: r

How can I efficiently remove the duplicates from this character vector?

unlist(Map(paste, colnames(df1), lapply(df1, levels), 
                          MoreArgs= list(sep=":")), use.names=FALSE)
#[1] "A:a" "A:b" "A:c" "B:c" "B:d" "B:e"

I tried using a simple function:

df1 <- data.frame(A = c('a', 'b', 'c', 'a', 'b'), B = c('d', 'e', 'c', 'd', 'e'))

> dput(data[1:30])
c("AT2G27020 AT3G26340", "AT1G56450 AT3G26340", "AT1G13060 AT3G26340",
"AT3G22630 AT3G26340", "AT3G22110 AT3G26340", "AT2G05840 AT3G26340",
"AT1G47250 AT3G26340", "AT1G79210 AT3G26340", "AT2G27020 AT5G40580",
"AT3G27430 AT5G40580", "AT4G31300 AT5G40580", "AT3G14290 AT5G40580",
"AT3G22630 AT5G40580", "AT3G22110 AT5G40580", "AT5G35590 AT5G40580",
"AT2G05840 AT5G40580", "AT3G60820 AT5G40580", "AT1G79210 AT5G40580",
"AT2G27020 AT3G27430", "AT2G27020 AT4G31300", "AT1G53850 AT2G27020",
"AT2G27020 AT5G66140", "AT2G27020 AT3G51260", "AT1G21720 AT2G27020",
"AT1G56450 AT2G27020", "AT1G13060 AT2G27020", "AT2G27020 AT3G22630",
"AT2G27020 AT4G14800", "AT2G27020 AT3G22110", "AT2G27020 AT5G35590"
)

but unfortunately it does not work.

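My mistake. By duplicates I mean the same AGI identifier, so it does not matter that some of them are stored together as pairs inside a single quoted string. I want each "ATXG..." identifier to appear only once in my vector. I did not realize at first that the vector contained pairs of them, sorry.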

1 answer:

Answer 0: (score: 5)

unique(unlist(strsplit(x, " ")))
 #[1] "AT2G27020" "AT3G26340" "AT1G56450" "AT1G13060" "AT3G22630" "AT3G22110"
 #[7] "AT2G05840" "AT1G47250" "AT1G79210" "AT5G40580" "AT3G27430" "AT4G31300"
#[13] "AT3G14290" "AT5G35590" "AT3G60820" "AT1G53850" "AT5G66140" "AT3G51260"
#[19] "AT1G21720" "AT4G14800"
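
To make the one-liner above easier to follow, here is a minimal step-by-step sketch. It assumes, as in the question, that each element holds two identifiers separated by a single space. The four-element vector `x` and the intermediate names `parts`, `ids`, and `unique_ids` are illustrative only and do not come from the original post.

```r
# Illustrative subset of the vector from the question (not the full data)
x <- c("AT2G27020 AT3G26340", "AT1G56450 AT3G26340",
       "AT2G27020 AT5G40580", "AT2G27020 AT3G27430")

# No element repeats as a whole string, so deduplicating x directly
# would leave it unchanged
anyDuplicated(x)
#[1] 0

# Split each element on the space; the result is a list of character vectors
parts <- strsplit(x, " ", fixed = TRUE)

# Flatten the list into a single character vector of identifiers
ids <- unlist(parts)

# Keep only the first occurrence of each identifier, preserving order
unique_ids <- unique(ids)
unique_ids
#[1] "AT2G27020" "AT3G26340" "AT1G56450" "AT5G40580" "AT3G27430"
```

If the separator could be more than one space or a tab, splitting on the regular expression "\\s+" instead of a fixed " " may be safer.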