Searching for name patterns in a dataset

Time: 2018-11-24 18:47:33

Tags: r

I have reduced the problem to a dataset (df_sum) that contains only character values:

"LPC(20:1) uM"         "LPE(16:0) uM"         "LPE(16:1) uM"         "LPE(18:0) uM"         "LPE(18:1) uM"         "PA(32:1) uM"          "PA(34:1) uM"         
"PA(36:1) uM"          "PS(34:1) uM"          "PS(36:1) uM"          "PG(34:1) uM"          "PG(36:1) uM"          "PE(28:0) uM"          "PE(30:1) uM"
"LPC(20:1)"         "LPE(16:0)"         "LPE(16:1)"         "LPE(18:0)"         "LPE(18:1)"         "PA(32:1)"          "PA(34:1)"         
"PS(36:1)"          "PG(34:1)" 

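As you can see, some of the values are the same, except that they carry an extra " uM" tag at the end.

My goal is to find the values that are unique and the ones that are actually identical, without removing the uM tag (removing it is what I have tried so far, e.g. df_sum <- sub(" uM", "", df_sum)).

Any help would be greatly appreciated.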

1 answer:

Answer 0: (score: 0)

OK, I have solved it. This is the code I used:

names.um <- names(dplyr::select(df_sum, dplyr::contains("uM"))) # select the 'uM' names from the joint dataset
names.um <- sub(" uM$", "", names.um) # remove the trailing ' uM' tag from this copy only

names.filou <- names(dplyr::select(df_sum, dplyr::ends_with(")"))) # select the 'Filou' names from the joint dataset

pos.filou <- which(!names.filou %in% names.um) # (1) positions where 'Filou' names have no match among the 'uM' names
pos.um <- which(!names.um %in% names.filou) # (2) positions where 'uM' names have no match among the 'Filou' names

names.filou[pos.filou] # show values from (1)
names.um[pos.um] # show values from (2)
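
For comparison, the same idea can be sketched in base R with intersect() and setdiff(). This is only a sketch: it assumes the labels are available as a plain character vector, and `labels` below is a hypothetical example built from a few of the names in the question, not the real df_sum.

```r
# Hypothetical example vector: a small subset of the names from the question
labels <- c("LPC(20:1) uM", "LPE(16:0) uM", "PA(36:1) uM", "PE(28:0) uM",
            "LPC(20:1)", "LPE(16:0)", "PS(36:1)")

# Split into tagged and untagged names
has_um <- grepl(" uM$", labels)
names_um <- sub(" uM$", "", labels[has_um])    # tag stripped in this copy only
names_plain <- labels[!has_um]

in_both <- intersect(names_um, names_plain)    # present with and without the tag
only_um <- setdiff(names_um, names_plain)      # present only with the tag
only_plain <- setdiff(names_plain, names_um)   # present only without the tag

print(in_both)
print(only_um)
print(only_plain)
```

The original `labels` vector is never modified, so the " uM" tags stay in the data; the stripped names exist only in `names_um` for the comparison.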