Question

I have a data frame like this:

a tr78.3
a tr78.2
a tr79.1
b tr12.2
b tr12.3

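I want to remove duplicates in the second column while ignoring the decimal part. This is my desired output (keeping the first occurrence in each group is fine):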

a tr78.3
a tr79.1
b tr12.2

There are many ways to get unique values (for example df %>% distinct(df$V1, df$V2)), but how do I adapt one of them to my problem?

Answer 1

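We can use sub to strip the trailing decimal part from the second column, apply duplicated to get a logical index, and use it to subset the dataset: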

df1[!duplicated(sub("\\.\\d+$", "", df1[,2])),]
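To make the one-liner above easier to follow, here is the same idea split into its three steps. This is only a sketch: the intermediate names key, is_dup and result are illustrative, and the sample data is rebuilt with data.frame so the snippet runs on its own.

```r
# Sample data matching the question (columns V1 and V2)
df1 <- data.frame(
  V1 = c("a", "a", "a", "b", "b"),
  V2 = c("tr78.3", "tr78.2", "tr79.1", "tr12.2", "tr12.3"),
  stringsAsFactors = FALSE
)

# Step 1: strip a trailing ".digits" suffix to build the comparison key
key <- sub("\\.\\d+$", "", df1$V2)

# Step 2: flag every repeat of a key after its first occurrence
is_dup <- duplicated(key)

# Step 3: keep only rows whose key has not been seen before
result <- df1[!is_dup, ]
print(result)
```

Because duplicated marks only the second and later occurrences, the first row of each group is the one that survives, and the original V2 values are left untouched.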

Or using the tidyverse:

library(dplyr)
library(stringr)
df1 %>%
    distinct(V2 = str_replace(V2, "\\.\\d+$", ""), .keep_all = TRUE)
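One thing to watch for: because the distinct call above assigns the stripped string back to V2, the V2 column in the result holds the shortened values (tr78, tr79, tr12) rather than the original ones. If the original values should be kept, as in the desired output, one possible variant is to compute the key under a temporary column and drop it afterwards. A sketch, assuming dplyr and stringr are installed; the column name key is just an illustrative choice:

```r
library(dplyr)
library(stringr)

# Sample data matching the question (columns V1 and V2)
df1 <- data.frame(
  V1 = c("a", "a", "a", "b", "b"),
  V2 = c("tr78.3", "tr78.2", "tr79.1", "tr12.2", "tr12.3"),
  stringsAsFactors = FALSE
)

result <- df1 %>%
  # Deduplicate on a temporary key column so V2 itself is not overwritten
  distinct(key = str_remove(V2, "\\.\\d+$"), .keep_all = TRUE) %>%
  # Drop the helper column again
  select(-key)

print(result)
```

With .keep_all = TRUE, distinct keeps the first row of each key group, so this should agree with the base R version.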

Data

df1 <- structure(list(V1 = c("a", "a", "a", "b", "b"), V2 = c("tr78.3", 
 "tr78.2", "tr79.1", "tr12.2", "tr12.3")), .Names = c("V1", "V2"
), class = "data.frame", row.names = c(NA, -5L))
