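Suppose I have n distinct unordered pairs of elements. I want to extract, from these n pairs, the minimum number of pairs that contain k distinct elements.
I know I can use duplicated() to extract all the distinct elements from the n pairs, but I don't know how to use it to get the minimum number of pairs that contain k elements.
Here is an example.
Suppose I have 8 pairs in a data.frame: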
x_coord <- c("x1","x1","x1","x2","x2","x3","x4","x4")
y_coord <- c("y1","y2","y3","y1","y4","y5","y2","y5")
df <- data.frame(x_coord, y_coord)
df
x_coord y_coord
1 x1 y1
2 x1 y2
3 x1 y3
4 x2 y1
5 x2 y4
6 x3 y5
7 x4 y2
8 x4 y5
If I use duplicated(), I get:
x_coord_vector = as.vector(df$x_coord)
y_coord_vector = as.vector(df$y_coord)
df_vector <- c(x_coord_vector, y_coord_vector)
distinct_elements <- df_vector[!duplicated(df_vector)]
distinct_elements
# [1] "x1" "x2" "x3" "x4" "y1" "y2" "y3" "y4" "y5"
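For reference, the same list of distinct elements can be obtained more directly with unique(). This is only a minimal sketch on the example data; stringsAsFactors = FALSE is an assumption added here so that c() concatenates plain character vectors on any R version.

```r
# Same example data as above; stringsAsFactors = FALSE is an added assumption
x_coord <- c("x1","x1","x1","x2","x2","x3","x4","x4")
y_coord <- c("y1","y2","y3","y1","y4","y5","y2","y5")
df <- data.frame(x_coord, y_coord, stringsAsFactors = FALSE)

# unique(x) keeps the first occurrence of each value, like x[!duplicated(x)]
distinct_elements <- unique(c(df$x_coord, df$y_coord))
distinct_elements
# [1] "x1" "x2" "x3" "x4" "y1" "y2" "y3" "y4" "y5"
```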
If I want the minimum number of pairs that contain 6 distinct elements, the output should be:
df_6_distinct_elements
x_coord y_coord
1 x1 y1
2 x1 y2
3 x1 y3
4 x2 y1
5 x2 y4
Note that duplicated() may not even be the right function for this kind of task, so any suggestion is welcome.
Answer 0 (score: 0)
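I think this solves your problem. Reading the rows from the top, it finds the smallest number of leading rows whose pairs contain at least n unique values (n here is the k from the question). You may end up with n + 1 unique elements, because the row that introduces the nth unique value can also introduce another new element.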
x_coord <- c("x1","x1","x1","x2","x2","x3","x4","x4")
y_coord <- c("y1","y2","y3","y1","y4","y5","y2","y5")
df <- data.frame(x_coord, y_coord)
## Define the number of unique elements
n <- 6
## Get the ordered values and find the nth unique value
uvals <- c(t(df))
fval <- uvals[which(!duplicated(uvals))[n]]
## Keep every row up to the first row that contains the nth unique value
df[1:which(fval == df$x_coord | fval == df$y_coord)[1], ]
# x_coord y_coord
# 1 x1 y1
# 2 x1 y2
# 3 x1 y3
# 4 x2 y1
# 5 x2 y4
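The approach above always keeps rows from the top of the data.frame. If rows may instead be picked in any order, fewer pairs can be enough. The following is only a rough brute-force sketch of that reading of the question, not part of the original answer: min_pairs is a hypothetical helper name, and it tries every combination of rows with combn(), so it is practical only for small inputs.

```r
# Hypothetical helper (an assumption, not from the original answer):
# return the fewest rows of df whose pairs contain at least k distinct elements.
min_pairs <- function(df, k) {
  for (m in seq_len(nrow(df))) {
    # Every combination of m row indices, smallest m first
    combos <- combn(nrow(df), m, simplify = FALSE)
    for (rows in combos) {
      picked <- df[rows, , drop = FALSE]
      # Pool both columns and count the distinct elements
      elems <- unique(unlist(lapply(picked, as.character)))
      if (length(elems) >= k) {
        return(picked)
      }
    }
  }
  NULL  # df holds fewer than k distinct elements
}

x_coord <- c("x1","x1","x1","x2","x2","x3","x4","x4")
y_coord <- c("y1","y2","y3","y1","y4","y5","y2","y5")
df <- data.frame(x_coord, y_coord)

res <- min_pairs(df, 6)
nrow(res)
# [1] 3
```

Two pairs can hold at most four distinct elements, so three rows is the true minimum for six elements on this data. The search cost grows exponentially with the number of rows.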