Question

I have a data frame (min_set_obs) with two columns: the first, Treatment, contains numeric values, and the second, seq, is an id column:

min_set_obs
 Treatment seq
         1  29
         1  23
         3  60
         1   6
         2  41
         1   5
         2  44

Say I also have a numeric vector called key:

key
[1] 1 1 1 2 2 3

That is, a vector of three 1s, two 2s and one 3.

How can I determine which rows of the min_set_obs data frame contain the first occurrences of the values in the key vector?

I would like my output to look like this:

 Treatment seq
         1  29
         1  23
         3  60
         1   6
         2  41
         2  44

That is, the sixth row of min_set_obs is 'extra' (it is the fourth 1, when there should only be three 1s), so it is removed.

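I am familiar with the %in% operator, but I don't think it can tell me the positions of the first occurrences of the key vector's values in the first column of the min_set_obs data frame.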
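For example, a quick check along these lines (a minimal sketch, with the data above rebuilt by hand and in_key as an illustrative name) suggests that %in% keeps every row, including the extra fourth 1:

```r
# Example data rebuilt from the tables above
min_set_obs <- data.frame(
  Treatment = c(1, 1, 3, 1, 2, 1, 2),
  seq       = c(29, 23, 60, 6, 41, 5, 44)
)
key <- c(1, 1, 1, 2, 2, 3)

# %in% only tests membership, so it ignores how many times a value occurs in key
in_key <- min_set_obs$Treatment %in% key
in_key                        # TRUE for all seven rows
nrow(min_set_obs[in_key, ])   # 7, so the extra row is not dropped
```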

Thanks

Answer 1

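Here is an option with base R: we split 'min_set_obs' by 'Treatment' into a list, take the head of each list element using the corresponding frequency in 'key', and rbind the list elements into a single data.frame.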

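# split() and table() both order their groups by sorted value, so the pieces line up;
# this assumes every Treatment value also occurs in key
res <- do.call(rbind, Map(head, split(min_set_obs, min_set_obs$Treatment), n = table(key)))
row.names(res) <- NULL  # drop the "group.row" names that rbind generates
res
#   Treatment seq
#1         1  29
#2         1  23
#3         1   6
#4         2  41
#5         2  44
#6         3  60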
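
Note that the result above comes back grouped by Treatment rather than in the original row order shown in the question. A minimal base R sketch that keeps the original order might look like the following. It assumes the example data is rebuilt by hand, and the names counts, occurrence, allowed and res_ordered are illustrative, not part of the answer:

```r
# Example data rebuilt from the question
min_set_obs <- data.frame(
  Treatment = c(1, 1, 3, 1, 2, 1, 2),
  seq       = c(29, 23, 60, 6, 41, 5, 44)
)
key <- c(1, 1, 1, 2, 2, 3)

# How often each value appears in key
counts <- table(key)

# Running count per Treatment value: 1st, 2nd, 3rd ... occurrence in row order
occurrence <- ave(seq_along(min_set_obs$Treatment), min_set_obs$Treatment,
                  FUN = seq_along)

# Number of occurrences allowed for each row's Treatment value
allowed <- as.integer(counts[match(as.character(min_set_obs$Treatment), names(counts))])
allowed[is.na(allowed)] <- 0L  # values absent from key are allowed zero times

# Keep a row only while its value has not been used up
res_ordered <- min_set_obs[occurrence <= allowed, ]
res_ordered  # rows 1, 2, 3, 4, 5 and 7 remain, in their original order
```

Comparing a per-group running count against the allowed frequency avoids splitting and re-binding the data frame, which is why the row order is left untouched.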

Answer 2

With dplyr, you can first compute a table of key, and then take the first n rows from each group accordingly:

library(dplyr)
m <- table(key)

min_set_obs %>% group_by(Treatment) %>% do({
    # as.character(.$Treatment[1]) returns the treatment for the current group
    # use coalesce to get the default number of rows (0) if the treatment doesn't exist in key
    head(., coalesce(m[as.character(.$Treatment[1])], 0L))
})

# A tibble: 6 x 2
# Groups:   Treatment [3]
#  Treatment   seq
#      <int> <int>
#1         1    29
#2         1    23
#3         1     6
#4         2    41
#5         2    44
#6         3    60
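
Since do() has been superseded in more recent dplyr releases, the same idea can probably also be written with filter() and row_number(). The following is a sketch rather than the answerer's code. It assumes the example data is rebuilt by hand, and the names counts and res_dplyr are illustrative:

```r
library(dplyr)

# Example data rebuilt from the question
min_set_obs <- data.frame(
  Treatment = c(1, 1, 3, 1, 2, 1, 2),
  seq       = c(29, 23, 60, 6, 41, 5, 44)
)
key <- c(1, 1, 1, 2, 2, 3)

# How often each value appears in key
counts <- table(key)

res_dplyr <- min_set_obs %>%
  group_by(Treatment) %>%
  # within each group, keep rows whose position does not exceed the allowed count;
  # coalesce() turns a missing count (value not in key) into 0
  filter(row_number() <= coalesce(
    as.integer(counts[match(as.character(Treatment), names(counts))]), 0L
  )) %>%
  ungroup()

res_dplyr$seq  # 29 23 60 6 41 44
```

Unlike the do() version, filter() leaves the surviving rows in their original order, which matches the output requested in the question.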

How do I find the first occurrences of the elements of a numeric vector in a data frame column?
