Question

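I have a series of repeated IDs that I want to assign to groups of a fixed size. The subject IDs repeat with different frequencies, for example: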

# Example Data 
ID = c(101,102,103,104)
Repeats = c(2,3,1,3)
Data = data.frame(ID,Repeats)
> head(Data)
   ID Repeats
1 101       2
2 102       3
3 103       1
4 104       3

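I want all repeats of the same ID to stay within the same group. However, each group has a fixed capacity (say, only 3). For example, in my desired output matrix, each group can only hold 3 IDs: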

# Create empty data frame for group annotation
# Add 3 rows in order to have more space for IDs
# Some groups will have NAs due to  keeping IDs together (I'm OK with that)
Target = data.frame(matrix(NA,nrow=(sum(Data$Repeats)+3),
                                   ncol=dim(Data)[2]))
names(Target)<-c("ID","Group")
Target$Group<-rep(1:3)
Target$Group<-sort(Target$Group)
> head(Target)
  ID Group
1 NA     1
2 NA     1
3 NA     1
4 NA     1
5 NA     2
6 NA     2

I can loop each ID into my target data frame, but this does not guarantee that the repeated IDs stay in the same group:

# Expand each ID by its number of repeats
IDs.repeat = rep(Data$ID, times=Data$Repeats)
# Copy the IDs into Target row by row, ignoring group boundaries
for (i in seq_along(IDs.repeat))
{
  Target$ID[i]<-IDs.repeat[i]
}

In the loop example above, the same ID (102) ends up in two different groups (1 and 2), which is what I want to avoid:

> head(Target)
   ID Group
1 101     1
2 101     1
3 102     1
4 102     1
5 102     2
6 103     2

Instead, if there is no room left for that ID in the group, I would like the code to place NAs, so that the output looks like this:

> head(Target)
   ID Group
1 101     1
2 101     1
3  NA     1
4  NA     1
5 102     2
6 102     2

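Does anyone have a solution that keeps an ID within the same group only if there is enough space left after assigning ID i?

I think I need a loop that counts the NAs in the group and checks whether the number of NAs is >= the number of repeats of that unique ID. However, I don't know how to implement both at once. Maybe nest another loop over group j?

Any help with the loop would be greatly appreciated!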

Answer 1

Here is one solution:

## This is the data.frame I'll try to match
target <- data.frame(
  ID = c(
    rep(101, 2),
    rep(102, 3),
    rep(103, 1),
    rep(104, 3)),
  Group = c(
    rep(1L, 6), # the "L" suffix (as in 1L) makes the value an integer rather than a double
    rep(2L, 3)
  )
)
print(target)

## Your example data
ID = c(101,102,103,104)
Repeats = c(2,3,1,3)
Data = data.frame(ID,Repeats)
head(Data)


ids_to_group <- 3 # the number of ids per group is specified here.
Data$Group <- sort(
  rep(1:ceiling(length(Data$ID) / ids_to_group),
      ids_to_group))[1:length(Data$ID)]

# The do.call(rbind, lapply(X = a series, FUN = function(x) { }))
# pattern is a really useful way to stack data.frames.
# lapply is basically a fancy for-loop; check it out by sending
# ?lapply to the console (to view the help page).
output <- do.call(
  rbind,
  lapply(unique(Data$ID), FUN = function(ids) {
    print(paste(ids, "done.")) # I like to put print statements to follow along
    obs <- Data[Data$ID == ids, ]
    data.frame(ID = rep(obs$ID, obs$Repeats))
  })
)

output <- merge(output, Data[,c("ID", "Group")], by = "ID")

identical(target, output) # returns true if they're equivalent

# For example inspect each with:
str(target)
str(output)
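
The code above assigns three distinct IDs to each group. If a group is instead limited to a fixed number of rows, as in the question's Target (where each group has 4 rows), the loop the question describes could be sketched roughly as below. This is only a sketch under assumptions: `capacity`, `group_of_id`, `free_slots`, and the other helper names are made up for illustration, IDs are taken in their original order, and a new group is opened whenever the next ID does not fit, so earlier groups are never revisited and the number of groups is not capped.

```r
# Example data from the question
Data <- data.frame(ID = c(101, 102, 103, 104),
                   Repeats = c(2, 3, 1, 3))

capacity <- 4L  # assumed number of row slots per group

# Walk through the IDs in order; open a new group whenever the
# current one cannot hold all repeats of the next ID.
group_of_id <- integer(nrow(Data))
current_group <- 1L
free_slots <- capacity
for (i in seq_len(nrow(Data))) {
  n <- Data$Repeats[i]
  if (n > capacity) {
    stop("ID ", Data$ID[i], " has more repeats than a group can hold")
  }
  if (n > free_slots) {  # not enough room left: move on to the next group
    current_group <- current_group + 1L
    free_slots <- capacity
  }
  group_of_id[i] <- current_group
  free_slots <- free_slots - n
}

# One row per repeat, carrying the group chosen for its ID
filled <- data.frame(ID = rep(Data$ID, times = Data$Repeats),
                     Group = rep(group_of_id, times = Data$Repeats))

# Pad every group with NA rows up to its capacity
n_pad <- capacity - tabulate(filled$Group, nbins = current_group)
pad_group <- rep(seq_len(current_group), times = n_pad)
padding <- data.frame(ID = rep(NA_real_, length(pad_group)),
                      Group = pad_group)

# Sort by group, keeping the NA padding at the end of each group
Target <- rbind(filled, padding)
Target <- Target[order(Target$Group, is.na(Target$ID)), ]
rownames(Target) <- NULL

head(Target)
```

With the example data, the first six rows come out as 101, 101, NA, NA in group 1 followed by 102, 102 in group 2, which matches the desired output shown in the question. Because ID 103 has a single repeat, it fills the last free row of group 2 rather than starting a new group.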
