Question

I am working with a dataset that looks like this, except that it has more columns holding data like "serial" and "loc":

start  <- c(1, 8, 16, 24, 28, 32)
end    <- c(4, 9, 20, 27, 30, 45)
serial <- c(1, 2, 3, 4, 5, 6)
loc    <- c(8, 63, 90, 32, 89, 75)
dataset <- data.frame(start, end, serial, loc)

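Each row here actually represents a run of consecutive integers. I would like to put each integer of the run on its own row while keeping the row's other attributes. "start" marks the beginning of the run and "end" marks its end. So, for example, the first row of "dataset" should be split into four rows: one for 1, one for 2, one for 3, and one for 4. Likewise, the second row of "dataset" should be split into two rows: one for 8 and one for 9, and so on.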

So the output for just the first two rows of "dataset" would look like this:

split serial loc
    1      1   8
    2      1   8
    3      1   8
    4      1   8
    8      2  63
    9      2  63

Answer 1

A data.table solution, assuming serial is a unique row identifier:

library(data.table)
DA <- as.data.table(dataset)
DB <- DA[,list(index = seq(start,end, by = 1), loc),by = serial]

If serial is not a unique row identifier, group by a generated row id instead:

DB <- DA[, list(index = seq(start,end, by = 1), loc, serial), by = list(rowid = seq_len(nrow(DA)))]

Answer 2

Here is an approach that sticks to base R.

# SIMPLIFY = FALSE keeps the result a list even if all runs have the same length
temp <- mapply(seq, dataset$start, dataset$end, SIMPLIFY = FALSE)
dataset2 <- data.frame(serial = rep(dataset$serial, lengths(temp)),
                       index = unlist(temp),
                       loc = rep(dataset$loc, lengths(temp)))
list(head(dataset2), tail(dataset2))
# [[1]]
#   serial index loc
# 1      1     1   8
# 2      1     2   8
# 3      1     3   8
# 4      1     4   8
# 5      2     8  63
# 6      2     9  63
# 
# [[2]]
#    serial index loc
# 27      6    40  75
# 28      6    41  75
# 29      6    42  75
# 30      6    43  75
# 31      6    44  75
# 32      6    45  75
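
Since the question mentions more columns than just "serial" and "loc", a variation on the same base R idea may be handy: repeat the row indices rather than each column by hand, so every other column comes along automatically. This is only a sketch, not part of the answer above. It assumes end >= start in every row, and the names `n` and `expanded` are just illustrative.

```r
# Example data from the question
dataset <- data.frame(start  = c(1, 8, 16, 24, 28, 32),
                      end    = c(4, 9, 20, 27, 30, 45),
                      serial = c(1, 2, 3, 4, 5, 6),
                      loc    = c(8, 63, 90, 32, 89, 75))

# Number of integers in each run (assumes end >= start in every row)
n <- dataset$end - dataset$start + 1

# Repeat each row index n times; all other columns are carried along
expanded <- dataset[rep(seq_len(nrow(dataset)), times = n), ]

# sequence(n) counts 1..n within each run; shift it by the run's start
expanded$index <- rep(dataset$start, times = n) + sequence(n) - 1

# Drop the now-redundant start/end columns and reset the row names
expanded <- expanded[, setdiff(names(expanded), c("start", "end"))]
rownames(expanded) <- NULL

head(expanded)
```

With this layout, adding a new attribute column to `dataset` needs no change to the expansion code.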

Answer 3

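# create the ranges (SIMPLIFY = FALSE guarantees a list, one element per row)
ranges <- mapply(seq, dataset$start, dataset$end, SIMPLIFY = FALSE)

# create the tables: one small data frame per original row
tables <- lapply(seq_along(ranges), function(i)
             cbind(split = ranges[[i]], dataset[i, c("serial", "loc")]))

# to put all the tables in one data frame:
do.call(rbind, tables)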
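
A caveat worth knowing when building the ranges with mapply(seq, ...): by default mapply simplifies its result, so if every run happens to have the same length it returns a matrix rather than a list, and code that indexes with [[i]] or counts elements per run no longer lines up with the rows. Passing SIMPLIFY = FALSE avoids this. The small input below is hypothetical and chosen only to trigger the behaviour.

```r
# Hypothetical input where every run has the same length (3 integers each)
start <- c(1, 10, 20)
end   <- c(3, 12, 22)

simplified <- mapply(seq, start, end)                    # collapses to a matrix
as_list    <- mapply(seq, start, end, SIMPLIFY = FALSE)  # always a list

is.matrix(simplified)
is.list(as_list)

# lengths() on the list gives the per-row repeat counts
lengths(as_list)
```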
