Question

I have a problem with my data set. The data looks like this:

>data
     V1   V2   V3   V4  V5
1    A1   
2   630  554  323 1234 434
3   343  423  423  324 234
4    A2   
5   234 1243 4123   43 324
6    A3   
7  3123 3213   32 3123 422
8    A4   
9  3123  413   42 4214 412
10  124  423  543   35 353
11   A5   
12  423  423  234  234 234

I would like to get this result:

A1  630  554  323 1234 434
A1  343  423  423  324 234
A2  234 1243 4123   43 324
A3 3123 3213   32 3123 422
A4 3123  413   42 4214 412
A4  124  423  543   35 353
A5  423  423  234  234 234

Is there a way to do this?

Answer 1

DF <- read.table(text="V1   V2   V3   V4  V5
1    A1   
2   630  554  323 1234 434
3   343  423  423  324 234
4    A2   
5   234 1243 4123   43 324
6    A3   
7  3123 3213   32 3123 422
8    A4   
9  3123  413   42 4214 412
10  124  423  543   35 353
11   A5   
12  423  423  234  234 234", header=TRUE, fill=TRUE, colClasses="character")

If the columns of your input data are of class factor, you need to convert them to character first.
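
As a minimal sketch of that conversion, assuming a small made-up data frame (`DF_fac` and its values are hypothetical, not the asker's data), the factor columns can be turned into character columns like this:

```r
# Hypothetical example data whose columns are stored as factors
DF_fac <- data.frame(V1 = factor(c("A1", "630", "343")),
                     V2 = factor(c("", "554", "423")))

# Find the factor columns and convert only those to character
is_fac <- sapply(DF_fac, is.factor)
DF_fac[is_fac] <- lapply(DF_fac[is_fac], as.character)

# Every column should now be of class "character"
sapply(DF_fac, class)
```

Converting through `as.character()` matters because calling `as.numeric()` directly on a factor returns the internal level codes, not the numbers shown in the labels.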

#separate ids and numbers
id <- DF[DF[,2]=="",1]
DF1 <- DF[DF[,2]!="",]

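#count how many data rows follow each id
#(id rows are not numeric, so as.numeric() turns them into NA; that warning is expected)
indL <- rle(is.na(suppressWarnings(as.numeric(DF[,1]))))
indL <- indL$lengths[!indL$values]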
#create id vector
id <- rep(id, indL)
#remove "old" rownames if you wish
row.names(DF1) <- NULL
#put everything together in a data.frame
DF1 <- cbind.data.frame(id, sapply(DF1, as.integer))

#   id   V1   V2   V3   V4  V5
# 1 A1  630  554  323 1234 434
# 2 A1  343  423  423  324 234
# 3 A2  234 1243 4123   43 324
# 4 A3 3123 3213   32 3123 422
# 5 A4 3123  413   42 4214 412
# 6 A4  124  423  543   35 353
# 7 A5  423  423  234  234 234
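
The `rle` step is the least obvious part of this answer, so here is a small sketch of what it computes. The logical vector `is_id` is typed in by hand to mirror the row pattern of the example data (TRUE for an id row, FALSE for a data row); the variable names are illustrative only.

```r
# TRUE marks an id row, FALSE marks a data row (same pattern as the example data)
is_id <- c(TRUE, FALSE, FALSE, TRUE, FALSE, TRUE, FALSE,
           TRUE, FALSE, FALSE, TRUE, FALSE)

# rle() collapses the vector into runs of identical values
runs <- rle(is_id)

# the lengths of the FALSE runs are the numbers of data rows per id
counts <- runs$lengths[!runs$values]
counts
# 2 1 1 2 1

# repeating each id by its count gives one id per data row
rep(c("A1", "A2", "A3", "A4", "A5"), counts)
```

One caveat of this approach: if an id row had no data rows below it, two id rows would merge into a single TRUE run, and the counts would no longer line up with the ids.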

Answer 2

Here is another solution, using split and table.

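# df is the input data frame from the question, with character columns
# separate into two data frames: one with the numbers, one with the ids
s <- split(df, df$V2 == "")
data <- s[["FALSE"]]
ids <- s[["TRUE"]][, 1]

# repeat each id once for every data row that follows it
data$id <- rep(ids, table(cumsum(df[, 2] == "")) - 1)
# move the id column to the front
data[, c(6, 1:5)]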
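
To make the `cumsum` and `table` step easier to follow, here is a sketch on a cut-down, hypothetical two-column data frame (`toy` is not the asker's data). It also shows a variation that is not part of the answer above: indexing the ids by the running group number directly, which avoids the `rep` call.

```r
# Hypothetical cut-down data: id rows have an empty second column
toy <- data.frame(V1 = c("A1", "630", "343", "A2", "234"),
                  V2 = c("", "554", "423", "", "1243"),
                  stringsAsFactors = FALSE)

is_id <- toy$V2 == ""

# cumsum() increases by one at every id row, so it labels each row with its group
grp <- cumsum(is_id)

# each group holds one id row plus its data rows, hence the "- 1"
counts <- table(grp) - 1

# variation: look up the id of every row by group number, then keep the data rows
id_per_row <- toy$V1[is_id][grp][!is_id]
id_per_row
```

The lookup variation assumes the first row of the data is an id row, as it is in the question.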
