Subsetting data across multiple columns

Time: 2016-05-26 15:29:35

Tags: r subset

Consider the following data:

X <- 1:4
Ya <- 10:13
Yb <- 2:5
Yc <- c(10,11,6,NA)

df <- data.frame(X, Ya, Yb, Yc)

For each value of X, I want to extract the unique Y values (from Ya:Yc).

So I am trying to get the following output:

# the first number is the X value, then the following numbers are
# the unique Ya:Yc values for each row
# 1, 10, 2
# 2, 11, 3
# 3, 12, 4, 6
# 4, 13, 5

I tried using a simple for loop.

output1 <- c(NA,NA,NA,NA)

for(i in 1:4) {
  output1[i] <- c(i,as.numeric(unique(df[i,2:4 ])))
}

2 answers:

Answer 0: (score: 3)

Try:

library(dplyr)
library(tidyr)

df %>%
  gather(key, value, -X) %>%
  group_by(X) %>%
  distinct(value) %>%
  spread(key, value)

This gives:

#Source: local data frame [4 x 4]
#Groups: X [4]
#
#      X    Ya    Yb    Yc
#  (int) (dbl) (dbl) (dbl)
#1     1    10     2    NA
#2     2    11     3    NA
#3     3    12     4     6
#4     4    13     5    NA
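
The output above comes from the package versions current when this answer was written. In later dplyr releases, distinct() keeps only the columns named in the call unless .keep_all = TRUE is set, so the key column may be dropped before spread() runs. Below is a minimal sketch of the same idea, assuming a recent dplyr and tidyr (1.0.0 or later, for pivot_longer() and pivot_wider()). The name `wide` is only for illustration.

```r
library(dplyr)
library(tidyr)

df <- data.frame(X = 1:4, Ya = 10:13, Yb = 2:5, Yc = c(10, 11, 6, NA))

wide <- df %>%
  # wide -> long: one row per (X, key, value)
  pivot_longer(-X, names_to = "key", values_to = "value") %>%
  # keep the first occurrence of each value within each X,
  # and keep the key column so the data can be reshaped back
  distinct(X, value, .keep_all = TRUE) %>%
  # long -> wide: dropped duplicates come back as NA
  pivot_wider(names_from = key, values_from = value)

wide
```

As in the original pipeline, a duplicated value is kept in the first column where it appears (here Ya) and shows up as NA in the later one.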

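Answer 1: (score: 2)

Here is an option using data.table. Convert the data.frame to a data.table (setDT(df)), melt it from 'wide' to 'long' format, get the unique elements by the 'X' and 'value' columns, and then dcast from 'long' back to 'wide' format.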

library(data.table)
dcast(unique(melt(setDT(df), id.vars = "X"),
             by = c("X", "value")), X ~ variable, value.var = "value")
#   X Ya Yb Yc
#1: 1 10  2 NA
#2: 2 11  3 NA
#3: 3 12  4  6
#4: 4 13  5 NA
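
Separately from the two packages above, the for loop in the question can be repaired in base R. It fails because output1[i] can hold only a single value: assigning a longer vector to it raises a "number of items to replace is not a multiple of replacement length" warning and stores just the first element. In addition, unique() keeps NA as a value. A minimal sketch, assuming a list of per-row vectors is an acceptable output shape:

```r
df <- data.frame(X = 1:4, Ya = 10:13, Yb = 2:5, Yc = c(10, 11, 6, NA))

# A list can hold one vector per row, each with a different length.
output1 <- vector("list", nrow(df))

for (i in seq_len(nrow(df))) {
  # Flatten the Y columns of row i into a plain numeric vector
  y <- unlist(df[i, c("Ya", "Yb", "Yc")], use.names = FALSE)
  # Drop NA, remove duplicates, and prepend the X value
  output1[[i]] <- c(df$X[i], unique(y[!is.na(y)]))
}

output1[[3]]  # 3 12 4 6
```

The ragged list matches the output sketched in the question, whereas the reshaping answers pad each row with NA to keep a rectangular table.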