Consider the following data:
X <- 1:4
Ya <- 10:13
Yb <- 2:5
Yc <- c(10,11,6,NA)
df <- data.frame(X, Ya, Yb, Yc)
For each value of X, I want to extract the unique Y values (from Ya:Yc).
So I am trying to get output like this:
# the first number is the X value; the numbers that follow are
# the unique Ya:Yc values for that row
# 1, 10, 2
# 2, 11, 3
# 3, 12, 4, 6
# 4, 13, 5
I tried using a simple for loop.
output1 <- c(NA,NA,NA,NA)
for (i in 1:4) {
  output1[i] <- c(i, as.numeric(unique(df[i, 2:4])))
}
Answer 0 (score: 3)
Try:
library(dplyr)
library(tidyr)
df %>%
  gather(key, value, -X) %>%
  # .keep_all = TRUE (dplyr >= 0.5.0) keeps the key column, which spread() needs
  distinct(X, value, .keep_all = TRUE) %>%
  spread(key, value)
Which gives:
#Source: local data frame [4 x 4]
#Groups: X [4]
#
# X Ya Yb Yc
# (int) (dbl) (dbl) (dbl)
#1 1 10 2 NA
#2 2 11 3 NA
#3 3 12 4 6
#4 4 13 5 NA
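In current tidyr releases, gather() and spread() are superseded by pivot_longer() and pivot_wider(). A minimal sketch of the same reshape with those functions, assuming tidyr 1.0.0 or later and dplyr 0.5.0 or later (for .keep_all), might look like this. The name result is only for illustration.

```r
library(dplyr)
library(tidyr)

df <- data.frame(X = 1:4, Ya = 10:13, Yb = 2:5, Yc = c(10, 11, 6, NA))

result <- df %>%
  # wide -> long: one row per (X, key, value)
  pivot_longer(-X, names_to = "key", values_to = "value") %>%
  # keep the first occurrence of each value within each X
  distinct(X, value, .keep_all = TRUE) %>%
  # long -> wide: dropped duplicates come back as NA
  pivot_wider(names_from = key, values_from = value)

result
```

Duplicates are removed in the order Ya, Yb, Yc, so a repeated value survives only in the first column where it appears.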
Answer 1 (score: 2)
Here is an option using data.table. Convert the data.frame to a data.table (setDT(df)), melt from 'wide' to 'long' format, keep the unique rows by the 'X' and 'value' columns, then dcast from 'long' back to 'wide' format.
library(data.table)
dcast(unique(melt(setDT(df), id.vars = "X"),
             by = c("X", "value")),
      X ~ variable, value.var = "value")
# X Ya Yb Yc
#1: 1 10 2 NA
#2: 2 11 3 NA
#3: 3 12 4 6
#4: 4 13 5 NA
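If a ragged result is preferred (one vector per row, as in the output sketched in the question), a base R list can hold vectors of different lengths, which a single element of an atomic vector such as output1[i] cannot. The following is only a sketch under that assumption. It also drops NA values, which the desired output in the question appears to imply.

```r
df <- data.frame(X = 1:4, Ya = 10:13, Yb = 2:5, Yc = c(10, 11, 6, NA))

# One list element per row: the X value followed by the unique,
# non-missing values from the Ya:Yc columns.
output1 <- lapply(seq_len(nrow(df)), function(i) {
  y <- unlist(df[i, c("Ya", "Yb", "Yc")], use.names = FALSE)
  c(df$X[i], unique(y[!is.na(y)]))
})

output1
```

Each element can then be accessed with output1[[i]], for example output1[[3]] for the row where X is 3.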