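I am trying to expand a dataset in R. I recorded observations for each sample and calculated percentages from those observations. Now I need to expand each sample so that it lists every possible observation, without redoing any of the calculations.
Example of myData. Starting dataset: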
Sample Observation Percent
A Y 50
A N 50
B Y 10
B N 80
B Don't know 10
Desired dataset:
Sample Observation Percent
A Y 50
A N 50
A Don't know NA
B Y 10
B N 80
B Don't know 10
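So in this case I need to expand Sample A to include the "Don't know" category and fill its Percent with NA.
I tried:
myTable <- table(myData)
TableFrame2 <- data.frame(myTable)
This expands the dataset but messes up the Percent column (why?). I thought I could merge the percentages back in, but to do that I would need to match them exactly against the expanded set of Sample and Observation values. Any suggestions?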
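Answer 0 (score: 1)
One approach is to build every Sample/Observation combination and merge (join) it back onto the data. (I changed the data slightly so that it is easy to copy and paste on SO.)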
dat <- read.table(header=TRUE, stringsAsFactors=FALSE, text='
Sample Observation Percent
A Y 50
A N 50
B Y 10
B N 80
B Don_t_know 10 ')
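On the "why" in the question: `table()` cross-tabulates every column it is given, so Percent is treated as one more grouping variable rather than as a value to carry along. The sketch below illustrates this on a hypothetical reconstruction of `myData` (built with `data.frame()` so the original "Don't know" label can be kept); the name `counts` is only for illustration.

```r
# Hypothetical reconstruction of the question's myData
myData <- data.frame(
  Sample      = c("A", "A", "B", "B", "B"),
  Observation = c("Y", "N", "Y", "N", "Don't know"),
  Percent     = c(50, 50, 10, 80, 10),
  stringsAsFactors = FALSE
)

# table() cross-classifies by all three columns, Percent included
counts <- as.data.frame(table(myData))

nrow(counts)           # one row per Sample x Observation x Percent combination
class(counts$Percent)  # Percent is now a factor of labels, not a number
sum(counts$Freq)       # Freq holds row counts, which add up to the original rows
```

The expanded rows are therefore counts over a three-way grid, which is why the percentages no longer line up with Sample and Observation.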
Base R:
merge(
  dat,
  expand.grid(Sample = unique(dat$Sample),
              Observation = unique(dat$Observation),
              stringsAsFactors = FALSE),
  by = c("Sample", "Observation"),
  all = TRUE
)
# Sample Observation Percent
# 1 A Don_t_know NA
# 2 A N 50
# 3 A Y 50
# 4 B Don_t_know 10
# 5 B N 80
# 6 B Y 10
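To see what the merge is doing, it can help to look at the grid on its own. This is a minimal sketch under the same assumptions as the answer's data, except that the data is rebuilt with `data.frame()` so the "Don't know" label keeps its spaces; the names `grid` and `full` are illustrative only.

```r
dat <- data.frame(
  Sample      = c("A", "A", "B", "B", "B"),
  Observation = c("Y", "N", "Y", "N", "Don't know"),
  Percent     = c(50, 50, 10, 80, 10),
  stringsAsFactors = FALSE
)

# Every Sample paired with every Observation seen anywhere in the data
grid <- expand.grid(
  Sample      = unique(dat$Sample),
  Observation = unique(dat$Observation),
  stringsAsFactors = FALSE
)

# all = TRUE keeps grid rows that have no match in dat; their Percent is NA
full <- merge(dat, grid, by = c("Sample", "Observation"), all = TRUE)

# The rows that were added by the expansion
full[is.na(full$Percent), ]
```

Filtering on `is.na(full$Percent)` only identifies the added rows if the original data has no missing percentages of its own.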
Tidyverse:
library(dplyr)
library(tidyr)
dat %>%
  full_join(
    crossing(Sample = unique(dat$Sample), Observation = unique(dat$Observation)),
    by = c("Sample", "Observation")
  )
# Sample Observation Percent
# 1 A Y 50
# 2 A N 50
# 3 B Y 10
# 4 B N 80
# 5 B Don_t_know 10
# 6 A Don_t_know NA
Or even:
dat %>%
  full_join(expand(., Sample, Observation))
# Joining, by = c("Sample", "Observation")
# Sample Observation Percent
# 1 A Y 50
# 2 A N 50
# 3 B Y 10
# 4 B N 80
# 5 B Don_t_know 10
# 6 A Don_t_know NA
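A different route, not part of the answer above, is `tidyr::complete()`, which wraps the expand-and-join pattern in a single call. The sketch below assumes the same data, rebuilt with `data.frame()`; the name `filled` is illustrative.

```r
library(tidyr)

dat <- data.frame(
  Sample      = c("A", "A", "B", "B", "B"),
  Observation = c("Y", "N", "Y", "N", "Don't know"),
  Percent     = c(50, 50, 10, 80, 10),
  stringsAsFactors = FALSE
)

# Add the missing Sample/Observation combinations; Percent is NA for new rows
filled <- complete(dat, Sample, Observation)
filled
```

Row order may differ from the join-based versions, so sort afterwards if a particular order matters.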