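I work with paired animals (a male and a female) of several species, with repeated measurements taken at multiple life stages. This is the kind of data frame I am working with: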
> ID <- rep(c(seq(from=1, to=4), seq(from=5, to=8)), times=2)
> partner <- rep(c(seq(from=4, to=1), seq(from=8, to=5)), times=2)
> stage<- c(rep("juvenile", 8), rep("adult", 8))
> sex<- rep((rep(c("male", "female"), each=2)), times=4)
> species<-rep(c("a", "b"), each=4, times=2)
> df<-data.frame(ID, partner, stage, sex, species)
ID partner stage sex species
1 1 4 juvenile male a
2 2 3 juvenile male a
3 3 2 juvenile female a
4 4 1 juvenile female a
5 5 8 juvenile male b
6 6 7 juvenile male b
7 7 6 juvenile female b
8 8 5 juvenile female b
9 1 4 adult male a
10 2 3 adult male a
11 3 2 adult female a
12 4 1 adult female a
13 5 8 adult male b
14 6 7 adult male b
15 7 6 adult female b
16 8 5 adult female b
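I would like to add a factor variable with a distinct level for each male within a species (for example, the individual with ID = 1 always has factor level A, the individual with ID = 2 always has factor level B, and so on), and then I want their partners to have the same factor level (ID = 10 has factor level A, ID = 9 has factor level B, and so on). This is what it should look like (this example is greatly simplified):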
> df
ID partner stage sex species WANTED
1 1 4 juvenile male a A
2 2 3 juvenile male a B
3 3 2 juvenile female a A
4 4 1 juvenile female a B
5 5 8 juvenile male b A
6 6 7 juvenile male b B
7 7 6 juvenile female b A
8 8 5 juvenile female b B
9 1 4 adult male a A
10 2 3 adult male a B
11 3 2 adult female a A
12 4 1 adult female a B
13 5 8 adult male b A
14 6 7 adult male b B
15 7 6 adult female b A
16 8 5 adult female b B
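Important notes:
Thanks!
EDIT
Here is my real data frame, as output by dput, which is considerably messier:
Apologies for its length! (I'm not sure how to hide it in a drop-down.)
To summarize what I want: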
structure(list(ID = c(11489L, 11862L, 11539L, 11713L, 11271L,
9225L, 11588L, 9906L, 11039L, 9717L, 11539L, 11713L, 11489L,
11862L, 11403L, 11070L, 11271L, 9225L, 11039L, 9717L, 11588L,
9906L, 12124L, 12021L, 12029L, 12126L, 12020L, 12030L, 12125L,
10450L, 11371L, 11605L, 11327L, 11019L, 11741L, 11586L, 11740L,
11585L, 10575L, 11855L, 11500L, 11403L, 11070L, 11539L, 11713L,
11740L, 11585L, 11327L, 11019L, 11489L, 11862L, 12124L, 12021L,
11371L, 11605L, 12631L, 12304L, 12303L, 10008L, 12630L, 12275L,
12272L, 10007L, 12029L, 12126L, 12125L, 10450L, 11271L, 9225L,
11588L, 9906L, 11039L, 9717L, 12020L, 12030L, 12910L, 11588L,
9906L, 11039L, 9717L, 11539L, 11713L, 11271L, 9225L, 11403L,
11070L, 12094L, 12095L, 11255L, 12390L, 11257L, 11740L, 11585L,
11327L, 11019L, 11371L, 11605L, 12097L, 11611L, 12124L, 12021L,
12029L, 12126L, 12125L, 10450L, 12020L, 12030L, 12110L, 12910L,
12095L, 11740L, 11585L, 11255L, 12097L, 12390L, 11257L, 11611L,
12094L, 12631L, 12304L, 12303L, 10008L, 11209L, 12630L, 12275L,
11403L, 11070L, 12272L, 10007L, 12124L, 12021L, 11489L, 11862L,
10744L, 11209L, 10575L, 12110L, 10744L, 11069L, 11827L, 11066L,
12816L, 12415L, 12911L, 11248L, 12979L, 12746L, 12912L, 11855L,
11500L, 11741L, 11586L, 12125L, 10450L, 11248L, 12979L, 12746L,
12912L, 11066L, 12816L, 11643L, 11435L, 11069L, 11827L, 11327L,
11019L, 11371L, 11605L, 12631L, 12304L, 12272L, 10007L, 12630L,
12275L, 12910L, 12095L, 11209L, 10575L, 11643L, 11435L, 12110L,
10744L, 12771L, 12388L, 11611L, 12094L, 11255L, 12097L, 12390L,
11257L, 12272L, 10007L, 12303L, 10008L, 12631L, 12304L, 11855L,
11500L, 12910L, 12095L, 11255L, 12097L, 11741L, 11586L, 12771L,
12388L, 11069L, 11827L, 11066L, 12816L, 11611L, 12094L, 11855L,
11500L, 11643L, 11435L, 12303L, 10008L, 11741L, 11586L, 11209L,
10575L, 12746L, 12912L, 11248L, 12979L, 12630L, 12275L, 12110L,
10744L, 12029L, 12126L, 11066L, 12816L, 12415L, 12911L, 11069L,
11827L, 12771L, 12388L, 11643L, 11435L, 12746L, 12912L, 11248L,
12979L, 12415L, 12911L, 12390L, 11257L, 12415L, 12911L, 12020L,
12030L, 12771L, 12388L), Partner_ID = c(11862L, 11489L, 11713L,
11539L, 9225L, 11271L, 9906L, 11588L, 9717L, 11039L, 11713L,
11539L, 11862L, 11489L, 11070L, 11403L, 9225L, 11271L, 9717L,
11039L, 9906L, 11588L, 12021L, 12124L, 12126L, 12029L, 12030L,
12020L, 10450L, 12125L, 11605L, 11371L, 11019L, 11327L, 11586L,
11741L, 11585L, 11740L, 11209L, 11500L, 11855L, 11070L, 11403L,
11713L, 11539L, 11585L, 11740L, 11019L, 11327L, 11862L, 11489L,
12021L, 12124L, 11605L, 11371L, 12304L, 12631L, 10008L, 12303L,
12275L, 12630L, 10007L, 12272L, 12126L, 12029L, 10450L, 12125L,
9225L, 11271L, 9906L, 11588L, 9717L, 11039L, 12030L, 12020L,
12095L, 9906L, 11588L, 9717L, 11039L, 11713L, 11539L, 9225L,
11271L, 11070L, 11403L, 11611L, 12910L, 12097L, 11257L, 12390L,
11585L, 11740L, 11019L, 11327L, 11605L, 11371L, 11255L, 12094L,
12021L, 12124L, 12126L, 12029L, 10450L, 12125L, 12030L, 12020L,
10744L, 12095L, 12910L, 11585L, 11740L, 12097L, 11255L, 11257L,
12390L, 12094L, 11611L, 12304L, 12631L, 10008L, 12303L, 10575L,
12275L, 12630L, 11070L, 11403L, 10007L, 12272L, 12021L, 12124L,
11862L, 11489L, 12110L, 10575L, 11209L, 10744L, 12110L, 11827L,
11069L, 12816L, 11066L, 12911L, 12415L, 12979L, 11248L, 12912L,
12746L, 11500L, 11855L, 11586L, 11741L, 10450L, 12125L, 12979L,
11248L, 12912L, 12746L, 12816L, 11066L, 11435L, 11643L, 11827L,
11069L, 11019L, 11327L, 11605L, 11371L, 12304L, 12631L, 10007L,
12272L, 12275L, 12630L, 12095L, 12910L, 10575L, 11209L, 11435L,
11643L, 10744L, 12110L, 12388L, 12771L, 12094L, 11611L, 12097L,
11255L, 11257L, 12390L, 10007L, 12272L, 10008L, 12303L, 12304L,
12631L, 11500L, 11855L, 12095L, 12910L, 12097L, 11255L, 11586L,
11741L, 12388L, 12771L, 11827L, 11069L, 12816L, 11066L, 12094L,
11611L, 11500L, 11855L, 11435L, 11643L, 10008L, 12303L, 11586L,
11741L, 10575L, 11209L, 12912L, 12746L, 12979L, 11248L, 12275L,
12630L, 10744L, 12110L, 12126L, 12029L, 12816L, 11066L, 12911L,
12415L, 11827L, 11069L, 12388L, 12771L, 11435L, 11643L, 12912L,
12746L, 12979L, 11248L, 12911L, 12415L, 11257L, 12390L, 12911L,
12415L, 12030L, 12020L, 12388L, 12771L), Strain = structure(c(1L,
1L, 1L, 1L, 4L, 4L, 4L, 4L, 4L, 4L, 1L, 1L, 1L, 1L, 4L, 4L, 4L,
4L, 4L, 4L, 4L, 4L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 4L, 1L, 1L, 4L, 4L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 3L, 3L, 1L, 1L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
3L, 3L, 4L, 4L, 4L, 4L, 4L, 4L, 3L, 3L, 2L, 4L, 4L, 4L, 4L, 1L,
1L, 4L, 4L, 4L, 4L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L,
2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 4L, 2L, 2L, 1L, 1L, 2L,
2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 4L, 3L, 3L, 4L, 4L, 3L, 3L,
3L, 3L, 1L, 1L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 2L, 2L, 2L,
2L, 2L, 2L, 1L, 1L, 1L, 1L, 3L, 3L, 2L, 2L, 2L, 2L, 4L, 4L, 1L,
1L, 4L, 4L, 1L, 1L, 1L, 1L, 3L, 3L, 3L, 3L, 3L, 3L, 2L, 2L, 4L,
4L, 1L, 1L, 4L, 4L, 3L, 3L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L,
3L, 3L, 3L, 1L, 1L, 2L, 2L, 2L, 2L, 1L, 1L, 3L, 3L, 4L, 4L, 4L,
4L, 2L, 2L, 1L, 1L, 1L, 1L, 3L, 3L, 1L, 1L, 4L, 4L, 2L, 2L, 2L,
2L, 3L, 3L, 4L, 4L, 3L, 3L, 4L, 4L, 2L, 2L, 4L, 4L, 3L, 3L, 1L,
1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L), .Label = c("BW",
"IS", "LL", "PO"), class = "factor"), State = structure(c(1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 1L, 1L, 2L,
2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 3L, 3L, 2L, 2L, 2L, 2L,
3L, 3L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L,
2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 2L, 2L, 1L, 4L, 4L, 4L, 4L, 4L,
4L, 4L, 4L, 3L, 3L, 1L, 1L, 1L, 1L, 1L, 3L, 3L, 3L, 3L, 3L, 3L,
1L, 1L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 1L, 2L, 2L, 4L, 4L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 2L, 2L, 4L, 4L, 2L, 2L,
4L, 4L, 4L, 4L, 1L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 2L, 2L, 2L, 2L, 4L, 4L, 2L, 2L, 2L, 2L, 2L, 2L, 1L,
1L, 2L, 2L, 4L, 4L, 4L, 4L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
3L, 2L, 2L, 3L, 3L, 1L, 1L, 3L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L,
4L, 4L, 4L, 3L, 3L, 4L, 4L, 4L, 4L, 3L, 3L, 2L, 2L, 3L, 3L, 3L,
3L, 4L, 4L, 4L, 4L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 3L, 3L, 3L,
3L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 3L, 3L, 4L, 4L, 3L, 3L, 4L,
4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 2L, 2L, 4L, 4L, 4L, 4L), .Label = c("Virgin",
"Mated", "Expecting", "Parent"), class = "factor"), Sex = structure(c(1L,
2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L,
2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L,
2L, 1L, 2L, 1L, 2L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L,
1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L,
1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 1L, 2L, 1L, 2L, 1L,
2L, 1L, 2L, 1L, 2L, 2L, 2L, 1L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L,
2L, 1L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 1L, 2L, 1L, 2L, 1L,
2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 1L, 2L, 1L, 2L, 1L, 2L,
1L, 2L, 1L, 2L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L,
2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L,
2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L,
2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L,
2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L,
2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L,
2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L,
2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L), .Label = c("F",
"M"), class = "factor")), .Names = c("ID", "Partner_ID", "Strain",
"State", "Sex"), row.names = c(NA, -256L), class = "data.frame")
So it would look like this:
ID Partner_ID Strain State Sex WANTED
1 11489 11862 BW Virgin F A1
2 11862 11489 BW Virgin M A1
3 11539 11713 BW Virgin F A2
4 11713 11539 BW Virgin M A2
5 11271 9225 PO Virgin F A1
6 9225 11271 PO Virgin M A1
7 11588 9906 PO Virgin F A2
8 9906 11588 PO Virgin M A2
9 11039 9717 PO Virgin F A3
10 9717 11039 PO Virgin M A3
11 11539 11713 BW Mated F A2
12 11713 11539 BW Mated M A2
13 11489 11862 BW Mated F A1
14 11862 11489 BW Mated M A1
15 11403 11070 PO Virgin F A4
16 11070 11403 PO Virgin M A4
17 11271 9225 PO Mated F A1
18 9225 11271 PO Mated M A1
19 11039 9717 PO Mated F A3
20 9717 11039 PO Mated M A3
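Answer 0 (score: 1)
Consider two ave calls, which run a grouped aggregation inline. The first generates a raw running count within each Strain, State and Sex group; the second takes the first such value for each Partner_ID. Each result is wrapped in as.factor for the required type conversion.
The with() call below acts as a kind of context manager: it lets you refer to columns by name without repeating the df$ prefix.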
# RUNNING GROUP COUNT
df$RAW_WANTED <- as.factor(paste0("A", with(df, ave(ID, Strain, State, Sex,
FUN=seq_along))))
# RUNNING FIRST VALUE
df$WANTED <- as.factor(with(df, ave(as.character(RAW_WANTED), Partner_ID,
FUN=function(x) head(x, 1))))
head(df, 20)
# ID Partner_ID Strain State Sex RAW_WANTED WANTED
# 1 11489 11862 BW Virgin F A1 A1
# 2 11862 11489 BW Virgin M A1 A1
# 3 11539 11713 BW Virgin F A2 A2
# 4 11713 11539 BW Virgin M A2 A2
# 5 11271 9225 PO Virgin F A1 A1
# 6 9225 11271 PO Virgin M A1 A1
# 7 11588 9906 PO Virgin F A2 A2
# 8 9906 11588 PO Virgin M A2 A2
# 9 11039 9717 PO Virgin F A3 A3
# 10 9717 11039 PO Virgin M A3 A3
# 11 11539 11713 BW Mated F A1 A2
# 12 11713 11539 BW Mated M A1 A2
# 13 11489 11862 BW Mated F A2 A1
# 14 11862 11489 BW Mated M A2 A1
# 15 11403 11070 PO Virgin F A4 A4
# 16 11070 11403 PO Virgin M A4 A4
# 17 11271 9225 PO Mated F A1 A1
# 18 9225 11271 PO Mated M A1 A1
# 19 11039 9717 PO Mated F A2 A3
# 20 9717 11039 PO Mated M A2 A3
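To make the two ave patterns above easier to follow in isolation, here is a minimal sketch on a made-up data frame. The names toy, pair, group, count and first are hypothetical and are not part of the question's data; the sketch only illustrates a running count per group and a first-value lookup per key.

```r
# Toy data (hypothetical values, not taken from the question)
toy <- data.frame(
  pair  = c("p1", "p2", "p2", "p1", "p3"),
  group = c("g1", "g1", "g2", "g2", "g1"),
  stringsAsFactors = FALSE
)

# Running count within each group: seq_along() restarts at 1 for every group
toy$count <- with(toy, ave(seq_along(group), group, FUN = seq_along))

# First value within each pair: the single value returned by FUN is
# recycled across all rows belonging to that pair
toy$first <- with(toy, ave(count, pair, FUN = function(x) x[1]))

print(toy)
```

In this toy case count restarts in every group, while first carries the count from a pair's first appearance to all of its later rows, which is the same idea the answer uses to keep a label fixed across states.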
Answer 1 (score: 0)
Here is one way to do this with dplyr and a custom function. It is not very elegant, but it is at least easy to follow:
library(dplyr)

# 260 unique labels: A1..A10, B1..B10, ..., Z1..Z10
facSet <- paste0(rep(LETTERS, each = 10), rep(1:10, times = 26))

getFactor <- function(subsetDF) {
  key <- 1
  subsetDF$Factor <- NA
  # First pass: give each male row the next unused label
  for (i in seq_len(nrow(subsetDF))) {
    if (subsetDF$sex[i] == "male") {
      subsetDF$Factor[i] <- facSet[key]
      key <- key + 1
    }
  }
  # Second pass: each female row takes the first label assigned to her partner
  for (i in seq_len(nrow(subsetDF))) {
    if (subsetDF$sex[i] == "female") {
      subsetDF$Factor[i] <- unique(subsetDF$Factor[which(subsetDF$partner[i] == subsetDF$ID)])[1]
    }
  }
  return(subsetDF$Factor)
}

# Apply the helper separately within each species
df <- df %>% group_by(species) %>% mutate(Factor = getFactor(data.frame(ID, sex, partner)))
Output:
> df
# A tibble: 16 x 6
# Groups: species [2]
ID partner stage sex species Factor
<int> <int> <fct> <fct> <fct> <chr>
1 1 4 juvenile male a A1
2 2 3 juvenile male a A2
3 3 2 juvenile female a A2
4 4 1 juvenile female a A1
5 5 8 juvenile male b A1
6 6 7 juvenile male b A2
7 7 6 juvenile female b A2
8 8 5 juvenile female b A1
9 1 4 adult male a A3
10 2 3 adult male a A4
11 3 2 adult female a A2
12 4 1 adult female a A1
13 5 8 adult male b A3
14 6 7 adult male b A4
15 7 6 adult female b A2
16 8 5 adult female b A1
Note: if you need more than 260 unique male-female pairs, create a larger facSet!
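In the output above, the male rows receive a new label at every stage (ID 1 is A1 as a juvenile and A3 as an adult). If the intent is for each male to keep a single label across stages, one possible base-R variation is sketched below. It assumes that every row's partner is of the opposite sex and that labels restart within each species; pair_label is a hypothetical helper introduced here, not part of the answer.

```r
# Toy data rebuilt from the question's example
ID      <- rep(c(1:4, 5:8), times = 2)
partner <- rep(c(4:1, 8:5), times = 2)
stage   <- c(rep("juvenile", 8), rep("adult", 8))
sex     <- rep(rep(c("male", "female"), each = 2), times = 4)
species <- rep(c("a", "b"), each = 4, times = 2)
df <- data.frame(ID, partner, stage, sex, species, stringsAsFactors = FALSE)

# Hypothetical helper: label a pair by the position of its male among the
# unique male IDs it is given, so the label does not depend on the stage.
# Assumption: the partner of every female row is a male.
pair_label <- function(id, partner, sex) {
  male_id <- ifelse(sex == "male", id, partner)  # the male of each row's pair
  paste0("A", match(male_id, unique(male_id[sex == "male"])))
}

# Apply the helper separately within each species
df$Factor <- NA_character_
for (sp in unique(df$species)) {
  rows <- df$species == sp
  df$Factor[rows] <- pair_label(df$ID[rows], df$partner[rows], df$sex[rows])
}
df$Factor <- factor(df$Factor)

print(df)
```

Because the label is derived from the male's ID rather than from a running counter, repeated rows for the same pair get the same level, and the number of available labels is no longer capped by a pre-built lookup vector.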