Add a factor variable to repeated measures of individuals and their partners

Date: 2019-01-11 22:43:18

Tags: r dataframe match repeat r-factor

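I work with pairs of animals (a male and a female) from several species, with repeated measures taken at several life stages. This is the kind of data frame I'm working with: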

> ID <- rep(c(seq(from=1, to=4), seq(from=5, to=8)), times=2)
> partner <- rep(c(seq(from=4, to=1), seq(from=8, to=5)), times=2)
> stage <- c(rep("juvenile", 8), rep("adult", 8))
> sex <- rep(rep(c("male", "female"), each=2), times=4)
> species <- rep(c("a", "b"), each=4, times=2)
> df <- data.frame(ID, partner, stage, sex, species)

 ID partner    stage    sex species
1   1       4 juvenile   male       a
2   2       3 juvenile   male       a
3   3       2 juvenile female       a
4   4       1 juvenile female       a
5   5       8 juvenile   male       b
6   6       7 juvenile   male       b
7   7       6 juvenile female       b
8   8       5 juvenile female       b
9   1       4    adult   male       a
10  2       3    adult   male       a
11  3       2    adult female       a
12  4       1    adult female       a
13  5       8    adult   male       b
14  6       7    adult   male       b
15  7       6    adult female       b
16  8       5    adult female       b

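I want to add a factor variable with a different level for each male within a species (e.g. the individual with ID = 1 always has level A, ID = 2 always has level B, and so on), and then I want each male's partner to have the same level (so ID = 1's partner also has level A, ID = 2's partner has level B, and so on). This is what it should look like (the example is very simplified):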

> df
   ID partner    stage    sex species WANTED
1   1       4 juvenile   male       a      A
2   2       3 juvenile   male       a      B
3   3       2 juvenile female       a      B
4   4       1 juvenile female       a      A
5   5       8 juvenile   male       b      A
6   6       7 juvenile   male       b      B
7   7       6 juvenile female       b      B
8   8       5 juvenile female       b      A
9   1       4    adult   male       a      A
10  2       3    adult   male       a      B
11  3       2    adult female       a      B
12  4       1    adult female       a      A
13  5       8    adult   male       b      A
14  6       7    adult   male       b      B
15  7       6    adult female       b      B
16  8       5    adult female       b      A

Important notes:

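  • In the real dataset the number of individuals differs between species, so if I used letters as factor levels, species x with 4 individuals would have levels A to D, while species y with 6 individuals would have levels A to F.
  • I want the factor levels to restart for each species (in my example data frame, ID = 1 has level A, and so does ID = 5, because it belongs to a different species).
  • A given individual should have the same factor level at every stage (juvenile and adult).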
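A minimal base-R sketch of the rule in the notes above, applied to the toy `df` (an illustration only, not code from the question or the answers). It assumes every female's `partner` is a male of the same species and that each individual has only one partner; `male_id` and `code` are hypothetical helper names. Each row is mapped to the male of its pair, and `ave()` with `match()` numbers the distinct males in order of first appearance separately within each species, so the numbering restarts per species and does not change between stages.

```r
# Toy data from the question (seq() calls replaced by the equivalent 1:4 etc.)
ID      <- rep(c(1:4, 5:8), times = 2)
partner <- rep(c(4:1, 8:5), times = 2)
stage   <- c(rep("juvenile", 8), rep("adult", 8))
sex     <- rep(rep(c("male", "female"), each = 2), times = 4)
species <- rep(c("a", "b"), each = 4, times = 2)
df <- data.frame(ID, partner, stage, sex, species, stringsAsFactors = FALSE)

# Hypothetical helper: the male of each row's pair
# (the row's own ID for males, the partner's ID for females)
male_id <- ifelse(df$sex == "male", df$ID, df$partner)

# Number the distinct males in order of first appearance; ave() applies
# the function separately within each species, so numbering restarts
code <- ave(male_id, df$species, FUN = function(m) match(m, unique(m)))

# LETTERS has only 26 entries; use paste0("A", code) for larger species
df$WANTED <- factor(LETTERS[code])
```

With this rule, ID 1 and its partner ID 4 get level A, and ID 2 and its partner ID 3 get level B, at both stages; species b restarts at A.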

Thanks!

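EDIT: Here is my real data frame, as output by dput, which has a few more complications. Sorry for its length! (I'm not sure how to collapse it.) To summarize what I want:

  • A new factor variable that gives each individual a level, repeated across all of its life stages
  • Both partners of a given pair have the same factor level
  • Factor levels that repeat across species (strains), e.g. BW has A1, A2, ... A8 while LL has A1, A2, ... A9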
structure(list(ID = c(11489L, 11862L, 11539L, 11713L, 11271L, 
9225L, 11588L, 9906L, 11039L, 9717L, 11539L, 11713L, 11489L, 
11862L, 11403L, 11070L, 11271L, 9225L, 11039L, 9717L, 11588L, 
9906L, 12124L, 12021L, 12029L, 12126L, 12020L, 12030L, 12125L, 
10450L, 11371L, 11605L, 11327L, 11019L, 11741L, 11586L, 11740L, 
11585L, 10575L, 11855L, 11500L, 11403L, 11070L, 11539L, 11713L, 
11740L, 11585L, 11327L, 11019L, 11489L, 11862L, 12124L, 12021L, 
11371L, 11605L, 12631L, 12304L, 12303L, 10008L, 12630L, 12275L, 
12272L, 10007L, 12029L, 12126L, 12125L, 10450L, 11271L, 9225L, 
11588L, 9906L, 11039L, 9717L, 12020L, 12030L, 12910L, 11588L, 
9906L, 11039L, 9717L, 11539L, 11713L, 11271L, 9225L, 11403L, 
11070L, 12094L, 12095L, 11255L, 12390L, 11257L, 11740L, 11585L, 
11327L, 11019L, 11371L, 11605L, 12097L, 11611L, 12124L, 12021L, 
12029L, 12126L, 12125L, 10450L, 12020L, 12030L, 12110L, 12910L, 
12095L, 11740L, 11585L, 11255L, 12097L, 12390L, 11257L, 11611L, 
12094L, 12631L, 12304L, 12303L, 10008L, 11209L, 12630L, 12275L, 
11403L, 11070L, 12272L, 10007L, 12124L, 12021L, 11489L, 11862L, 
10744L, 11209L, 10575L, 12110L, 10744L, 11069L, 11827L, 11066L, 
12816L, 12415L, 12911L, 11248L, 12979L, 12746L, 12912L, 11855L, 
11500L, 11741L, 11586L, 12125L, 10450L, 11248L, 12979L, 12746L, 
12912L, 11066L, 12816L, 11643L, 11435L, 11069L, 11827L, 11327L, 
11019L, 11371L, 11605L, 12631L, 12304L, 12272L, 10007L, 12630L, 
12275L, 12910L, 12095L, 11209L, 10575L, 11643L, 11435L, 12110L, 
10744L, 12771L, 12388L, 11611L, 12094L, 11255L, 12097L, 12390L, 
11257L, 12272L, 10007L, 12303L, 10008L, 12631L, 12304L, 11855L, 
11500L, 12910L, 12095L, 11255L, 12097L, 11741L, 11586L, 12771L, 
12388L, 11069L, 11827L, 11066L, 12816L, 11611L, 12094L, 11855L, 
11500L, 11643L, 11435L, 12303L, 10008L, 11741L, 11586L, 11209L, 
10575L, 12746L, 12912L, 11248L, 12979L, 12630L, 12275L, 12110L, 
10744L, 12029L, 12126L, 11066L, 12816L, 12415L, 12911L, 11069L, 
11827L, 12771L, 12388L, 11643L, 11435L, 12746L, 12912L, 11248L, 
12979L, 12415L, 12911L, 12390L, 11257L, 12415L, 12911L, 12020L, 
12030L, 12771L, 12388L), Partner_ID = c(11862L, 11489L, 11713L, 
11539L, 9225L, 11271L, 9906L, 11588L, 9717L, 11039L, 11713L, 
11539L, 11862L, 11489L, 11070L, 11403L, 9225L, 11271L, 9717L, 
11039L, 9906L, 11588L, 12021L, 12124L, 12126L, 12029L, 12030L, 
12020L, 10450L, 12125L, 11605L, 11371L, 11019L, 11327L, 11586L, 
11741L, 11585L, 11740L, 11209L, 11500L, 11855L, 11070L, 11403L, 
11713L, 11539L, 11585L, 11740L, 11019L, 11327L, 11862L, 11489L, 
12021L, 12124L, 11605L, 11371L, 12304L, 12631L, 10008L, 12303L, 
12275L, 12630L, 10007L, 12272L, 12126L, 12029L, 10450L, 12125L, 
9225L, 11271L, 9906L, 11588L, 9717L, 11039L, 12030L, 12020L, 
12095L, 9906L, 11588L, 9717L, 11039L, 11713L, 11539L, 9225L, 
11271L, 11070L, 11403L, 11611L, 12910L, 12097L, 11257L, 12390L, 
11585L, 11740L, 11019L, 11327L, 11605L, 11371L, 11255L, 12094L, 
12021L, 12124L, 12126L, 12029L, 10450L, 12125L, 12030L, 12020L, 
10744L, 12095L, 12910L, 11585L, 11740L, 12097L, 11255L, 11257L, 
12390L, 12094L, 11611L, 12304L, 12631L, 10008L, 12303L, 10575L, 
12275L, 12630L, 11070L, 11403L, 10007L, 12272L, 12021L, 12124L, 
11862L, 11489L, 12110L, 10575L, 11209L, 10744L, 12110L, 11827L, 
11069L, 12816L, 11066L, 12911L, 12415L, 12979L, 11248L, 12912L, 
12746L, 11500L, 11855L, 11586L, 11741L, 10450L, 12125L, 12979L, 
11248L, 12912L, 12746L, 12816L, 11066L, 11435L, 11643L, 11827L, 
11069L, 11019L, 11327L, 11605L, 11371L, 12304L, 12631L, 10007L, 
12272L, 12275L, 12630L, 12095L, 12910L, 10575L, 11209L, 11435L, 
11643L, 10744L, 12110L, 12388L, 12771L, 12094L, 11611L, 12097L, 
11255L, 11257L, 12390L, 10007L, 12272L, 10008L, 12303L, 12304L, 
12631L, 11500L, 11855L, 12095L, 12910L, 12097L, 11255L, 11586L, 
11741L, 12388L, 12771L, 11827L, 11069L, 12816L, 11066L, 12094L, 
11611L, 11500L, 11855L, 11435L, 11643L, 10008L, 12303L, 11586L, 
11741L, 10575L, 11209L, 12912L, 12746L, 12979L, 11248L, 12275L, 
12630L, 10744L, 12110L, 12126L, 12029L, 12816L, 11066L, 12911L, 
12415L, 11827L, 11069L, 12388L, 12771L, 11435L, 11643L, 12912L, 
12746L, 12979L, 11248L, 12911L, 12415L, 11257L, 12390L, 12911L, 
12415L, 12030L, 12020L, 12388L, 12771L), Strain = structure(c(1L, 
1L, 1L, 1L, 4L, 4L, 4L, 4L, 4L, 4L, 1L, 1L, 1L, 1L, 4L, 4L, 4L, 
4L, 4L, 4L, 4L, 4L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 4L, 1L, 1L, 4L, 4L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 3L, 3L, 1L, 1L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 
3L, 3L, 4L, 4L, 4L, 4L, 4L, 4L, 3L, 3L, 2L, 4L, 4L, 4L, 4L, 1L, 
1L, 4L, 4L, 4L, 4L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 
2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 4L, 2L, 2L, 1L, 1L, 2L, 
2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 4L, 3L, 3L, 4L, 4L, 3L, 3L, 
3L, 3L, 1L, 1L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 2L, 2L, 2L, 
2L, 2L, 2L, 1L, 1L, 1L, 1L, 3L, 3L, 2L, 2L, 2L, 2L, 4L, 4L, 1L, 
1L, 4L, 4L, 1L, 1L, 1L, 1L, 3L, 3L, 3L, 3L, 3L, 3L, 2L, 2L, 4L, 
4L, 1L, 1L, 4L, 4L, 3L, 3L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 
3L, 3L, 3L, 1L, 1L, 2L, 2L, 2L, 2L, 1L, 1L, 3L, 3L, 4L, 4L, 4L, 
4L, 2L, 2L, 1L, 1L, 1L, 1L, 3L, 3L, 1L, 1L, 4L, 4L, 2L, 2L, 2L, 
2L, 3L, 3L, 4L, 4L, 3L, 3L, 4L, 4L, 2L, 2L, 4L, 4L, 3L, 3L, 1L, 
1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L), .Label = c("BW", 
"IS", "LL", "PO"), class = "factor"), State = structure(c(1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 1L, 1L, 2L, 
2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 3L, 3L, 2L, 2L, 2L, 2L, 
3L, 3L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 
2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 2L, 2L, 1L, 4L, 4L, 4L, 4L, 4L, 
4L, 4L, 4L, 3L, 3L, 1L, 1L, 1L, 1L, 1L, 3L, 3L, 3L, 3L, 3L, 3L, 
1L, 1L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 1L, 2L, 2L, 4L, 4L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 2L, 2L, 4L, 4L, 2L, 2L, 
4L, 4L, 4L, 4L, 1L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 2L, 2L, 2L, 2L, 4L, 4L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 
1L, 2L, 2L, 4L, 4L, 4L, 4L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 
3L, 2L, 2L, 3L, 3L, 1L, 1L, 3L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 
4L, 4L, 4L, 3L, 3L, 4L, 4L, 4L, 4L, 3L, 3L, 2L, 2L, 3L, 3L, 3L, 
3L, 4L, 4L, 4L, 4L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 3L, 3L, 3L, 
3L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 3L, 3L, 4L, 4L, 3L, 3L, 4L, 
4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 2L, 2L, 4L, 4L, 4L, 4L), .Label = c("Virgin", 
"Mated", "Expecting", "Parent"), class = "factor"), Sex = structure(c(1L, 
2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 
2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 
2L, 1L, 2L, 1L, 2L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 
1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 
1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 1L, 2L, 1L, 2L, 1L, 
2L, 1L, 2L, 1L, 2L, 2L, 2L, 1L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 
2L, 1L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 1L, 2L, 1L, 2L, 1L, 
2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 1L, 2L, 1L, 2L, 1L, 2L, 
1L, 2L, 1L, 2L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 
2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 
2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 
2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 
2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 
2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 
2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 
2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L), .Label = c("F", 
"M"), class = "factor")), .Names = c("ID", "Partner_ID", "Strain", 
"State", "Sex"), row.names = c(NA, -256L), class = "data.frame")

So the result should look like this:

       ID Partner_ID Strain  State Sex WANTED
1  11489      11862     BW Virgin   F     A1
2  11862      11489     BW Virgin   M     A1
3  11539      11713     BW Virgin   F     A2
4  11713      11539     BW Virgin   M     A2
5  11271       9225     PO Virgin   F     A1
6   9225      11271     PO Virgin   M     A1
7  11588       9906     PO Virgin   F     A2
8   9906      11588     PO Virgin   M     A2
9  11039       9717     PO Virgin   F     A3
10  9717      11039     PO Virgin   M     A3
11 11539      11713     BW  Mated   F     A2
12 11713      11539     BW  Mated   M     A2
13 11489      11862     BW  Mated   F     A1
14 11862      11489     BW  Mated   M     A1
15 11403      11070     PO Virgin   F     A4
16 11070      11403     PO Virgin   M     A4
17 11271       9225     PO  Mated   F     A1
18  9225      11271     PO  Mated   M     A1
19 11039       9717     PO  Mated   F     A3
20  9717      11039     PO  Mated   M     A3

2 Answers:

Answer 0 (score: 1)

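Consider two ave calls, which run grouped computations inline. First, generate a raw running count within each Strain, State and Sex group; then take the first of those values for each Partner_ID. Finally, wrap each whole column in as.factor for the required type conversion.

with evaluates an expression using the data frame's columns as variables, so you can refer to column names without repeating the df$ prefix.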
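A small illustration of the two ave() idioms this answer relies on, on a hypothetical toy data frame `d` (not part of the original answer): with `FUN = seq_along`, ave() numbers the rows within each group (a running count), and with `FUN = function(x) head(x, 1)` it copies each group's first value onto every row of that group.

```r
# Hypothetical toy data: a grouping column and a value column
d <- data.frame(grp = c("a", "b", "a", "a", "b"),
                val = c("p", "q", "r", "s", "t"),
                stringsAsFactors = FALSE)

# Running count within each group: 1, 2, 3, ... restarting per group
d$run <- with(d, ave(seq_along(grp), grp, FUN = seq_along))

# First value of each group, repeated on every row of that group
d$first <- with(d, ave(val, grp, FUN = function(x) head(x, 1)))
```

Here `d$run` is `1 1 2 3 2` and `d$first` is `"p" "q" "p" "p" "q"`.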

# RUNNING GROUP COUNT
df$RAW_WANTED <- as.factor(paste0("A", with(df, ave(ID, Strain, State, Sex, 
                                                    FUN=seq_along))))

# RUNNING FIRST VALUE
df$WANTED <- as.factor(with(df, ave(as.character(RAW_WANTED), Partner_ID,
                                    FUN=function(x) head(x, 1))))

head(df, 20)
#       ID Partner_ID Strain  State Sex RAW_WANTED WANTED
# 1  11489      11862     BW Virgin   F         A1     A1
# 2  11862      11489     BW Virgin   M         A1     A1
# 3  11539      11713     BW Virgin   F         A2     A2
# 4  11713      11539     BW Virgin   M         A2     A2
# 5  11271       9225     PO Virgin   F         A1     A1
# 6   9225      11271     PO Virgin   M         A1     A1
# 7  11588       9906     PO Virgin   F         A2     A2
# 8   9906      11588     PO Virgin   M         A2     A2
# 9  11039       9717     PO Virgin   F         A3     A3
# 10  9717      11039     PO Virgin   M         A3     A3
# 11 11539      11713     BW  Mated   F         A1     A2
# 12 11713      11539     BW  Mated   M         A1     A2
# 13 11489      11862     BW  Mated   F         A2     A1
# 14 11862      11489     BW  Mated   M         A2     A1
# 15 11403      11070     PO Virgin   F         A4     A4
# 16 11070      11403     PO Virgin   M         A4     A4
# 17 11271       9225     PO  Mated   F         A1     A1
# 18  9225      11271     PO  Mated   M         A1     A1
# 19 11039       9717     PO  Mated   F         A2     A3
# 20  9717      11039     PO  Mated   M         A2     A3
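A variant sketch (not part of the original answer) that numbers the pairs directly, so the codes do not depend on the row order within each Strain/State/Sex group. Each row gets an order-independent pair key built from pmin()/pmax() of ID and Partner_ID, and the distinct keys are numbered in order of first appearance within each Strain. It assumes each individual keeps a single partner throughout; `pair` is a hypothetical helper name, and the data are only the first 20 rows shown in the question.

```r
# First 20 rows of the real data (ID, Partner_ID, Strain), from the question's table
df <- data.frame(
  ID = c(11489, 11862, 11539, 11713, 11271, 9225, 11588, 9906, 11039, 9717,
         11539, 11713, 11489, 11862, 11403, 11070, 11271, 9225, 11039, 9717),
  Partner_ID = c(11862, 11489, 11713, 11539, 9225, 11271, 9906, 11588, 9717, 11039,
                 11713, 11539, 11862, 11489, 11070, 11403, 9225, 11271, 9717, 11039),
  Strain = c("BW", "BW", "BW", "BW", "PO", "PO", "PO", "PO", "PO", "PO",
             "BW", "BW", "BW", "BW", "PO", "PO", "PO", "PO", "PO", "PO")
)

# Hypothetical helper: the same key for both partners, e.g. "11489-11862"
pair <- paste(pmin(df$ID, df$Partner_ID), pmax(df$ID, df$Partner_ID), sep = "-")

# Within each strain, number the distinct pairs in order of first appearance;
# ave() passes the row indices of one strain at a time to the function
idx <- ave(seq_along(pair), df$Strain,
           FUN = function(i) match(pair[i], unique(pair[i])))
df$WANTED <- factor(paste0("A", idx))
```

Because the key ignores State and Sex, a pair keeps its code even if it first appears at a later state. Note that factor() sorts its levels as strings, so with ten or more pairs in a strain A10 sorts before A2.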

Answer 1 (score: 0)

Here is one way to do what I think you want, using dplyr and a custom function. It isn't very elegant, but at least it's easy to follow:

library(dplyr)

# 26 letters x 10 digits gives 260 unique codes (A1..A10, B1..B10, ..., Z1..Z10)
facSet <- paste0(rep(LETTERS, each = 10), rep(1:10, times = 26))

getFactor <- function(subsetDF) {
    subsetDF$Factor <- NA
    # Give each distinct male one code, in order of first appearance,
    # so a male keeps the same code at every stage
    maleIDs <- unique(subsetDF$ID[subsetDF$sex == "male"])
    for (i in 1:nrow(subsetDF)) {
        if (subsetDF$sex[i] == "male") {
            subsetDF$Factor[i] <- facSet[match(subsetDF$ID[i], maleIDs)]
        }
    }
    # Each female takes the code of her partner
    for (i in 1:nrow(subsetDF)) {
        if (subsetDF$sex[i] == "female") {
            subsetDF$Factor[i] <- unique(subsetDF$Factor[which(subsetDF$partner[i] == subsetDF$ID)])[1]
        }
    }
    return(subsetDF$Factor)
}

# Run getFactor separately within each species, so the codes restart per species
df <- df %>% group_by(species) %>% mutate(Factor = getFactor(data.frame(ID, sex, partner)))

Output:

> df
# A tibble: 16 x 6
# Groups:   species [2]
      ID partner stage    sex    species Factor
   <int>   <int> <fct>    <fct>  <fct>   <chr> 
 1     1       4 juvenile male   a       A1    
 2     2       3 juvenile male   a       A2    
 3     3       2 juvenile female a       A2    
 4     4       1 juvenile female a       A1    
 5     5       8 juvenile male   b       A1    
 6     6       7 juvenile male   b       A2    
 7     7       6 juvenile female b       A2    
 8     8       5 juvenile female b       A1    
 9     1       4 adult    male   a       A1    
10     2       3 adult    male   a       A2    
11     3       2 adult    female a       A2    
12     4       1 adult    female a       A1    
13     5       8 adult    male   b       A1    
14     6       7 adult    male   b       A2    
15     7       6 adult    female b       A2    
16     8       5 adult    female b       A1 

Note: if you need more than 260 unique male-female pairs within a species, create a larger facSet.