Question

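I want to split a data frame into a list of data frames. The reason for splitting is that each family always appears as father, followed by mother, followed by offspring. However, each of these family members can span more than one row (always consecutive rows); for example, the first father occupies rows 1 and 2. In my example below I have two families, so I am trying to get a list containing two data frames.

My input: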

df <- 'Chr  Start   End Family
1   187546286   187552094   father
3   108028534   108032021   father
1   4864403 4878685 mother
1   18898657    18904908    mother
2   460238  461771  offspring
3   108028534   108032021   offspring
1   71481449    71532983    father
2   74507242    74511395    father
2   181864092   181864690   mother
1   71481449    71532983    offspring
2   181864092   181864690   offspring
3   160057791   160113642   offspring'

df <- read.table(text=df, header=TRUE)

So my expected output for dfout[[1]] would look like this:

dfout <- 'Chr   Start   End Family
1   187546286   187552094   father
3   108028534   108032021   father
1   4864403 4878685 mother
1   18898657    18904908    mother
2   460238  461771  offspring
3   108028534   108032021   offspring'

dfout <- read.table(text=dfout, header=TRUE)

Answer 1

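To split each family into its own data frame, you need an index that marks where one family ends and the next begins. For that index I use "father" as the change point. We can't simply use indx <- df$Family == "father", though, because there can be several consecutive "father" rows. Instead, we look for the positions where the rows switch from "offspring" to "father": diff() of that logical vector equals 1 exactly at those positions, and cumsum() then turns the switch points into a running family number.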

indx <- cumsum(c(1L, diff(df$Family == "father")) == 1L)
split(df, indx)
# $`1`
#   Chr     Start       End    Family
# 1   1 187546286 187552094    father
# 2   3 108028534 108032021    father
# 3   1   4864403   4878685    mother
# 4   1  18898657  18904908    mother
# 5   2    460238    461771 offspring
# 6   3 108028534 108032021 offspring
# 
# $`2`
#    Chr     Start       End    Family
# 7    1  71481449  71532983    father
# 8    2  74507242  74511395    father
# 9    2 181864092 181864690    mother
# 10   1  71481449  71532983 offspring
# 11   2 181864092 181864690 offspring
# 12   3 160057791 160113642 offspring
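
To make the one-liner easier to follow, here is a minimal sketch that breaks it into its intermediate steps. It rebuilds only the Family column from the question; the names family, is_father, change and starts are introduced here for illustration and are not part of the original answer.

```r
# Family column from the question (other columns omitted for brevity)
family <- c("father", "father", "mother", "mother", "offspring", "offspring",
            "father", "father", "mother", "offspring", "offspring", "offspring")

is_father <- family == "father"      # TRUE on every father row
change    <- c(1L, diff(is_father))  # 1 = switch into a father run, -1 = switch out, 0 = no change
                                     # (the leading 1L marks row 1 as the start of family 1)
starts    <- change == 1L            # TRUE only on the first row of each family
indx      <- cumsum(starts)          # running family number
indx
# [1] 1 1 1 1 1 1 2 2 2 2 2 2
```

Because only the switch into a father run counts as a start, a second consecutive father row stays in the same family instead of opening a new one.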

Answer 2

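It would be more helpful if you had posted the code used to generate your actual data frame. I don't have time to redo everything, but I'll show you how it works in a general sense.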

gender <- c("M","M","F","F","F","F","M","M","M","M","F","F")
values <- c(20,22,24,19,9,17,18,22,12,14,7,8)
fruit <- c("apple","pear","mango","mango","mango","apple","banana","banana","banana","mango","apple","apple")
df <- data.frame(gender, values, fruit)


> df
   gender values  fruit
1       M     20  apple
2       M     22   pear
3       F     24  mango
4       F     19  mango
5       F      9  mango
6       F     17  apple
7       M     18 banana
8       M     22 banana
9       M     12 banana
10      M     14  mango
11      F      7  apple
12      F      8  apple

split(df, df$gender)

$F
   gender values fruit
3       F     24 mango
4       F     19 mango
5       F      9 mango
6       F     17 apple
11      F      7 apple
12      F      8 apple

$M
   gender values  fruit
1       M     20  apple
2       M     22   pear
7       M     18 banana
8       M     22 banana
9       M     12 banana
10      M     14  mango
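
Note that split(df, df$gender) pools every row with the same level, regardless of where it sits in the data frame. If the goal is one piece per consecutive run, as in the question, one possible variation (a sketch, not part of the original answer) is to build a run index with rle(). The names df2, runs, run_id and pieces are hypothetical.

```r
# Same gender/values data as above, stored under a new name
gender <- c("M","M","F","F","F","F","M","M","M","M","F","F")
values <- c(20,22,24,19,9,17,18,22,12,14,7,8)
df2 <- data.frame(gender, values)

# rle() finds runs of identical consecutive values; as.character() guards
# against gender being stored as a factor
runs   <- rle(as.character(df2$gender))
run_id <- rep(seq_along(runs$lengths), runs$lengths)  # 1 1 2 2 2 2 3 3 3 3 4 4

pieces <- split(df2, run_id)  # one data frame per consecutive run
length(pieces)
# [1] 4
```

This yields four pieces (M, F, M, F) rather than two. For the question's data, a run index on Family alone would split father, mother and offspring apart, so an index that restarts only at each new father run is still needed there.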

Split a data frame based on an ordered multi-factor column
