Split a data frame based on an ordered multi-level factor column

Asked: 2016-10-31 16:15:10

Tags: r split subsequence

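I would like to split a data frame into a list of data frames. The reason for splitting is that the rows always come as father, followed by mother, followed by offspring. However, each of these family members can occupy more than one row (always consecutive rows; e.g. father number 1 is in rows 1 and 2). In my example below I have two families, so I am trying to get a list containing two data frames.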

My input:

df <- 'Chr  Start   End Family
1   187546286   187552094   father
3   108028534   108032021   father
1   4864403 4878685 mother
1   18898657    18904908    mother
2   460238  461771  offspring
3   108028534   108032021   offspring
1   71481449    71532983    father
2   74507242    74511395    father
2   181864092   181864690   mother
1   71481449    71532983    offspring
2   181864092   181864690   offspring
3   160057791   160113642   offspring'

df <- read.table(text=df, header=TRUE)

Thus, my expected output dfout[[1]] would look like this:

dfout <- 'Chr   Start   End Family
1   187546286   187552094   father
3   108028534   108032021   father
1   4864403 4878685 mother
1   18898657    18904908    mother
2   460238  461771  offspring
3   108028534   108032021   offspring'

dfout <- read.table(text=dfout, header=TRUE)

2 answers:

Answer 0: (score: 1)

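To split each family into its own data frame, you need an index that marks where one family ends and the next begins. For that index I use "father" as the change point. We can't simply use indx <- df$Family == "father", though, because there can be several consecutive "father" entries. Instead, we look for the positions where the rows switch from "offspring" to "father", i.e. where the difference of that logical vector equals 1. The leading 1L marks the first row as the start of the first family.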

indx <- cumsum(c(1L, diff(df$Family == "father")) == 1L)
split(df, indx)
# $`1`
#   Chr     Start       End    Family
# 1   1 187546286 187552094    father
# 2   3 108028534 108032021    father
# 3   1   4864403   4878685    mother
# 4   1  18898657  18904908    mother
# 5   2    460238    461771 offspring
# 6   3 108028534 108032021 offspring
# 
# $`2`
#    Chr     Start       End    Family
# 7    1  71481449  71532983    father
# 8    2  74507242  74511395    father
# 9    2 181864092 181864690    mother
# 10   1  71481449  71532983 offspring
# 11   2 181864092 181864690 offspring
# 12   3 160057791 160113642 offspring
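The one-liner above can be unpacked step by step. The following is only a sketch that rebuilds the Family column from the question as a plain character vector; the intermediate names is_father, step and is_start are made up for illustration and do not appear in the answer.

```r
# Family column copied from the question's example data
family <- c("father", "father", "mother", "mother", "offspring", "offspring",
            "father", "father", "mother", "offspring", "offspring", "offspring")

is_father <- family == "father"   # TRUE on every father row
step      <- diff(is_father)      # 1 where a non-father row is followed by a father row
is_start  <- c(1L, step) == 1L    # prepend 1 so that row 1 counts as a start
indx      <- cumsum(is_start)     # running count of starts = family number

print(data.frame(family, is_father, is_start, indx))
```

Here indx is 1 for rows 1 to 6 and 2 for rows 7 to 12, which is the grouping that split(df, indx) uses. Note that the approach relies on every family beginning with at least one father row.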

Answer 1: (score: 0)

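It would be more helpful if you had posted the code used to generate the actual data frame. I don't have time to redo everything, but I will show you how it works in a general sense.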

gender <- c("M","M","F","F","F","F","M","M","M","M","F","F")
values <- c(20,22,24,19,9,17,18,22,12,14,7,8)
fruit <- c("apple","pear","mango","mango","mango","apple","banana","banana","banana","mango","apple","apple")
df <- data.frame(gender, values, fruit)


> df
   gender values  fruit
1       M     20  apple
2       M     22   pear
3       F     24  mango
4       F     19  mango
5       F      9  mango
6       F     17  apple
7       M     18 banana
8       M     22 banana
9       M     12 banana
10      M     14  mango
11      F      7  apple
12      F      8  apple

> split(df, df$gender)

$F
   gender values fruit
3       F     24 mango
4       F     19 mango
5       F      9 mango
6       F     17 apple
11      F      7 apple
12      F      8 apple

$M
   gender values  fruit
1       M     20  apple
2       M     22   pear
7       M     18 banana
8       M     22 banana
9       M     12 banana
10      M     14  mango
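Note that split(df, df$gender) pools all rows that share a value, regardless of where they sit in the data frame. If consecutive runs should be kept apart instead, one possible sketch (not part of this answer) is to number the runs with base R's rle() and split on that number. The names runs, run_id and by_run are hypothetical.

```r
# Same gender and values columns as in the example above
gender <- c("M","M","F","F","F","F","M","M","M","M","F","F")
values <- c(20,22,24,19,9,17,18,22,12,14,7,8)
df <- data.frame(gender, values)

# rle() finds runs of identical consecutive values
runs   <- rle(as.character(df$gender))
# Repeat each run's number as many times as the run is long
run_id <- rep(seq_along(runs$lengths), times = runs$lengths)

# One data frame per consecutive run, in row order
by_run <- split(df, run_id)
print(by_run)
```

With this data there are four runs (M, F, M, F), so by_run has four elements rather than two. For the question's data a family spans several runs, so the run number alone would not identify a family there.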