I am working on a data frame problem where I need to retrieve specific rows based on the indices of matching conditions.
# Create dataframe
position <- c("START" , "MIDDLE", "END" ,"START" , "MIDDLE",
"MIDDLE", "MIDDLE", "MIDDLE" ,"MIDDLE" ,"MIDDLE",
"MIDDLE", "MIDDLE", "MIDDLE" ,"END", "START" ,
"START" , "START" , "MIDDLE", "MIDDLE", "END",
"START" , "START", "MIDDLE", "MIDDLE", "MIDDLE",
"END" ,"START", "MIDDLE", "MIDDLE", "MIDDLE",
"END", "START" , "MIDDLE", "MIDDLE", "MIDDLE",
"MIDDLE" ,"MIDDLE" ,"MIDDLE", "MIDDLE" ,"MIDDLE" ,
"MIDDLE" ,"MIDDLE", "MIDDLE", "MIDDLE", "MIDDLE",
"MIDDLE" ,"MIDDLE", "MIDDLE" ,"MIDDLE" ,"MIDDLE" ,
"MIDDLE", "MIDDLE", "MIDDLE", "END")
text <-c("First line", "Middle Line", "Last Line", "First line","Middle Line",
"Middle Line", "Middle Line", "Middle Line", "Middle Line", "Middle Line",
"Middle Line", "Middle Line", "Middle Line", "Last Line", "First line",
"First line", "First line", "Middle Line", "Middle Line", "Last Line",
"First line", "First line", "Middle Line", "Middle Line", "Middle Line",
"Last Line", "First line", "Middle Line", "Middle Line", "Middle Line",
"Last Line", "First line", "Middle Line", "Middle Line", "Middle Line",
"Middle Line", "Middle Line", "Middle Line", "Middle Line", "Middle Line",
"Middle Line", "Middle Line", "Middle Line", "Middle Line", "Middle Line",
"Middle Line", "Middle Line", "Middle Line", "Middle Line", "Middle Line",
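"Middle Line", "Middle Line", "Middle Line", "Last Line")
# Combine the two vectors into the data frame used below
a_df <- data.frame(position, text, stringsAsFactors = FALSE)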
The data frame basically looks like this:
> head(a_df)
position text
1 START First line
2 MIDDLE Middle Line
3 END Last Line
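Basically, I want to split the whole data frame into subsets, where each subset contains a START row, its MIDDLE rows, and an END row.
After doing some reading online, I tried generating the indices as follows: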
# Generate indices
index_start <- grep("START", a_df$position)
index_end <- grep("END", a_df$position)
This gives the desired output:
> index_start
[1] 1 4 15 16 17 21 22 27 32
> index_end
[1] 3 14 20 26 31 54
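I realise the indices are unbalanced (I am working on eliminating these imbalances), but I would like to know how to use the output above to seed the values in the following subsetting commands: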
a_df[c(1:3),]
a_df[c(4:14),]
a_df[c(17:20),]
a_df[c(22:26),]
a_df[c(27:31),]
a_df[c(32:54),]
Thanks in advance, Jonathan
Answer 0 (score: 2)
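It is not clear how the elements of 'index_start' are chosen for the sequences, but based on the code shown in the OP's post, it seems that for each element of 'index_end' we need the last element of 'index_start' that is smaller than it. To get that last element, we use findInterval to create a grouping variable and tapply with tail to pick the last 'index_start' in each group. Then we build the sequence between the corresponding elements of 'index_start1' and 'index_end' and, with Map, subset the rows of the dataset on it, which gives a list of data.frames.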
index_start1 <- unname(tapply(index_start, findInterval(index_start, index_end),
FUN = tail, 1))
index_start1
#[1] 1 4 17 22 27 32
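To see why this works, the grouping step can be run on its own. The snippet below is only a minimal sketch: it hard-codes the two index vectors from the question, and `grp` is a name introduced here for illustration.

```r
# Index vectors copied from the question's output
index_start <- c(1, 4, 15, 16, 17, 21, 22, 27, 32)
index_end   <- c(3, 14, 20, 26, 31, 54)

# For each START index, count how many END indices are <= it.
# STARTs that share a count fall before the same END row.
grp <- findInterval(index_start, index_end)
grp
#[1] 0 1 2 2 2 3 3 4 5

# Keep only the last START of each group, i.e. the one closest to its END
index_start1 <- as.vector(tapply(index_start, grp, FUN = tail, 1))
index_start1
#[1]  1  4 17 22 27 32
```

Rows 15, 16 and 17 all land in group 2, so only 17 survives; the same happens with 21 and 22 in group 3. That is how the unbalanced START rows get dropped.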
lst <- Map(function(x, y) a_df[x:y,], index_start1, index_end)
lst
#[[1]]
# position text
#1 START First line
#2 MIDDLE Middle Line
#3 END Last Line
#[[2]]
# position text
#4 START First line
#5 MIDDLE Middle Line
#6 MIDDLE Middle Line
#7 MIDDLE Middle Line
#8 MIDDLE Middle Line
#9 MIDDLE Middle Line
#10 MIDDLE Middle Line
#11 MIDDLE Middle Line
#12 MIDDLE Middle Line
#13 MIDDLE Middle Line
#14 END Last Line
#[[3]]
# position text
#17 START First line
#18 MIDDLE Middle Line
#19 MIDDLE Middle Line
#20 END Last Line
#[[4]]
# position text
#22 START First line
#23 MIDDLE Middle Line
#24 MIDDLE Middle Line
#25 MIDDLE Middle Line
#26 END Last Line
#[[5]]
# position text
#27 START First line
#28 MIDDLE Middle Line
#29 MIDDLE Middle Line
#30 MIDDLE Middle Line
#31 END Last Line
#[[6]]
# position text
#32 START First line
#33 MIDDLE Middle Line
#34 MIDDLE Middle Line
#35 MIDDLE Middle Line
#36 MIDDLE Middle Line
#37 MIDDLE Middle Line
#38 MIDDLE Middle Line
#39 MIDDLE Middle Line
#40 MIDDLE Middle Line
#41 MIDDLE Middle Line
#42 MIDDLE Middle Line
#43 MIDDLE Middle Line
#44 MIDDLE Middle Line
#45 MIDDLE Middle Line
#46 MIDDLE Middle Line
#47 MIDDLE Middle Line
#48 MIDDLE Middle Line
#49 MIDDLE Middle Line
#50 MIDDLE Middle Line
#51 MIDDLE Middle Line
#52 MIDDLE Middle Line
#53 MIDDLE Middle Line
#54 END Last Line
NOTE: It is better to keep the 'data.frame's in a list, as most operations can be done within the list.
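As an illustration of working inside the list, here is a rough sketch that rebuilds the question's data in compact form and then summarises each block with sapply. The names `lookup`, `block_sizes` and `well_formed` are made up for this example, and `rep()` is used only to shorten the original vectors.

```r
# Compact rebuild of the question's 54-row data frame
position <- c("START", "MIDDLE", "END",
              "START", rep("MIDDLE", 9), "END",
              rep("START", 3), rep("MIDDLE", 2), "END",
              rep("START", 2), rep("MIDDLE", 3), "END",
              "START", rep("MIDDLE", 3), "END",
              "START", rep("MIDDLE", 21), "END")
lookup <- c(START = "First line", MIDDLE = "Middle Line", END = "Last Line")
a_df <- data.frame(position, text = unname(lookup[position]),
                   stringsAsFactors = FALSE)

# Same steps as in the answer
index_start  <- grep("START", a_df$position)
index_end    <- grep("END", a_df$position)
index_start1 <- as.vector(tapply(index_start,
                                 findInterval(index_start, index_end),
                                 FUN = tail, 1))
lst <- Map(function(x, y) a_df[x:y, ], index_start1, index_end)

# Operate on every block at once: number of rows per block
block_sizes <- sapply(lst, nrow)
block_sizes
#[1]  3 11  4  5  5 23

# Check that each block really starts with START and ends with END
well_formed <- sapply(lst, function(d) {
  d$position[1] == "START" && d$position[nrow(d)] == "END"
})
all(well_formed)
#[1] TRUE
```

If a single data frame is needed again later, the blocks can be stacked back together with `do.call(rbind, lst)`.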