Question

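I am new to programming and have been struggling to get the output I want, which is described below.

Suppose I have a table like the following: [image: My input]

It contains the coordinate range (Start_MP & End_MP) of each segment (identified by ID) and the length of the segment (the difference between the start and end of the range). What I need to do is split every range longer than 2 into ranges of length 2 or less. To make this clearer, I need output like the following table: [image: My desired output]

I would appreciate it if you could tell me how to do this in R or with an R package.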

Answer 1

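The function tidyr::expand is the right tool for expanding rows to match what the OP wants.

The approach is to first use expand to generate the required number of rows, then use left_join to bring back the columns of the original data.frame.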
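To see what the expand step produces, it can help to look at one segment in isolation. The following is a minimal base R sketch, assuming the values of segment 1102 from the sample data; the variable names (start, end, breaks, pieces) are chosen here for illustration only.

```r
# Segment 1102 runs from 5 to 10 (values taken from the sample data).
start <- 5
end <- 10

# Candidate start points, spaced 2 apart; seq() never goes past `end`.
breaks <- seq(from = start, to = end, by = 2)
print(breaks)

# Each piece ends where the next one begins; the last piece ends at `end`.
piece_start <- breaks
piece_end <- c(breaks[-1], end)

# Drop a zero-length piece, which appears when the length is a multiple of 2.
keep <- piece_start != piece_end
pieces <- data.frame(Start_MP = piece_start[keep], End_MP = piece_end[keep])
print(pieces)
```

In the pipeline, lead and coalesce play the role of piece_end, and the filter step plays the role of keep.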

# Data
df <- data.frame(Segment_ID = c(1101, 1102, 1103), Start_MP = c(1, 5, 20),
                  End_MP = c(2, 10, 30), Segment_Length = c(1, 5, 10))

library(tidyverse)

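df %>% group_by(Segment_ID) %>% 
  # One row per candidate start point, spaced 2 apart within each segment.
  # The grouping column is not repeated inside expand(): recent tidyr
  # releases reject expanding on a grouping column.
  expand(Segment_Sequence_Number = 
                seq(from = Start_MP, to = End_MP, by = 2)) %>%
  # Bring back the original End_MP and Segment_Length columns
  left_join(df, by="Segment_ID") %>%
  mutate(Start_MP = Segment_Sequence_Number) %>%
  group_by(Segment_ID) %>%
  # Each piece ends where the next one starts; the last keeps the original End_MP
  mutate(End_MP_Calc = lead(Start_MP)) %>%
  mutate(End_MP = coalesce(End_MP_Calc, End_MP)) %>% 
  # Drop the zero-length piece produced when the length is a multiple of 2
  filter(Start_MP != End_MP) %>%
  mutate(Segment_Length = End_MP - Start_MP) %>%
  group_by(Segment_ID) %>%
  # Renumber the pieces 1, 2, 3, ... within each segment
  mutate(Segment_Sequence_Number = row_number()) %>%
  select(-End_MP_Calc) %>% as.data.frame()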

#Result
#   Segment_ID Segment_Sequence_Number Start_MP End_MP Segment_Length
# 1       1101                       1        1      2              1
# 2       1102                       1        5      7              2
# 3       1102                       2        7      9              2
# 4       1102                       3        9     10              1
# 5       1103                       1       20     22              2
# 6       1103                       2       22     24              2
# 7       1103                       3       24     26              2
# 8       1103                       4       26     28              2
# 9       1103                       5       28     30              2
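
If loading the tidyverse is not an option, the same splitting logic can be sketched in base R. This is only an illustrative alternative, not the approach used above: split_segments and its max_len argument are hypothetical names introduced here, and the sketch assumes Start_MP <= End_MP in every row.

```r
# Hypothetical helper (not part of any package): split each row of `df`
# into pieces no longer than `max_len`. Assumes Start_MP <= End_MP.
split_segments <- function(df, max_len = 2) {
  pieces <- lapply(seq_len(nrow(df)), function(i) {
    s <- df$Start_MP[i]
    e <- df$End_MP[i]
    # Candidate start points, spaced `max_len` apart
    starts <- seq(from = s, to = e, by = max_len)
    # Each piece ends where the next starts; the last ends at the original end
    ends <- c(starts[-1], e)
    # Drop a zero-length piece at the end, if any
    keep <- starts != ends
    data.frame(
      Segment_ID = rep(df$Segment_ID[i], sum(keep)),
      Segment_Sequence_Number = seq_len(sum(keep)),
      Start_MP = starts[keep],
      End_MP = ends[keep],
      Segment_Length = ends[keep] - starts[keep]
    )
  })
  # Stack the per-segment pieces into one data.frame
  do.call(rbind, pieces)
}

df <- data.frame(Segment_ID = c(1101, 1102, 1103), Start_MP = c(1, 5, 20),
                 End_MP = c(2, 10, 30), Segment_Length = c(1, 5, 10))
result <- split_segments(df)
print(result)
```

For the sample data this should give the same nine rows as the result table above. Keeping the maximum piece length as an argument makes it easy to try other values without editing the body of the function.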

Split a range given in one row into several smaller ranges across multiple rows
