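I am trying to calculate the total overlap of segments on a single line. For line A the segments are disjoint, so the calculation is straightforward. For lines B and C, however, the segments overlap each other, which makes it more complicated. I need some way to exclude the portions that earlier rows have already contributed to the sum.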
data = read.table(text="
line left_line right_line small_line left_small_line right_small_line
A 100 120 101 91 111
A 100 120 129 119 139
B 70 90 63 53 73
B 70 90 70 60 80
B 70 90 75 65 85
C 20 40 11 1 21
C 20 40 34 24 44
C 20 40 45 35 55", header=TRUE)
This should be the expected result.
result = read.table(text="
total_overlapping
A 0.6
B 0.75
C 0.85", header=TRUE)
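Edit: added a picture to better illustrate what I am trying to find. There are three different lines (solid), each overlapped by segments (dashed). The goal is to find how much of each solid line is covered by the dashed segments.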
Answer 0 (score: 1)
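If I understand correctly, the small_line variable is irrelevant here. The remaining columns can be used to get the sum of the overlapping segments: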
Step 1. Get the start and end points of each segment's overlap with its corresponding line:
library(dplyr)
data1 <- data %>%
rowwise() %>%
mutate(overlap.start = max(left_line, left_small_line),
overlap.end = min(right_line, right_small_line)) %>%
ungroup() %>%
select(line, overlap.start, overlap.end)
> data1
# A tibble: 8 x 3
line overlap.start overlap.end
<fct> <int> <int>
1 A 100 111
2 A 119 120
3 B 70 73
4 B 70 80
5 B 70 85
6 C 20 21
7 C 24 40
8 C 35 40
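The same clipping can also be written without rowwise(), because base R's pmax() and pmin() already work element-wise. The following is only a sketch: the columns are retyped by hand from the question, the name `clipped` is made up for this illustration, and the final filter is an added assumption (the question's data has no segment lying entirely outside its line, but such a segment would otherwise produce a negative length later on).

```r
# Line and segment bounds retyped from the question's data.
data <- data.frame(
  line             = c("A", "A", "B", "B", "B", "C", "C", "C"),
  left_line        = c(100, 100, 70, 70, 70, 20, 20, 20),
  right_line       = c(120, 120, 90, 90, 90, 40, 40, 40),
  left_small_line  = c(91, 119, 53, 60, 65, 1, 24, 35),
  right_small_line = c(111, 139, 73, 80, 85, 21, 44, 55)
)

# pmax()/pmin() take element-wise maxima/minima, so no rowwise() is needed.
clipped <- data.frame(
  line          = data$line,
  overlap.start = pmax(data$left_line, data$left_small_line),
  overlap.end   = pmin(data$right_line, data$right_small_line)
)

# Assumption: a segment that misses its line entirely would end up with
# overlap.end <= overlap.start, so such rows are dropped here.
clipped <- clipped[clipped$overlap.end > clipped$overlap.start, ]
print(clipped)
```

For the question's data all eight rows survive the filter, and the start and end points match the data1 output above.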
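Step 2. Within each line, sort the overlaps in order. If an overlap is the first one, or the earlier overlaps ended at or before its start, treat it as a new overlapping section. Label each new overlapping section: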
data2 <- data1 %>%
arrange(line, overlap.start, overlap.end) %>%
group_by(line) %>%
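mutate(new.section = is.na(lag(overlap.end)) |
# compare against the furthest end point seen so far in this line, so that a
# segment nested inside an earlier, longer one does not open a new section
lag(cummax(overlap.end)) <= overlap.start) %>%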
mutate(section.number = cumsum(new.section)) %>%
ungroup()
> data2
# A tibble: 8 x 5
line overlap.start overlap.end new.section section.number
<fct> <int> <int> <lgl> <int>
1 A 100 111 TRUE 1
2 A 119 120 TRUE 2
3 B 70 73 TRUE 1
4 B 70 80 FALSE 1
5 B 70 85 FALSE 1
6 C 20 21 TRUE 1
7 C 24 40 TRUE 2
8 C 35 40 FALSE 2
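Whether a row opens a new section depends on the furthest end point reached by any earlier row of the same line, not only on the row directly above it. That distinction does not show up in the question's data, but it matters when one segment is nested inside an earlier, longer one. The sketch below uses made-up overlaps (not from the question) and base R only, to show the running-maximum comparison in isolation.

```r
# Hypothetical overlaps on one line, already sorted by start point.
# The second and third segments both lie inside the first one.
overlap.start <- c(0, 1, 3, 12)
overlap.end   <- c(10, 2, 4, 15)

# Furthest end point reached by any earlier row (NA for the first row).
prev.max.end <- c(NA, head(cummax(overlap.end), -1))

# A row opens a new section if it is the first row, or if everything
# before it has already ended at or before its start.
new.section    <- is.na(prev.max.end) | prev.max.end <= overlap.start
section.number <- cumsum(new.section)
print(section.number)
```

Here the first three rows share section 1 and the last row opens section 2. Comparing the third row only with the second row's end point (2) would have split it off as a separate section even though it lies inside the first segment.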
Step 3. Within each overlapping section, take the earliest start point and the latest end point. Calculate the length of each section:
data3 <- data2 %>%
group_by(line, section.number) %>%
summarise(overlap.start = min(overlap.start),
overlap.end = max(overlap.end)) %>%
ungroup() %>%
mutate(overlap = overlap.end - overlap.start)
> data3
# A tibble: 5 x 5
line section.number overlap.start overlap.end overlap
<fct> <int> <dbl> <dbl> <dbl>
1 A 1 100 111 11
2 A 2 119 120 1
3 B 1 70 85 15
4 C 1 20 21 1
5 C 2 24 40 16
Step 4. Sum the overlap lengths for each line:
data4 <- data3 %>%
group_by(line) %>%
summarise(overlap = sum(overlap)) %>%
ungroup()
> data4
# A tibble: 3 x 2
line overlap
<fct> <dbl>
1 A 12
2 B 15
3 C 17
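Now, your expected result shows the proportion of each line that is overlapped, rather than the sum. If that is what you are looking for, you can add each line's length to data4 and calculate it from there: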
data5 <- data4 %>%
left_join(data %>%
select(line, left_line, right_line) %>%
unique() %>%
mutate(length = right_line - left_line) %>%
select(line, length),
by = "line") %>%
mutate(overlap.percentage = overlap / length)
> data5
# A tibble: 3 x 4
line overlap length overlap.percentage
<fct> <dbl> <int> <dbl>
1 A 12 20 0.6
2 B 15 20 0.75
3 C 17 20 0.85
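For reference, the four steps can also be folded into a single base R function. This is only a sketch under the same reading of the question; `covered_fraction` and its argument names are hypothetical, not part of dplyr or of the original code.

```r
# Hypothetical helper: fraction of the line [left, right] that is covered
# by the union of the segments [seg.left, seg.right].
covered_fraction <- function(left, right, seg.left, seg.right) {
  # Step 1: clip each segment to the line; drop segments that miss it.
  start <- pmax(left, seg.left)
  end   <- pmin(right, seg.right)
  keep  <- end > start
  start <- start[keep]
  end   <- end[keep]
  if (length(start) == 0) return(0)

  # Step 2: sort, then open a new section whenever every earlier
  # segment has ended at or before the current start.
  ord   <- order(start, end)
  start <- start[ord]
  end   <- end[ord]
  prev.max.end <- c(-Inf, head(cummax(end), -1))
  section <- cumsum(prev.max.end <= start)

  # Step 3: section length = latest end minus earliest start.
  section.length <- tapply(end, section, max) - tapply(start, section, min)

  # Step 4: total covered length relative to the line's length.
  sum(section.length) / (right - left)
}

frac.A <- covered_fraction(100, 120, c(91, 119), c(111, 139))
frac.B <- covered_fraction(70, 90, c(53, 60, 65), c(73, 80, 85))
frac.C <- covered_fraction(20, 40, c(1, 24, 35), c(21, 44, 55))
print(c(A = frac.A, B = frac.B, C = frac.C))
```

Called with the bounds from the question, this gives 0.6, 0.75 and 0.85 for lines A, B and C, in agreement with the expected result.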