Suppose I have a data frame df, as shown below. It has 20 rows in total, and its string column contains four kinds of strings: "A", "B", "C" and "D".
no string position
1 B 650
2 C 651
3 B 659
4 C 660
5 C 662
6 B 663
7 D 668
8 D 670
9 C 671
10 B 672
11 C 673
12 A 681
13 C 682
14 B 683
15 C 684
16 D 690
17 A 692
18 C 693
19 D 694
20 C 695
By subtracting the previous row's value in column position, I can get the fourth column, distance, with the following command:
df$distance <- ave(df$position, FUN=function(x) c(0, diff(x)))
This gives the distance from the current value to the previous row, as shown below:
no string position distance
1 B 650 0
2 C 651 1
3 B 659 8
4 C 660 1
5 C 662 2
6 B 663 1
7 D 668 5
8 D 670 2
9 C 671 1
10 B 672 1
11 C 673 1
12 A 681 8
13 C 682 1
14 B 683 1
15 C 684 1
16 D 690 6
17 A 692 2
18 C 693 1
19 D 694 1
20 C 695 1
However, what I want is the distance in column position from each row to the closest previous "C"; for example, rows 7, 8 and 17 should change as below:
no string position distance
1 B 650 0
2 C 651 1
3 B 659 8
4 C 660 1
5 C 662 2
6 B 663 1
7 D 668 6
8 D 670 8
9 C 671 1
10 B 672 1
11 C 673 1
12 A 681 8
13 C 682 1
14 B 683 1
15 C 684 1
16 D 690 6
17 A 692 8
18 C 693 1
19 D 694 1
20 C 695 1
How can I do this? By the way, could I also find out how to get the distance in column position to the closest next "C"?
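A vectorized base-R sketch that covers both parts of the question at once. It is not taken from any of the answers: it swaps in findInterval() to look up, for every row, the closest "C" row strictly before it and strictly after it. It assumes, matching the expected output above and the "next C" output in the data.table answer below, that "C" rows and rows with no "C" on the relevant side keep the plain row-to-row difference. The data frame is re-created inline so the block runs on its own.

```r
# Sketch: distance from each row to the closest previous / next "C" row.
# Assumption: "C" rows, and rows with no "C" on the relevant side, fall
# back to the plain difference from the previous row.
df <- data.frame(
  no       = 1:20,
  string   = c("B", "C", "B", "C", "C", "B", "D", "D", "C", "B",
               "C", "A", "C", "B", "C", "D", "A", "C", "D", "C"),
  position = c(650, 651, 659, 660, 662, 663, 668, 670, 671, 672,
               673, 681, 682, 683, 684, 690, 692, 693, 694, 695)
)

rows   <- seq_len(nrow(df))
is_c   <- df$string == "C"
c_rows <- which(is_c)                  # sorted row indices of every "C"
step   <- c(0, diff(df$position))      # distance to the previous row

# findInterval(x, v) counts the elements of sorted v that are <= x, so
# x = row - 1 gives the index (into c_rows) of the last "C" strictly
# before each row; 0 means there is none.
prev_k   <- findInterval(rows - 1, c_rows)
has_prev <- !is_c & prev_k > 0
df$dist_prev <- step
df$dist_prev[has_prev] <-
  df$position[has_prev] - df$position[c_rows[prev_k[has_prev]]]

# Counting the "C" rows <= row and adding 1 gives the first "C" strictly after it.
next_k   <- findInterval(rows, c_rows) + 1
has_next <- !is_c & next_k <= length(c_rows)
df$dist_next <- step
df$dist_next[has_next] <-
  df$position[c_rows[next_k[has_next]]] - df$position[has_next]

df
```

findInterval() needs its second argument sorted, which which() guarantees here, so the whole lookup is one vectorized pass instead of a per-row search.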
Answer 0 (score: 0)
Probably not the ideal solution; there is likely a way to simplify this.
#Taken from your code
df$distance <- ave(df$position, FUN=function(x) c(0, diff(x)))
#logical values indicating occurrence of "C"
c_occur = df$string == "C"
#Group ids: a new group starts at every "C" (group 0 = rows before the first "C")
grp <- cumsum(c_occur)
#We can ignore the first two rows of each group, since
#the first row is the "C" itself and the second is correctly calculated from the previous row.
#Number rows within groups on an integer vector, so ave() returns numbers, not text
inds_to_replace <- which(ave(seq_along(c_occur), grp, FUN = seq_along) > 2 & grp > 0)
#Get the closest previous occurrence of "C" for each of inds_to_replace
new_inds <- which(c_occur)
c_to_replace <- sapply(inds_to_replace, function(x) {
  max(new_inds[(x - new_inds) > 0])
  #To get the distance from the nearest next "C", use new_inds[which.max(x - new_inds < 0)]
  #instead (the rows to replace and the order of the subtraction below change too)
})
#Replace the values
df$distance[inds_to_replace] <- df$position[inds_to_replace] -
  df$position[c_to_replace]
df[inds_to_replace, ]
# no string position distance
#7 7 D 668 6
#8 8 D 670 8
#17 17 A 692 8
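To see why only rows 7, 8 and 17 are rewritten, here is a small base-R check of the row-selection step, using the example's string column typed in by hand as s (a name introduced only for this illustration). cumsum() starts a new group at every "C" and ave(..., FUN = seq_along) numbers the rows inside each group; counting on an integer vector keeps ave()'s result numeric, whereas running it on a character column would yield character counters compared as text.

```r
# The example's string column, typed in by hand for this check
s <- c("B", "C", "B", "C", "C", "B", "D", "D", "C", "B",
       "C", "A", "C", "B", "C", "D", "A", "C", "D", "C")

c_occur    <- s == "C"
grp        <- cumsum(c_occur)                             # new group at every "C"
pos_in_grp <- ave(seq_along(s), grp, FUN = seq_along)     # 1, 2, ... within a group

data.frame(no = seq_along(s), s, grp, pos_in_grp)

# Rows 1 and 2 of a "C" group are already right after the plain diff;
# group 0 (before the first "C") has no previous "C" at all.
inds_to_replace <- which(pos_in_grp > 2 & grp > 0)
inds_to_replace
```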
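Answer 1 (score: 0)
Here is a data.table way:
library(data.table)
#Start from the plain row-to-row difference
dtt[, distance := c(0, diff(position))]
#Within each group that starts at a "C", measure from that "C" (position[1]);
#the "C" row itself keeps its row-to-row difference
dtt[cumsum(string == 'C') > 0,
    distance := ifelse(seq_len(.N) == 1, distance, position - position[1]),
    by = cumsum(string == 'C')]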
# no string position distance
# 1: 1 B 650 0
# 2: 2 C 651 1
# 3: 3 B 659 8
# 4: 4 C 660 1
# 5: 5 C 662 2
# 6: 6 B 663 1
# 7: 7 D 668 6
# 8: 8 D 670 8
# 9: 9 C 671 1
# 10: 10 B 672 1
# 11: 11 C 673 1
# 12: 12 A 681 8
# 13: 13 C 682 1
# 14: 14 B 683 1
# 15: 15 C 684 1
# 16: 16 D 690 6
# 17: 17 A 692 8
# 18: 18 C 693 1
# 19: 19 D 694 1
# 20: 20 C 695 1
Here is dtt:
dtt <- data.table(no = 1:20,
  string = c("B", "C", "B", "C", "C", "B", "D", "D", "C", "B",
             "C", "A", "C", "B", "C", "D", "A", "C", "D", "C"),
  position = c(650L, 651L, 659L, 660L, 662L, 663L, 668L, 670L, 671L, 672L,
               673L, 681L, 682L, 683L, 684L, 690L, 692L, 693L, 694L, 695L))
If you want the distance from the non-C rows to the closest next C, try this:
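dtt[, distance := c(0, diff(position))]
#Reverse cumulative count of "C": each group now ends at a "C"
dtt[, g := rev(cumsum(rev(string == 'C')))]
#Measure every other row of a group from that closing "C" (position[.N])
dtt[g > 0, distance := ifelse(seq_len(.N) == .N, distance, abs(position - position[.N])), by = g]
dtt[, g := NULL]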
# no string position distance
# 1: 1 B 650 1
# 2: 2 C 651 1
# 3: 3 B 659 1
# 4: 4 C 660 1
# 5: 5 C 662 2
# 6: 6 B 663 8
# 7: 7 D 668 3
# 8: 8 D 670 1
# 9: 9 C 671 1
# 10: 10 B 672 1
# 11: 11 C 673 1
# 12: 12 A 681 1
# 13: 13 C 682 1
# 14: 14 B 683 1
# 15: 15 C 684 1
# 16: 16 D 690 3
# 17: 17 A 692 1
# 18: 18 C 693 1
# 19: 19 D 694 1
# 20: 20 C 695 1
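The two data.table snippets differ only in how rows are grouped. A quick base-R look at the two grouping vectors, again with the example's string column typed in by hand as s (a name introduced only for this illustration): cumsum() over the "C" indicator gives groups that start at a "C", so position[1] is that C, while rev(cumsum(rev(...))) gives groups that end at a "C", so position[.N] is that C. Rows before the first C (first case) or after the last C (second case) get group 0, which both snippets skip by filtering on a group greater than 0.

```r
# The example's string column, typed in by hand for this illustration
s <- c("B", "C", "B", "C", "C", "B", "D", "D", "C", "B",
       "C", "A", "C", "B", "C", "D", "A", "C", "D", "C")
is_c <- s == "C"

g_prev <- cumsum(is_c)            # groups start at a "C"; 0 = before the first "C"
g_next <- rev(cumsum(rev(is_c)))  # groups end at a "C"; 0 = after the last "C"

data.frame(no = seq_along(s), s, g_prev, g_next)

# The first row of each g_prev group and the last row of each g_next
# group (ignoring group 0) should be exactly the "C" rows
first_of_group <- which(!duplicated(g_prev) & g_prev > 0)
last_of_group  <- which(!duplicated(g_next, fromLast = TRUE) & g_next > 0)
```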
Answer 2 (score: 0)
The following tidyverse approach reproduces your expected output.
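Problem statement: compute the difference in position between the current row and the previous row that has string = "C"; if there is no previous row with string = "C", or if the current row itself has string = "C", the distance is the difference in position between the current row and the row before it (regardless of string).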
library(tidyverse)
df %>%
mutate(nC = cumsum(string == "C")) %>%
group_by(nC) %>%
mutate(dist = cumsum(c(0, diff(position)))) %>%
ungroup() %>%
mutate(dist = if_else(dist == 0, c(0, diff(position)), dist)) %>%
select(-nC)
## A tibble: 20 x 4
# no string position dist
# <int> <fct> <int> <dbl>
# 1 1 B 650 0.
# 2 2 C 651 1.
# 3 3 B 659 8.
# 4 4 C 660 1.
# 5 5 C 662 2.
# 6 6 B 663 1.
# 7 7 D 668 6.
# 8 8 D 670 8.
# 9 9 C 671 1.
#10 10 B 672 1.
#11 11 C 673 1.
#12 12 A 681 8.
#13 13 C 682 1.
#14 14 B 683 1.
#15 15 C 684 1.
#16 16 D 690 6.
#17 17 A 692 8.
#18 18 C 693 1.
#19 19 D 694 1.
#20 20 C 695 1.
df <- read.table(text =
"no string position
1 B 650
2 C 651
3 B 659
4 C 660
5 C 662
6 B 663
7 D 668
8 D 670
9 C 671
10 B 672
11 C 673
12 A 681
13 C 682
14 B 683
15 C 684
16 D 690
17 A 692
18 C 693
19 D 694
20 C 695", header = TRUE)
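For readers without the tidyverse, the same steps can be sketched in base R with ave(). This translation is not part of the original answer, and the data are re-created inline so the block runs on its own. The key identity is that, inside a group that starts at a "C", cumsum(c(0, diff(position))) equals position minus the group's first position, i.e. the distance to that "C".

```r
# Same data as above, re-created inline so this block runs on its own
df <- data.frame(
  no       = 1:20,
  string   = c("B", "C", "B", "C", "C", "B", "D", "D", "C", "B",
               "C", "A", "C", "B", "C", "D", "A", "C", "D", "C"),
  position = c(650, 651, 659, 660, 662, 663, 668, 670, 671, 672,
               673, 681, 682, 683, 684, 690, 692, 693, 694, 695)
)

nC   <- cumsum(df$string == "C")                      # new group at every "C"
dist <- ave(df$position, nC,
            FUN = function(p) cumsum(c(0, diff(p))))  # equals p - p[1] in each group
step <- c(0, diff(df$position))                       # plain row-to-row difference

# A zero marks a group's first row (a "C", or row 1 before any "C");
# those rows fall back to the row-to-row difference, like if_else() above
df$dist <- ifelse(dist == 0, step, dist)
df
```

As in the tidyverse version, the zero test would also catch a row whose position equals that of the "C" starting its group; that cannot happen with the strictly increasing positions in this example.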