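I have two data frames. One contains measurements:
sensor_id | value |
---|---|
AAA | 5 |
AAA | 7 |
BBB | 9 |
BBB | 10 |
The other contains the corresponding levels:
created_at | level_name | from | to |
---|---|---|---|
2021-04-01 | Level 1 | 0 | 5 |
2021-04-01 | Level 2 | 5 | 7 |
2021-04-01 | Level 3 | 7 | 15 |
2020-12-15 | Level 1 | 0 | 4 |
2020-12-15 | Level 2 | 4 | 8 |
2020-12-15 | Level 3 | 8 | 15 |
For each measurement in the first data frame, I want to assign the corresponding level_name based on its value. Because the level boundaries can change over time, only the most recent levels should be considered.
I tried: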
df_values <- read.table(text = "sensor_id value
AAA 5
AAA 7
BBB 9
BBB 10
",header = TRUE, stringsAsFactors = FALSE)
df_levels <- read.table(text =" created_at level_name from to
1 2021-04-01 Level_1 0 5
2 2021-04-01 Level_2 5 7
3 2021-04-01 Level_3 7 15
4 2020-12-15 Level_1 0 4
5 2020-12-15 Level_2 4 8
6 2020-12-15 Level_3 8 15
",header = TRUE, stringsAsFactors = FALSE)
df_values$level_name<- df_levels %>%
filter(df_values[value] >= from ,df_values[value] < to ) %>%
arrange(desc(created_at)) %>%
head(1) %>%
select(level_name)
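Answer 0 (score: 0)
Your question is a little unclear, but judging from your code I think you mean the first level at the "newest" date, since all rows of one level set share the same date.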
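library(dplyr)

# Level name of the first row at the most recent created_at
newest <- df_levels %>%
  filter(created_at == max(created_at)) %>%
  head(1) %>%
  pull(level_name)

# Match each value to a level whose upper bound equals it;
# fall back to `newest` where nothing matches
df_values %>%
  left_join(df_levels %>% select(level_name, to), by = c('value' = 'to')) %>%
  mutate(level_name = ifelse(is.na(level_name), newest, level_name))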
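If the goal is instead a range lookup, the row-by-row filtering that the question attempts can be sketched in base R. This is only a sketch under two assumptions: all rows of the newest level set share a single created_at (true for the sample data), and the ranges are meant as `from <= value < to`, as in the question's attempt. The helper `lookup_level` is a hypothetical name, not part of any package.

```r
# Data from the question, rebuilt so the sketch runs on its own.
df_values <- data.frame(
  sensor_id = c("AAA", "AAA", "BBB", "BBB"),
  value = c(5, 7, 9, 10),
  stringsAsFactors = FALSE
)
df_levels <- data.frame(
  created_at = rep(c("2021-04-01", "2020-12-15"), each = 3),
  level_name = rep(c("Level_1", "Level_2", "Level_3"), times = 2),
  from = c(0, 5, 7, 0, 4, 8),
  to = c(5, 7, 15, 4, 8, 15),
  stringsAsFactors = FALSE
)

# Keep only the most recent set of levels.
df_levels$created_at <- as.Date(df_levels$created_at)
latest <- df_levels[df_levels$created_at == max(df_levels$created_at), ]

# Hypothetical helper: name of the level whose [from, to) range
# contains v, or NA when no single level matches.
lookup_level <- function(v) {
  hit <- latest$level_name[v >= latest$from & v < latest$to]
  if (length(hit) == 1) hit else NA_character_
}

df_values$level_name <- vapply(df_values$value, lookup_level, character(1))
print(df_values)
```

For the sample data this assigns Level_2 to the value 5 and Level_3 to 7, 9 and 10.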
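Answer 1 (score: 0)
Here is an option that uses fuzzyjoin together with the tidyverse.
First, restrict your df_levels data.frame to the most recent row for each level. I'm not sure how much data you have, but restricting at this step may help, since you are only interested in the most recent values.
With fuzzy_left_join you can then join the two tables based on value. Note that you may need to adjust > and/or <= to handle the edge cases at the range boundaries.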
library(fuzzyjoin)
library(tidyverse)

# Keep the most recent row per level: sort newest first, then take row 1
df_levels_new <- df_levels %>%
  mutate(created_at = as.Date(created_at, format = "%Y-%m-%d")) %>%
  group_by(level_name) %>%
  arrange(desc(created_at)) %>%
  slice(1)

# Join where from < value <= to
fuzzy_left_join(df_values,
                df_levels_new,
                by = c("value" = "from", "value" = "to"),
                match_fun = c(`>`, `<=`))
Output
  sensor_id value created_at level_name from to
1       AAA     5 2021-04-01    Level_1    0  5
2       AAA     7 2021-04-01    Level_2    5  7
3       BBB     9 2021-04-01    Level_3    7 15
4       BBB    10 2021-04-01    Level_3    7 15
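To see how much the choice of comparison operators matters at the boundaries, here is a small base R check. It is only an illustration: the break points are copied by hand from the 2021-04-01 rows of the question, and `cut()` stands in for the join purely to show which side of each range is closed.

```r
# Boundaries of the newest levels in the question (created_at 2021-04-01).
breaks <- c(0, 5, 7, 15)
labels <- c("Level_1", "Level_2", "Level_3")
value <- c(5, 7, 9, 10)

# right = TRUE gives (from, to], the same test as `>` and `<=`.
right_closed <- as.character(
  cut(value, breaks = breaks, labels = labels, right = TRUE)
)

# right = FALSE gives [from, to), the same test as `>=` and `<`.
left_closed <- as.character(
  cut(value, breaks = breaks, labels = labels, right = FALSE)
)

print(data.frame(value, right_closed, left_closed))
```

The values 5 and 7 sit exactly on a boundary, so they move up one level when the ranges are closed on the left, while 9 and 10 are Level_3 either way.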
Answer 2 (score: 0)
I'm not sure whether this is the output you are looking for, but please let me know how I can improve it:
library(dplyr)
library(purrr)
library(tidyr)
library(lubridate)
df_levels %>%
mutate(match = map2(from, to, ~ df_values$value[between(df_values$value, .x, .y - 1)])) %>%
unnest_longer(match) %>%
drop_na() %>%
left_join(df_values, by = c("match" = "value")) %>%
mutate(created_at = as_date(created_at)) %>%
arrange(desc(created_at)) %>%
group_by(sensor_id, match) %>%
slice_head(n = 1)
# A tibble: 4 x 6
# Groups: sensor_id, match [4]
created_at level_name from to match sensor_id
<date> <chr> <int> <int> <int> <chr>
1 2021-04-01 Level_2 5 7 5 AAA
2 2021-04-01 Level_3 7 15 7 AAA
3 2021-04-01 Level_3 7 15 9 BBB
4 2021-04-01 Level_3 7 15 10 BBB
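For comparison, the same result can probably be reached with a non-equi join, assuming dplyr 1.1.0 or later is available (that is where `join_by()` was introduced). This is a sketch rather than a drop-in replacement; the name `latest_levels` is just illustrative, and the ranges are again read as `from <= value < to`.

```r
library(dplyr)  # join_by() needs dplyr >= 1.1.0

# Data from the question, rebuilt so the sketch runs on its own.
df_values <- data.frame(
  sensor_id = c("AAA", "AAA", "BBB", "BBB"),
  value = c(5, 7, 9, 10),
  stringsAsFactors = FALSE
)
df_levels <- data.frame(
  created_at = rep(c("2021-04-01", "2020-12-15"), each = 3),
  level_name = rep(c("Level_1", "Level_2", "Level_3"), times = 2),
  from = c(0, 5, 7, 0, 4, 8),
  to = c(5, 7, 15, 4, 8, 15),
  stringsAsFactors = FALSE
)

# Keep the newest definition of each level.
latest_levels <- df_levels %>%
  mutate(created_at = as.Date(created_at)) %>%
  group_by(level_name) %>%
  slice_max(created_at, n = 1) %>%
  ungroup()

# Inequality join: a value matches the level with from <= value < to.
result <- df_values %>%
  left_join(latest_levels, by = join_by(value >= from, value < to))

print(result)
```

Because this is a left join, a value that falls outside every range would be kept with NA in the level columns instead of being dropped.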