I have two data frames:
#df1
df1 = data.frame(id = c("A","B","C","D","E"),
                 dev = c(213.5, 225.1, 198.9, 201.0, 266.8))
df1
  id   dev
1  A 213.5
2  B 225.1
3  C 198.9
4  D 201.0
5  E 266.8
#df2
df2 = data.frame(DateTime = seq(
                   from = as.POSIXct("1986-1-1 0:00"),
                   to = as.POSIXct("1986-1-2 23:00"),
                   by = "hour"),
                 cum_dd = seq(from = 185, to = 295, by = 2.3))
head(df2)
             DateTime cum_dd
1 1986-01-01 00:00:00  185.0
2 1986-01-01 01:00:00  187.3
3 1986-01-01 02:00:00  189.6
4 1986-01-01 03:00:00  191.9
5 1986-01-01 04:00:00  194.2
6 1986-01-01 05:00:00  196.5
I want to add a new column to df1 that lists the earliest df2$DateTime at which df2$cum_dd exceeds df1$dev.
This is the result I want:
  id   dev             desired
1  A 213.5 1986-01-01 13:00:00
2  B 225.1 1986-01-01 18:00:00
3  C 198.9 1986-01-01 07:00:00
4  D 201.0 1986-01-01 07:00:00
5  E 266.8 1986-01-02 12:00:00
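I'm familiar with the min(which()) idiom (both functions are base R, so dplyr is not strictly needed for it). Written as follows, it returns the first row number in df2 where cum_dd is greater than 200: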
library(dplyr)
min(which (df2$cum_dd > 200))
In effect, I want to run this for every row of df1 (with df1$dev in place of 200) and look up the corresponding df2$DateTime value rather than the row number.
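For a single threshold, going from the row number to the DateTime value is just an indexing step. The snippet below is a minimal sketch of that step using the example data; the `tz = "UTC"` argument and the names `first_row` and `first_time` are assumptions added here for illustration and are not part of the original code.

```r
# Example data from the question; tz = "UTC" is an assumption added here
# so that the result does not depend on the local time zone.
df2 <- data.frame(
  DateTime = seq(from = as.POSIXct("1986-01-01 00:00", tz = "UTC"),
                 to   = as.POSIXct("1986-01-02 23:00", tz = "UTC"),
                 by   = "hour"),
  cum_dd   = seq(from = 185, to = 295, by = 2.3)
)

# Row number of the first cum_dd value above 200
first_row <- min(which(df2$cum_dd > 200))

# Use that row number to pull the matching DateTime
first_time <- df2$DateTime[first_row]
first_time
```

The remaining problem is repeating this lookup once per value of df1$dev.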
I thought I was close, but this is not quite right, and I could not find a similar question on Stack Overflow:
desired <- apply(df1, 1,
function (x) {ddply(df2, .(DateTime), summarize,
min(which (df2$cum_dd > df1$dev)))})
Thank you very much for any solution!
Answer 0 (score: 3)
# example datasets
df1 = data.frame(id = c("A","B","C","D","E"),
                 dev = c(213.5, 225.1, 198.9, 201.0, 266.8))
df2 = data.frame(DateTime = seq(
                   from = as.POSIXct("1986-1-1 0:00"),
                   to = as.POSIXct("1986-1-2 23:00"),
                   by = "hour"),
                 cum_dd = seq(from = 185, to = 295, by = 2.3))
library(tidyverse)
df1 %>%
  crossing(df2) %>%                                     # get all combinations of rows
  group_by(id, dev) %>%                                 # for each id and dev
  summarise(desired = min(DateTime[cum_dd > dev])) %>%  # get the earliest DateTime at which cum_dd exceeds dev
  ungroup()                                             # forget the grouping
# # A tibble: 5 x 3
# id dev desired
# <fct> <dbl> <dttm>
# 1 A 214. 1986-01-01 13:00:00
# 2 B 225. 1986-01-01 18:00:00
# 3 C 199. 1986-01-01 07:00:00
# 4 D 201 1986-01-01 07:00:00
# 5 E 267. 1986-01-02 12:00:00
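For comparison, the same lookup can be sketched in base R without building all row combinations. This is an alternative illustration rather than part of the answer above; the `tz = "UTC"` argument and the name `first_idx` are assumptions introduced here.

```r
# Example data; tz = "UTC" is an assumption added here so that the
# result does not depend on the local time zone.
df1 <- data.frame(id  = c("A", "B", "C", "D", "E"),
                  dev = c(213.5, 225.1, 198.9, 201.0, 266.8))
df2 <- data.frame(
  DateTime = seq(from = as.POSIXct("1986-01-01 00:00", tz = "UTC"),
                 to   = as.POSIXct("1986-01-02 23:00", tz = "UTC"),
                 by   = "hour"),
  cum_dd   = seq(from = 185, to = 295, by = 2.3)
)

# For each dev value, find the first row of df2 whose cum_dd exceeds it.
# which(...)[1] yields NA (instead of a warning from min()) when no
# cum_dd value exceeds the threshold.
first_idx <- vapply(df1$dev,
                    function(d) which(df2$cum_dd > d)[1],
                    integer(1))

# Indexing with NA gives an NA DateTime for thresholds that are never exceeded.
df1$desired <- df2$DateTime[first_idx]
df1
```

Using `which(...)[1]` rather than `min(which(...))` is a deliberate choice in this sketch: a threshold above every cum_dd value then produces a missing DateTime rather than `Inf` with a warning.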
Answer 1 (score: 0)
library(tidyverse)
df1 = data.frame("id" = c("A","B","C","D","E"),
                 "dev" = c(213.5, 225.1, 198.9, 201.0, 266.8))
df2 = data.frame("DateTime" = seq(
                   from = as.POSIXct("1986-1-1 0:00"),
                   to = as.POSIXct("1986-1-2 23:00"),
                   by = "hour"),
                 "cum_dd" = seq(from = 185, to = 295, by = 2.3))
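df2 %>%
  crossing(df1) %>%                    # all combinations of df2 and df1 rows
  filter(cum_dd > dev) %>%             # keep rows where cum_dd exceeds dev
  arrange(DateTime, desc(cum_dd)) %>%  # earliest DateTime first
  group_by(id) %>%
  distinct(id, .keep_all = TRUE)       # keep the first (earliest) row for each id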
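Because cum_dd in the example data is sorted in increasing order, the cross join can also be avoided altogether. The following is a minimal sketch using base R's `findInterval()`, under the assumption that cum_dd is non-decreasing; it is an alternative to the pipeline above, not a restatement of it. The `tz = "UTC"` argument and the name `first_idx` are introduced here for illustration.

```r
# Example data; tz = "UTC" is an assumption added here so that the
# result does not depend on the local time zone.
df1 <- data.frame(id  = c("A", "B", "C", "D", "E"),
                  dev = c(213.5, 225.1, 198.9, 201.0, 266.8))
df2 <- data.frame(
  DateTime = seq(from = as.POSIXct("1986-01-01 00:00", tz = "UTC"),
                 to   = as.POSIXct("1986-01-02 23:00", tz = "UTC"),
                 by   = "hour"),
  cum_dd   = seq(from = 185, to = 295, by = 2.3)
)

# findInterval(x, vec) counts how many elements of the sorted vector vec
# are <= x, so adding 1 gives the first position where vec is strictly
# greater than x. This relies on cum_dd being non-decreasing.
first_idx <- findInterval(df1$dev, df2$cum_dd) + 1L

# An index past the end of df2 (threshold never exceeded) yields NA.
df1$desired <- df2$DateTime[first_idx]
df1
```

This does one binary search per threshold instead of comparing every df1 row with every df2 row, which may matter when both data frames are large.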