Merge 2 dataframes using conditions on "hour" and "min" of df1 in the datetimes of df2

Time: 2019-05-23 18:38:54

Tags: r dataframe dplyr data.table non-equi-join

I have a data frame df.sample like this

id <- c("A","A","A","A","A","A","A","A","A","A","A")
date <- c("2018-11-12","2018-11-12","2018-11-12","2018-11-12","2018-11-12",
          "2018-11-12","2018-11-12","2018-11-14","2018-11-14","2018-11-14",
          "2018-11-12")
hour <- c(8,8,9,9,13,13,16,6,7,19,7)
min <- c(47,59,6,18,22,36,12,32,12,21,47)
value <- c(70,70,86,86,86,74,81,77,79,83,91)
df.sample <- data.frame(id,date,hour,min,value,stringsAsFactors = F) 
df.sample$date <- as.Date(df.sample$date,format="%Y-%m-%d")

I have another data frame df.state

id <- c("A","A","A")
starttime <- c("2018-11-12 08:59:00","2018-11-14 06:24:17","2018-11-15 09:17:00")
endtime <- c("2018-11-12 15:57:00","2018-11-14 17:22:16","2018-11-15 12:17:32")
state <- c("Pass","Pass","Pass")

df.state <- data.frame(id,starttime,endtime,state,stringsAsFactors = F) 
df.state$starttime <- as.POSIXct(df.state$starttime,format="%Y-%m-%d %H:%M:%S")
df.state$endtime <- as.POSIXct(df.state$endtime,format="%Y-%m-%d %H:%M:%S")

I am trying to merge these two data frames based on a condition

If the hour and min in df.sample fall within the starttime and endtime of df.state, then merge state = Pass into df.sample.

For example, row 2 in df.sample has hour = 8 and min = 59, and since that falls within the interval in df.state that begins at starttime = 2018-11-12 08:59:00, the value Pass is added.
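To make the condition concrete, here is a minimal base R sketch of the check for row 2 alone. It assumes both timestamps are parsed in the same time zone (UTC is chosen here only for illustration), and the names sample_time, start, end and in_window are hypothetical.

```r
# Row 2 of df.sample: date 2018-11-12, hour 8, min 59
sample_time <- as.POSIXct(sprintf("%s %02d:%02d:00", "2018-11-12", 8, 59), tz = "UTC")

# First interval of df.state
start <- as.POSIXct("2018-11-12 08:59:00", tz = "UTC")
end   <- as.POSIXct("2018-11-12 15:57:00", tz = "UTC")

# TRUE when the sample time lies inside the closed interval [start, end]
in_window <- sample_time >= start & sample_time <= end
in_window
```

The boundary is inclusive here, which matches the example: 08:59 counts as inside an interval that starts at 08:59:00.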

This is my desired output

[desired output table not preserved]

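I was able to merge these two data frames like this, but I could not look up the hour and minute of df.sample within the start and end times of df.state

[merge code not preserved]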

Could someone point me in the right direction?

4 Answers:

Answer 0 (score: 4)

If you happen to have large data frames, a non-equi join from the data.table package will be faster and easier: Benchmark | Video

library(data.table)

## convert both data.frames to data.tables by reference
setDT(df.sample)
setDT(df.state) 

## create a `time` column in df.sample 
df.sample[, time := as.POSIXct(paste0(date, " ", hour, ":", min, ":00"))]
## change column order
setcolorder(df.sample, c("id", "time"))

# join by id and time within start & end time limits
# "x." is used so we can refer to the column in other data.table explicitly
df.state[df.sample, .(id, time, date, hour, min, value, state = x.state), 
         on = .(id, starttime <= time, endtime >= time)]
#>     id                time       date hour min value state
#>  1:  A 2018-11-12 08:47:00 2018-11-12    8  47    70  <NA>
#>  2:  A 2018-11-12 08:59:00 2018-11-12    8  59    70  Pass
#>  3:  A 2018-11-12 09:06:00 2018-11-12    9   6    86  Pass
#>  4:  A 2018-11-12 09:18:00 2018-11-12    9  18    86  Pass
#>  5:  A 2018-11-12 13:22:00 2018-11-12   13  22    86  Pass
#>  6:  A 2018-11-12 13:36:00 2018-11-12   13  36    74  Pass
#>  7:  A 2018-11-12 16:12:00 2018-11-12   16  12    81  <NA>
#>  8:  A 2018-11-14 06:32:00 2018-11-14    6  32    77  Pass
#>  9:  A 2018-11-14 07:12:00 2018-11-14    7  12    79  Pass
#> 10:  A 2018-11-14 19:21:00 2018-11-14   19  21    83  <NA>
#> 11:  A 2018-11-12 07:47:00 2018-11-12    7  47    91  <NA>

### remove NA
df.state[df.sample, .(id, time, date, hour, min, value, state = x.state), 
         on = .(id, starttime <= time, endtime >= time), nomatch = 0L]
#>    id                time       date hour min value state
#> 1:  A 2018-11-12 08:59:00 2018-11-12    8  59    70  Pass
#> 2:  A 2018-11-12 09:06:00 2018-11-12    9   6    86  Pass
#> 3:  A 2018-11-12 09:18:00 2018-11-12    9  18    86  Pass
#> 4:  A 2018-11-12 13:22:00 2018-11-12   13  22    86  Pass
#> 5:  A 2018-11-12 13:36:00 2018-11-12   13  36    74  Pass
#> 6:  A 2018-11-14 06:32:00 2018-11-14    6  32    77  Pass
#> 7:  A 2018-11-14 07:12:00 2018-11-14    7  12    79  Pass

Created on 2019-05-23 by the reprex package (v0.3.0)
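As a variation on the join above, the following sketch adds state to df.sample in place with an update join, so the remaining columns do not have to be listed. It is an illustration rather than part of the original answer: it rebuilds a small subset of the question's data, and tz = "UTC" is an assumption made so that both tables share one time zone.

```r
library(data.table)

# Small subset of the question's data
df.sample <- data.table(
  id    = "A",
  date  = as.Date(c("2018-11-12", "2018-11-12", "2018-11-12", "2018-11-14")),
  hour  = c(8, 8, 16, 6),
  min   = c(47, 59, 12, 32),
  value = c(70, 70, 81, 77)
)
df.state <- data.table(
  id        = "A",
  starttime = as.POSIXct(c("2018-11-12 08:59:00", "2018-11-14 06:24:17"), tz = "UTC"),
  endtime   = as.POSIXct(c("2018-11-12 15:57:00", "2018-11-14 17:22:16"), tz = "UTC"),
  state     = "Pass"
)

# Build the sample timestamp in the same time zone as df.state
df.sample[, time := as.POSIXct(
  paste0(format(date), " ", sprintf("%02d:%02d:00", hour, min)), tz = "UTC")]

# Update join: for each df.state interval, mark the df.sample rows inside it.
# "i." refers to columns of the table inside the brackets (df.state).
df.sample[df.state, state := i.state,
          on = .(id, time >= starttime, time <= endtime)]
df.sample
```

Rows that fall in no interval keep NA in state, so they can be filtered afterwards if only matches are wanted.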

Answer 1 (score: 1)

(Important preliminary note: as.POSIXct creates POSIXct values in the local time zone, whereas lubridate::ymd_hms creates UTC times. If the time zones differ in the join below, you will get unexpected results.)
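A small base R sketch of why mismatched time zones matter. The zone America/New_York is only an example standing in for "some local time zone", and the variable names are hypothetical.

```r
# The same wall-clock string parsed in two different time zones
t_utc <- as.POSIXct("2018-11-12 08:59:00", tz = "UTC")
t_ny  <- as.POSIXct("2018-11-12 08:59:00", tz = "America/New_York")

# The two values are different instants, several hours apart
gap_hours <- as.numeric(difftime(t_ny, t_utc, units = "hours"))
gap_hours

# A sample at 09:06 UTC compared against a start time parsed as New York time
sample_utc <- as.POSIXct("2018-11-12 09:06:00", tz = "UTC")
starts_after <- sample_utc >= t_ny
starts_after   # FALSE, although 09:06 looks later than 08:59
```

Parsing every timestamp with an explicit, shared tz argument avoids this kind of silent shift.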

df.state$starttime <- lubridate::ymd_hms(df.state$starttime)
df.state$endtime <- lubridate::ymd_hms(df.state$endtime)

This can be done with fuzzyjoin.
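The join code for this answer is not shown, so the following is only a sketch of how an interval join with fuzzyjoin could look, not the answerer's exact code. It uses fuzzy_left_join with one comparison function per column pair, rebuilds a small subset of the question's data, and assumes every timestamp is parsed with tz = "UTC".

```r
library(fuzzyjoin)

# Small subset of the question's data
df.sample <- data.frame(
  id    = "A",
  date  = c("2018-11-12", "2018-11-12", "2018-11-12", "2018-11-14"),
  hour  = c(8, 8, 16, 6),
  min   = c(47, 59, 12, 32),
  value = c(70, 70, 81, 77),
  stringsAsFactors = FALSE
)
df.state <- data.frame(
  id        = "A",
  starttime = as.POSIXct(c("2018-11-12 08:59:00", "2018-11-14 06:24:17"), tz = "UTC"),
  endtime   = as.POSIXct(c("2018-11-12 15:57:00", "2018-11-14 17:22:16"), tz = "UTC"),
  state     = "Pass",
  stringsAsFactors = FALSE
)

# Sample timestamp, built in the same time zone as the intervals
df.sample$time <- as.POSIXct(
  sprintf("%s %02d:%02d:00", df.sample$date, df.sample$hour, df.sample$min),
  tz = "UTC"
)

# Keep every sample row; attach state where id matches and
# starttime <= time <= endtime
joined <- fuzzy_left_join(
  df.sample, df.state,
  by = c("id" = "id", "time" = "starttime", "time" = "endtime"),
  match_fun = list(`==`, `>=`, `<=`)
)
joined[, c("time", "value", "state")]
```

Because id appears in both inputs, the result carries it twice with suffixes; unmatched sample rows have NA in state.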

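Answer 2 (score: 1)

This can be done by first adding a time column to your df.sample data.frame, then using sapply to evaluate your condition for each row, and finally adding the result to df.sample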

df.sample$time <- paste0(df.sample$date, ' ', sprintf('%02d', df.sample$hour),':', sprintf('%02d', df.sample$min), ':00')
df.sample$state <- sapply(df.sample$time, function(x) {
  after_start <- x >= df.state$starttime
  before_end <- x <= df.state$endtime
  y <- cbind(after_start, before_end)
  pass_check <- apply(y, 1, sum)
  if (2 %in% pass_check) {'PASS'} else {''}
  })

df.sample

   id       date hour min value                time state
1   A 2018-11-12    8  47    70 2018-11-12 08:47:00      
2   A 2018-11-12    8  59    70 2018-11-12 08:59:00  PASS
3   A 2018-11-12    9   6    86 2018-11-12 09:06:00  PASS
4   A 2018-11-12    9  18    86 2018-11-12 09:18:00  PASS
5   A 2018-11-12   13  22    86 2018-11-12 13:22:00  PASS
6   A 2018-11-12   13  36    74 2018-11-12 13:36:00  PASS
7   A 2018-11-12   16  12    81 2018-11-12 16:12:00      
8   A 2018-11-14    6  32    77 2018-11-14 06:32:00  PASS
9   A 2018-11-14    7  12    79 2018-11-14 07:12:00  PASS
10  A 2018-11-14   19  21    83 2018-11-14 19:21:00      
11  A 2018-11-12    7  47    91 2018-11-12 07:47:00 
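The check above compares each sample time against every interval regardless of id. If the data hold several ids, a variant along these lines restricts the comparison to intervals with the same id. This is a sketch under that assumption, using a tiny made-up data set (the id "B" row is hypothetical) and timestamps parsed with tz = "UTC".

```r
df.sample <- data.frame(
  id   = c("A", "A", "B"),
  time = as.POSIXct(c("2018-11-12 08:47:00", "2018-11-12 09:06:00",
                      "2018-11-12 09:06:00"), tz = "UTC"),
  stringsAsFactors = FALSE
)
df.state <- data.frame(
  id        = "A",
  starttime = as.POSIXct("2018-11-12 08:59:00", tz = "UTC"),
  endtime   = as.POSIXct("2018-11-12 15:57:00", tz = "UTC"),
  state     = "Pass",
  stringsAsFactors = FALSE
)

# For each sample row, look only at the intervals that share its id
df.sample$state <- vapply(seq_len(nrow(df.sample)), function(i) {
  rows <- df.state[df.state$id == df.sample$id[i], ]
  hit  <- any(df.sample$time[i] >= rows$starttime &
              df.sample$time[i] <= rows$endtime)
  if (hit) "Pass" else ""
}, character(1))

df.sample
```

An id with no intervals at all simply yields an empty string, because any() of an empty comparison is FALSE.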

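Answer 3 (score: 1)

What I did was extract the decimal hour from each of the data frames you provided, so that I could ask whether a value falls within those decimal hours. But first you have to merge the data sets by id (assuming you have other ids) and by date (assuming there is only one state per day; in other words, each date appears only once in the df.state data set).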

id <- c("A","A","A","A","A","A","A","A","A","A","A")
date <- c("2018-11-12","2018-11-12","2018-11-12","2018-11-12","2018-11-12",
          "2018-11-12","2018-11-12","2018-11-14","2018-11-14","2018-11-14",
          "2018-11-12")
hour <- c(8,8,9,9,13,13,16,6,7,19,7)
min <- c(47,59,6,18,22,36,12,32,12,21,47)
value <- c(70,70,86,86,86,74,81,77,79,83,91)
df.sample <- data.frame(id,date,hour,min,value,stringsAsFactors = F) 
df.sample$date <- as.Date(df.sample$date,format="%Y-%m-%d")

df.sample$dec.hour <- as.numeric(df.sample$hour) +
  as.numeric(df.sample$min)/60

All I added above are the last few lines, which compute the decimal hour from the hour and minute values you provided

id <- c("A","A","A")
starttime <- c("2018-11-12 08:59:00","2018-11-14 06:24:17","2018-11-15 09:17:00")
endtime <- c("2018-11-12 15:57:00","2018-11-14 17:22:16","2018-11-15 12:17:32")
state <- c("Pass","Pass","Pass")

df.state <- data.frame(id,starttime,endtime,state,stringsAsFactors = F) 

Here I add a date vector (for merging). I arbitrarily chose starttime, on the assumption that the start and end dates are always the same.

df.state$date <- as.Date(df.state$starttime,format="%Y-%m-%d") 

Then I get the decimal hour for both the start time and the end time on that date

t.str <- strptime(df.state$starttime, "%Y-%m-%d %H:%M:%S")
df.state$dec.hour.start <- as.numeric(format(t.str, "%H")) +
  as.numeric(format(t.str, "%M"))/60

t.end <- strptime(df.state$endtime, "%Y-%m-%d %H:%M:%S")
df.state$dec.hour.end <- as.numeric(format(t.end, "%H")) +
  as.numeric(format(t.end, "%M"))/60
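Note that the decimal hours above use hours and minutes only, so the seconds in starttime and endtime are dropped. The sketch below shows the effect on one boundary and how the seconds could be included; the names dec.hour.min, dec.hour.sec and sample.dec.hour are hypothetical, and tz = "UTC" is an assumption.

```r
t.str <- strptime("2018-11-14 06:24:17", "%Y-%m-%d %H:%M:%S", tz = "UTC")

# Hours and minutes only, as in the answer: 6 + 24/60
dec.hour.min <- as.numeric(format(t.str, "%H")) +
  as.numeric(format(t.str, "%M")) / 60

# Including seconds: 6 + 24/60 + 17/3600
dec.hour.sec <- dec.hour.min + as.numeric(format(t.str, "%S")) / 3600

# A sample taken at 06:24 on that day (hour = 6, min = 24)
sample.dec.hour <- 6 + 24 / 60

sample.dec.hour >= dec.hour.min  # TRUE: treated as inside the window
sample.dec.hour >= dec.hour.sec  # FALSE: 06:24:00 is before 06:24:17
```

For the question's data this makes no difference, since no sample falls within the same minute as a boundary.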

Merge the data frames by id and date

df<-merge(df.sample, df.state, by=c("id","date"))

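If the sample's decimal hour lies between the start and end decimal hours for that date, state is set to TRUE (this overwrites the state column that came from df.state).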

library(dplyr)  # provides %>% and mutate()
df <- df %>% 
  mutate(state = dec.hour >= dec.hour.start & dec.hour <= dec.hour.end) 

Now, if you want to get rid of all the extra columns I created (so that it looks like your desired output):

df<-df[,-c(6:8,10:11)]

Because df$state is logical, if you want to change TRUE to "pass" and FALSE to an empty string, you have to convert the values to character first:

df$state<-as.character(df$state)
df$state[df$state=="TRUE"]<-"pass"
df$state[df$state=="FALSE"]<-""
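The same relabelling can also be written in one step with ifelse, which returns a character vector directly. A minimal sketch on a stand-alone logical vector (the names state and state.label are hypothetical):

```r
# Logical flags as produced by the comparison step
state <- c(FALSE, TRUE, TRUE, FALSE)

# Map TRUE to "pass" and FALSE to an empty string
state.label <- ifelse(state, "pass", "")
state.label
```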

Take a look:

df

> df
   id       date hour min value state
1   A 2018-11-12    8  47    70      
2   A 2018-11-12    8  59    70  pass
3   A 2018-11-12    9   6    86  pass
4   A 2018-11-12    9  18    86  pass
5   A 2018-11-12   13  22    86  pass
6   A 2018-11-12   13  36    74  pass
7   A 2018-11-12   16  12    81      
8   A 2018-11-12    7  47    91      
9   A 2018-11-14    6  32    77  pass
10  A 2018-11-14    7  12    79  pass
11  A 2018-11-14   19  21    83      

I used this post: extract hours and seconds from POSIXct for plotting purposes in R, to extract the decimal hours, and this one: Check to see if a value is within a range?, to check whether your sample times fall within your state times.