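Using timestamps

Time: 2017-12-11 21:29:53

Tags: r join data.table overlap

I am trying to merge on timestamps plus another variable. Specifically, I want to see how many ambulette pickups fall within a chain. The idea is that multiple pickups within the same ambulette chain count as one overall trip, while pickups that match no chain are standalone trips.

I have been using sqldf, but foverlaps seems faster and more scalable, so I would like to use it or something similar. I was able to merge the two tables to find how many trip start times fall within a chain's time range, but I do not get the non-matching rows of y back.

Below is code to reproduce the problem; in this example there is only one ambulette, identified by its ID. I used this question as a reference: Data Table merge based on date ranges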

#example---
library(data.table)
trips = data.table(
  "ambulette_id" = "1"
  ,"pickup" = as.POSIXct(c("2017-08-01 04:30:54", 
                        "2017-08-01 04:50:54", "2017-08-01 05:25:54", "2017-08-01 05:35:54", 
                        "2017-08-01 07:45:54", "2017-08-01 08:15:54", "2017-08-01 09:15:54", 
                        "2017-08-01 09:15:54", "2017-08-01 10:00:54", "2017-08-01 11:40:54", 
                        "2017-08-01 12:00:54", "2017-08-01 12:40:54"), tz = "GMT")
  ,"dropoff" = as.POSIXct(c("2017-08-01 05:00:59", 
                         "2017-08-01 05:00:59", "2017-08-01 05:55:59", "2017-08-01 05:55:59", 
                         "2017-08-01 08:35:59", "2017-08-01 08:35:59", "2017-08-01 09:30:59", 
                         "2017-08-01 09:45:59", "2017-08-01 10:30:59", "2017-08-01 11:50:59", 
                         "2017-08-01 12:15:59", "2017-08-01 13:05:59"), tz = "GMT")
  )[,pickup2:=pickup]


chains = data.table(
  "ambulette_id" = "1"
  ,"ambulette_chain_start" = as.POSIXct(c("2017-08-01 04:30:54", 
                            "2017-08-01 05:25:54", "2017-08-01 07:45:54", "2017-08-01 09:15:54"
  ), tz = "GMT")
  ,"ambulette_chain_end" = as.POSIXct(c("2017-08-01 05:00:59", "2017-08-01 05:55:59", 
                             "2017-08-01 08:35:59", "2017-08-01 09:45:59"),tz = "GMT")
  )



#The final result is to merge trips on chains to get the pickups in trips that start within the ranges in chains. Any pickups that don't match should still show up as lone pickups, but instead foverlaps drops them. Is there any way to keep them?


setkey(trips,ambulette_id,pickup, pickup2)
final_join = foverlaps(chains
                       ,trips
                       ,by.x = c("ambulette_id", "ambulette_chain_start", "ambulette_chain_end"))[
                         ,pickup2:=NULL]

#test shows some trips not showing up in the final join
trips[!(pickup %in% final_join$pickup)]
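One way to keep the unmatched pickups with foverlaps is to swap the roles of the two tables, so that trips is x and chains is the keyed y. With nomatch = NA (the default) every row of x is returned, matched or not. The following is a minimal sketch on a cut-down stand-in for the data above (three pickups, one chain), assuming each pickup is treated as the zero-length interval [pickup, pickup2]:

```r
library(data.table)

# Cut-down stand-in for the question's data: one ambulette, three pickups, one chain.
trips <- data.table(
  ambulette_id = "1",
  pickup = as.POSIXct(c("2017-08-01 04:30:54", "2017-08-01 04:50:54",
                        "2017-08-01 10:00:54"), tz = "GMT")
)[, pickup2 := pickup]  # zero-length interval [pickup, pickup]

chains <- data.table(
  ambulette_id = "1",
  ambulette_chain_start = as.POSIXct("2017-08-01 04:30:54", tz = "GMT"),
  ambulette_chain_end   = as.POSIXct("2017-08-01 05:00:59", tz = "GMT")
)

# foverlaps() requires y to be keyed, with the two interval columns last.
setkey(chains, ambulette_id, ambulette_chain_start, ambulette_chain_end)

# x = trips, so nomatch = NA keeps every trip; unmatched trips get NA chain columns.
final_join <- foverlaps(
  trips, chains,
  by.x = c("ambulette_id", "pickup", "pickup2"),
  type = "within",
  nomatch = NA
)[, pickup2 := NULL]

# Flag each pickup as chained or not, mirroring the sqldf version.
final_join[, chained := ifelse(is.na(ambulette_chain_start), "no", "yes")]
final_join
```

The result has one row per trip, with the chain start and end columns filled in only where the pickup falls inside a chain.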

Here is the SQL version that produces the result I want:

#sqldf version
library(sqldf)
z = setDT(sqldf("SELECT 
               trips.ambulette_id, 
               trips.pickup,
               trips.dropoff,
               chains.ambulette_chain_start,
               chains.ambulette_chain_end
               FROM trips LEFT JOIN chains  
               ON trips.ambulette_id = chains.ambulette_id AND 
               pickup BETWEEN ambulette_chain_start AND ambulette_chain_end"))[
                 ,chained:=ifelse(is.na(ambulette_chain_start), "no", "yes")]

z

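Update:

The first answer seems to address almost everything, but I would like to keep the ambulette chain start and end columns in the merged result, so that the final product looks like the following. How can I do that?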

ambulette_id              pickup             dropoff ambulette_chain_start ambulette_chain_end chained
 1:            1 2017-08-01 00:30:54 2017-08-01 01:00:59   2017-08-01 00:30:54 2017-08-01 01:00:59     yes
 2:            1 2017-08-01 00:50:54 2017-08-01 01:00:59   2017-08-01 00:30:54 2017-08-01 01:00:59     yes
 3:            1 2017-08-01 01:25:54 2017-08-01 01:55:59   2017-08-01 01:25:54 2017-08-01 01:55:59     yes
 4:            1 2017-08-01 01:35:54 2017-08-01 01:55:59   2017-08-01 01:25:54 2017-08-01 01:55:59     yes
 5:            1 2017-08-01 03:45:54 2017-08-01 04:35:59   2017-08-01 03:45:54 2017-08-01 04:35:59     yes
 6:            1 2017-08-01 04:15:54 2017-08-01 04:35:59   2017-08-01 03:45:54 2017-08-01 04:35:59     yes
 7:            1 2017-08-01 05:15:54 2017-08-01 05:30:59   2017-08-01 05:15:54 2017-08-01 05:45:59     yes
 8:            1 2017-08-01 05:15:54 2017-08-01 05:45:59   2017-08-01 05:15:54 2017-08-01 05:45:59     yes
 9:            1 2017-08-01 06:00:54 2017-08-01 06:30:59                  <NA>                <NA>      no
10:            1 2017-08-01 07:40:54 2017-08-01 07:50:59                  <NA>                <NA>      no
11:            1 2017-08-01 08:00:54 2017-08-01 08:15:59                  <NA>                <NA>      no
12:            1 2017-08-01 08:40:54 2017-08-01 09:05:59                  <NA>                <NA>      no

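Further update: I implemented the suggestion with two changes: 1. I omitted the "no" assignment. 2. I added the chain column assignments.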

trips[
  chains, on = .(ambulette_id, pickup > ambulette_chain_start,
                 pickup < ambulette_chain_end)
  ,':='(chained = 'yes'
        , ambulette_chain_start = ambulette_chain_start
        ,ambulette_chain_end = ambulette_chain_end)]

ambulette_id              pickup             dropoff             pickup2 chained ambulette_chain_start
 1:            1 2017-08-01 04:30:54 2017-08-01 05:00:59 2017-08-01 04:30:54      NA                  <NA>
 2:            1 2017-08-01 04:50:54 2017-08-01 05:00:59 2017-08-01 04:50:54     yes   2017-08-01 04:30:54
 3:            1 2017-08-01 05:25:54 2017-08-01 05:55:59 2017-08-01 05:25:54      NA                  <NA>
 4:            1 2017-08-01 05:35:54 2017-08-01 05:55:59 2017-08-01 05:35:54     yes   2017-08-01 05:25:54
 5:            1 2017-08-01 07:45:54 2017-08-01 08:35:59 2017-08-01 07:45:54      NA                  <NA>
 6:            1 2017-08-01 08:15:54 2017-08-01 08:35:59 2017-08-01 08:15:54     yes   2017-08-01 07:45:54
 7:            1 2017-08-01 09:15:54 2017-08-01 09:30:59 2017-08-01 09:15:54      NA                  <NA>
 8:            1 2017-08-01 09:15:54 2017-08-01 09:45:59 2017-08-01 09:15:54      NA                  <NA>
 9:            1 2017-08-01 10:00:54 2017-08-01 10:30:59 2017-08-01 10:00:54      NA                  <NA>
10:            1 2017-08-01 11:40:54 2017-08-01 11:50:59 2017-08-01 11:40:54      NA                  <NA>
11:            1 2017-08-01 12:00:54 2017-08-01 12:15:59 2017-08-01 12:00:54      NA                  <NA>
12:            1 2017-08-01 12:40:54 2017-08-01 13:05:59 2017-08-01 12:40:54      NA                  <NA>
    ambulette_chain_end
 1:                <NA>
 2: 2017-08-01 05:00:59
 3:                <NA>
 4: 2017-08-01 05:55:59
 5:                <NA>
 6: 2017-08-01 08:35:59
 7:                <NA>
 8:                <NA>
 9:                <NA>
10:                <NA>
11:                <NA>
12:                <NA>

I think I implemented this incorrectly, because the result differs from what I get with SQL or with the earlier solution.
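A possible reason for the difference: SQL's BETWEEN is inclusive, while the strict `>` and `<` in the join above exclude any pickup that coincides exactly with ambulette_chain_start, which here is the first pickup of every chain. The sketch below assumes inclusive bounds are intended and uses a cut-down stand-in for the data. The `i.` prefix refers to columns of chains, and the final line restores the "no" label:

```r
library(data.table)

# Cut-down stand-in for the question's data: one ambulette, three pickups, one chain.
trips <- data.table(
  ambulette_id = "1",
  pickup = as.POSIXct(c("2017-08-01 04:30:54", "2017-08-01 04:50:54",
                        "2017-08-01 10:00:54"), tz = "GMT")
)

chains <- data.table(
  ambulette_id = "1",
  ambulette_chain_start = as.POSIXct("2017-08-01 04:30:54", tz = "GMT"),
  ambulette_chain_end   = as.POSIXct("2017-08-01 05:00:59", tz = "GMT")
)

# Non-equi update join with inclusive bounds (>= and <=), matching SQL's BETWEEN.
# Rows of trips that match a chain are updated by reference; the rest are left as NA.
trips[chains,
      on = .(ambulette_id,
             pickup >= ambulette_chain_start,
             pickup <= ambulette_chain_end),
      `:=`(chained = "yes",
           ambulette_chain_start = i.ambulette_chain_start,
           ambulette_chain_end = i.ambulette_chain_end)]

# Label the pickups that matched no chain.
trips[is.na(chained), chained := "no"]
trips
```

If chains could overlap, a pickup might match more than one chain, and an update join keeps only one of the matches; the example data has no overlapping chains.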

1 Answer:

Answer 0: (score: 2)

Here you go:

interface CardData {
  id: string;
  title: string;
  description: string;
  status: string;
  color: string;
}

interface IDashboardState {
  cards: CardData[];
}

export class Dashboard extends React.Component<RouteComponentProps<{}>, IDashboardState> {
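  constructor(props: RouteComponentProps<{}>) {
    // Pass props through to React.Component so this.props is set in the constructor.
    super(props);
    this.state = { cards: [] };
  }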

  public componentDidMount() {

    const baseUrl = 'http://localhost:51429/Home/Cards';

    fetch(baseUrl)
      .then((response) => (response.json())
        .then((responseData) => {
          console.log(responseData.length);
          this.setState({ cards: responseData as CardData[] })
        })
        .catch((error) => {
          console.log("Error loading data", error);
        }));

  }

  render() {
    return (
      <div className="app">
        <List Id='todo' Title="To do" Cards={
          this.state.cards.filter((card) => card.status === "todo")}
        />
        <List Id='in-progress' Title="In progress" Cards={
          this.state.cards.filter((card) => card.status === "in-progress")}
        />
        <List Id='done' Title="Done" Cards={
          this.state.cards.filter((card) => card.status === "done")}
        />
      </div>
    );
  }
}