Question

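Hi, I want to assign an ID to each row of the following data frame. If a leg's arrival station matches another leg's departure station, and its arrival time matches that leg's departure time, then both legs must belong to the same bus. Does anyone know how to solve this? Thanks in advance!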

I would like the following result:

Answer 1

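One approach is to build a graph in which each composite key (station, timestamp) is a vertex and each leg (one row of the data frame) is an edge from its departure key to its arrival key. In this graph, every connected component is one distinct route, so your example has two components: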

 Component 1: (Station1 10:10) -> (Station2 10:15) -> (Station3 10:18) -> (Station4 10:20)
 Component 2: (Station10 10:12) -> (Station15 10:25)

Using the igraph and tidyverse packages (here dplyr, magrittr and tibble), the approach can be implemented like this:

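library(igraph)
library(dplyr)
library(magrittr)
library(tibble)

# df is the source data. Build composite keys by joining the
# station name and timestamp: one key for where each leg starts
# and one for where it ends. The '_' separator keeps keys such as
# 'Station1' + '10:10' and 'Station11' + '0:10' distinct.
df %<>% mutate(depkey = paste(From, Departure, sep = '_'),
               arrkey = paste(To, Arrival, sep = '_'))

# Create the graph, find its (weakly) connected components, and
# convert the membership vector to a data frame. setNames() is used
# because renaming inside a %T>% block would not change the result.
comp_df <- graph_from_data_frame(df %>% select(depkey, arrkey)) %>%
    components() %>%
    extract2('membership') %>%
    as.data.frame() %>%
    rownames_to_column() %>%
    setNames(c('vertexkey', 'component'))

# Join the components back onto the original data frame to produce the output
df %>% inner_join(comp_df, by = c('depkey' = 'vertexkey')) %>%
    select(ID = component, everything()) %>%
    select(-depkey, -arrkey) %>%
    arrange(ID, Departure)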

This produces the desired output:

  ID      From        To Departure Arrival
1  1  Station1  Station2     10:10   10:15
2  1  Station2  Station3     10:15   10:18
3  1  Station3  Station4     10:18   10:20
4  2 Station10 Station15     10:12   10:25

Note: I used the following code to generate df (for simplicity, the dates were removed from the departure and arrival timestamps):

df <- data.frame(
    From = c('Station1', 'Station10', 'Station2', 'Station3'),
    To = c('Station2', 'Station15', 'Station3', 'Station4'),
    Departure = c('10:10', '10:12', '10:15', '10:18'),
    Arrival = c('10:15', '10:25', '10:18', '10:20'))
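For comparison, here is a minimal base-R sketch that swaps the igraph components step for a simpler technique: each leg's predecessor is found with match() on the (station, time) keys, and every leg is then walked back to the first leg of its route. The helper name assign_trip_ids() is hypothetical, and the sketch assumes that times strictly increase along a route (so there are no cycles) and that at most one leg ends at any given (station, time) pair.

```r
# Example data from the answer above, with dates removed
df <- data.frame(
    From = c('Station1', 'Station10', 'Station2', 'Station3'),
    To = c('Station2', 'Station15', 'Station3', 'Station4'),
    Departure = c('10:10', '10:12', '10:15', '10:18'),
    Arrival = c('10:15', '10:25', '10:18', '10:20'),
    stringsAsFactors = FALSE
)

# Hypothetical helper (not part of the original answer).
# Assumes times increase along each route, so following
# predecessors cannot loop forever.
assign_trip_ids <- function(df) {
    # Composite (station, time) keys; '_' avoids ambiguous concatenations
    start_key <- paste(df$From, df$Departure, sep = '_')
    end_key <- paste(df$To, df$Arrival, sep = '_')

    # pred[i] is the row whose arrival key equals row i's departure key
    # (NA for the first leg of a route; the first match wins if keys repeat)
    pred <- match(start_key, end_key)

    # Follow predecessors until every row points at its route's first leg
    root <- seq_len(nrow(df))
    repeat {
        has_pred <- !is.na(pred[root])
        if (!any(has_pred)) break
        root[has_pred] <- pred[root[has_pred]]
    }

    # Number routes 1, 2, ... in order of first appearance
    match(root, unique(root))
}

df$ID <- assign_trip_ids(df)
df[order(df$ID, df$Departure), c('ID', 'From', 'To', 'Departure', 'Arrival')]
```

On this example the IDs agree with the igraph result. The igraph version remains the more general option: components() groups every leg that touches a shared (station, time) key, whereas this sketch links each leg to a single predecessor.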

Assign an ID by matching conditions across different columns
