Question

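After a lot of thinking and Googling, I couldn't find a solution to my problem, and I hope you can help.

I have a large data frame with an ID column whose values can repeat two or more times, plus a start date column and an end date column that together define a time period. Grouping by ID, I want to find out whether any of an ID's time periods overlaps another period of that same ID and, if so, flag it by creating a new column that says, for example, whether that ID has an overlap.

Here is an example data frame that already includes the desired new column: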

structure(list(ID= c(34L, 34L, 80L, 80L, 81L, 81L, 81L, 94L, 
94L), Start = structure(c(1072911600, 1262300400, 1157061600, 
1277935200, 1157061600, 1277935200, 1157061600, 1075590000, 1285891200
), class = c("POSIXct", "POSIXt"), tzone = ""), End = structure(c(1262214000, 
1409436000, 1251669600, 1404079200, 1251669600, 1404079200, 1251669600, 
1264892400, 1475193600), class = c("POSIXct", "POSIXt"), tzone = ""), 
    Overlap = c(FALSE, FALSE, FALSE, FALSE, TRUE, TRUE, TRUE, 
    FALSE, FALSE)), .Names = c("ID", "Start", "End", "Overlap"
), row.names = c(NA, -9L), class = "data.frame")


 ID               Start                 End Overlap
 34 2004-01-01 00:00:00 2009-12-31 00:00:00   FALSE
 34 2010-01-01 00:00:00 2014-08-31 00:00:00   FALSE
 80 2006-09-01 00:00:00 2009-08-31 00:00:00   FALSE
 80 2010-07-01 00:00:00 2014-06-30 00:00:00   FALSE
 81 2006-09-01 00:00:00 2009-08-31 00:00:00    TRUE
 81 2010-07-01 00:00:00 2014-06-30 00:00:00    TRUE
 81 2006-09-01 00:00:00 2009-08-31 00:00:00    TRUE
 94 2004-02-01 00:00:00 2010-01-31 00:00:00   FALSE
 94 2010-10-01 02:00:00 2016-09-30 02:00:00   FALSE

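In this case, two of the time periods for ID 81 overlap, so I want to flag every row with ID = 81 as TRUE, meaning an overlap was found between at least two rows of that ID. This is just the ideal solution; overall, all I want is to detect overlaps when grouping by ID, so the way they are flagged can be flexible if that makes things simpler.

Thanks in advance for your help.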

Answer 1

Another option, assuming df contains your data frame:

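library(data.table)
# Key on the interval columns (required by foverlaps) and number the rows
dt <- data.table(df, key=c("Start", "End"))[, `:=`(Overlap=NULL, row=1:nrow(df))]
# Self-join on overlapping intervals; keep IDs matched to a different row of the same ID
overlapping <- unique(foverlaps(dt, dt)[ID==i.ID & row!=i.row, ID])
# Flag every row of the overlapping IDs
dt[, `:=`(Overlap=FALSE, row=NULL)][ID %in% overlapping, Overlap:=TRUE][order(ID, Start)]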
#    ID               Start                 End Overlap
# 1: 34 2004-01-01 00:00:00 2009-12-31 00:00:00   FALSE
# 2: 34 2010-01-01 00:00:00 2014-08-31 00:00:00   FALSE
# 3: 80 2006-09-01 00:00:00 2009-08-31 00:00:00   FALSE
# 4: 80 2010-07-01 00:00:00 2014-06-30 00:00:00   FALSE
# 5: 81 2006-09-01 00:00:00 2009-08-31 00:00:00    TRUE
# 6: 81 2006-09-01 00:00:00 2009-08-31 00:00:00    TRUE
# 7: 81 2010-07-01 00:00:00 2014-06-30 00:00:00    TRUE
# 8: 94 2004-02-01 00:00:00 2010-01-31 00:00:00   FALSE
# 9: 94 2010-10-01 02:00:00 2016-09-30 02:00:00   FALSE
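
If data.table is not available, the same self-join idea can be sketched in base R. This is only an illustration under assumptions: it uses Date columns and a small subset of the question's data (IDs 80 and 81) instead of the full POSIXct data frame, treats intervals as closed (touching endpoints count as an overlap), and the helper column name row is made up for the example.

```r
# Assumed sample: a subset of the question's data, with Date instead of POSIXct
df <- data.frame(
  ID    = c(80L, 80L, 81L, 81L, 81L),
  Start = as.Date(c("2006-09-01", "2010-07-01", "2006-09-01", "2010-07-01", "2006-09-01")),
  End   = as.Date(c("2009-08-31", "2014-06-30", "2009-08-31", "2014-06-30", "2009-08-31"))
)

# Number the rows so a row is never compared with itself
df$row <- seq_len(nrow(df))

# Pair every row with every row of the same ID
pairs <- merge(df, df, by = "ID", suffixes = c(".x", ".y"))

# Two closed intervals overlap when each one starts no later than the other ends
hit <- pairs$row.x != pairs$row.y &
  pairs$Start.x <= pairs$End.y &
  pairs$Start.y <= pairs$End.x

overlapping <- unique(pairs$ID[hit])
df$Overlap <- df$ID %in% overlapping
df$row <- NULL
df
```

Note that the merge builds every within-ID pair of rows, so on a large data frame with many rows per ID it can use far more memory than the keyed foverlaps join.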

Answer 2

I think this is the code you are looking for? Let me know.

data <- structure(list(ID= c(34L, 34L, 80L, 80L, 81L, 81L, 81L, 94L, 
94L), Start = structure(c(1072911600, 1262300400, 1157061600, 
1277935200, 1157061600, 1277935200, 1157061600, 1075590000, 1285891200
), class = c("POSIXct", "POSIXt"), tzone = ""), End = structure(c(1262214000, 
1409436000, 1251669600, 1404079200, 1251669600, 1404079200, 1251669600, 
1264892400, 1475193600), class = c("POSIXct", "POSIXt"), tzone = ""), 
    Overlap = c(FALSE, FALSE, FALSE, FALSE, TRUE, TRUE, TRUE, 
    FALSE, FALSE)), .Names = c("ID", "Start", "End", "Overlap"
), row.names = c(NA, -9L), class = "data.frame")

library("dplyr")
library("lubridate")

# Return TRUE if any two intervals in the vector overlap
overlaps <- function(intervals){
  # A single interval cannot overlap anything (and 1:0 would count backwards)
  if(length(intervals) < 2) return(FALSE)
  for(i in 1:(length(intervals)-1)){
    for(j in (i+1):length(intervals)){
      if(int_overlaps(intervals[i], intervals[j])){
        return(TRUE)
      }
    }
  }
  return(FALSE)
}

# One row per ID, with ovl telling whether that ID has overlapping periods
data %>%
  mutate(Interval = interval(Start, End)) %>%
  group_by(ID) %>%
  do({
    df <- .
    ovl <- overlaps(df$Interval)
    return(data.frame(ID = df$ID[1], ovl))
  })

Also, I hope someone can provide a more elegant solution to my overlaps function.
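
One possible way to avoid the nested loops, offered as a sketch and not as a drop-in replacement: sort the periods of an ID by start, then check whether any period starts no later than the latest end seen so far. The function name has_overlap is hypothetical, the sample data is retyped from the question's table with a fixed UTC time zone (an assumption, since the original uses the local time zone and two rows carry a 02:00 time), and intervals are assumed to be closed, so touching endpoints count as an overlap.

```r
# Hypothetical helper: TRUE if any two of the given periods overlap
has_overlap <- function(start, end) {
  n <- length(start)
  if (n < 2) return(FALSE)
  o <- order(start, end)
  start <- as.numeric(start)[o]
  end <- as.numeric(end)[o]
  # After sorting by start, a period overlaps an earlier one exactly when
  # it starts no later than the latest end among the periods before it
  any(start[-1] <= cummax(end)[-n])
}

# Sample data retyped from the question (UTC is an assumption)
df <- data.frame(
  ID = c(34L, 34L, 80L, 80L, 81L, 81L, 81L, 94L, 94L),
  Start = as.POSIXct(c("2004-01-01", "2010-01-01", "2006-09-01", "2010-07-01",
                       "2006-09-01", "2010-07-01", "2006-09-01", "2004-02-01",
                       "2010-10-01"), tz = "UTC"),
  End = as.POSIXct(c("2009-12-31", "2014-08-31", "2009-08-31", "2014-06-30",
                     "2009-08-31", "2014-06-30", "2009-08-31", "2010-01-31",
                     "2016-09-30"), tz = "UTC")
)

# One flag per ID, then spread back onto every row of that ID
flag <- sapply(split(df, df$ID), function(g) has_overlap(g$Start, g$End))
df$Overlap <- unname(flag[as.character(df$ID)])
df
```

Because of the sort, this does on the order of n log n work per ID instead of comparing every pair of periods.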

Finding overlaps between time periods in R

2 answers: