Question

Sample data adapted from here: https://www.reddit.com/r/rstats/comments/4j2efe/help_counting_unique_days_in_r_with_overlap_and/

Create the intervals

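library(dplyr)      # %>% and transmute()
library(lubridate)  # mdy() and interval()

# Start and end dates in month/day/year format
df = read.table(text = "Start            End
           1/8/2015         1/9/2015
           1/8/2015         1/9/2015
           1/13/2015        1/15/2015
           1/7/2015         1/17/2015
           1/12/2015        1/22/2015
           1/8/2015         1/16/2015", header = TRUE)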

Find the unique intervals. What happened to the interval 2015-01-12 UTC--2015-01-22 UTC? It has disappeared. Is this expected behavior?

df %>% transmute(Start = mdy(Start), End = mdy(End), Interval = interval(Start, End)) 

       Start        End                       Interval
1 2015-01-08 2015-01-09 2015-01-08 UTC--2015-01-09 UTC
2 2015-01-08 2015-01-09 2015-01-08 UTC--2015-01-09 UTC
3 2015-01-13 2015-01-15 2015-01-13 UTC--2015-01-15 UTC
4 2015-01-07 2015-01-17 2015-01-07 UTC--2015-01-17 UTC
5 2015-01-12 2015-01-22 2015-01-12 UTC--2015-01-22 UTC
6 2015-01-08 2015-01-16 2015-01-08 UTC--2015-01-16 UTC

Answer 1

2015-01-12 UTC--2015-01-22 UTC is dropped because it is treated as a duplicate of 2015-01-07 UTC--2015-01-17 UTC. The two are not identical objects, but they compare equal under the == operator.

> intervalDf
       Start        End                       Interval
1 2015-01-08 2015-01-09 2015-01-08 UTC--2015-01-09 UTC
2 2015-01-08 2015-01-09 2015-01-08 UTC--2015-01-09 UTC
3 2015-01-13 2015-01-15 2015-01-13 UTC--2015-01-15 UTC
4 2015-01-07 2015-01-17 2015-01-07 UTC--2015-01-17 UTC
5 2015-01-12 2015-01-22 2015-01-12 UTC--2015-01-22 UTC
6 2015-01-08 2015-01-16 2015-01-08 UTC--2015-01-16 UTC
> intervalDf[4,3]
[1] 2015-01-07 UTC--2015-01-17 UTC

> intervalDf[5,3]
[1] 2015-01-12 UTC--2015-01-22 UTC
> intervalDf[4,3] == intervalDf[5,3]
[1] TRUE

However:

> identical(intervalDf[4,3], intervalDf[5,3])
[1] FALSE
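
One observation that may help explain the TRUE above: both intervals cover exactly 10 days and differ only in where they start. The sketch below rebuilds rows 4 and 5 by hand (the names i4 and i5 are mine, not from the question) and compares their lengths and start instants. That == looks only at the length is my assumption here; I have not confirmed it against the lubridate source.

```r
library(lubridate)

# Rows 4 and 5 of the example data, rebuilt by hand (i4 and i5 are illustrative names).
i4 <- interval(ymd("2015-01-07"), ymd("2015-01-17"))
i5 <- interval(ymd("2015-01-12"), ymd("2015-01-22"))

# Length of each interval in seconds; both cover exactly 10 days.
int_length(i4)
int_length(i5)

# The start instants differ, so the two intervals are not the same span of time.
int_start(i4) == int_start(i5)
```

If the lengths match while the starts differ, an equality test based on length alone would report the two intervals as equal, which is consistent with the output shown above.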

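This may also mean that unique uses == as its comparison function. If you want to keep both intervals, you can convert the Interval column to character and then apply unique.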
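
A minimal sketch of that workaround, assuming format() is an acceptable way to get the character form of an Interval (the variable names starts, ends, ints, labels and keep are mine). Deduplicating on the strings compares the printed start and end rather than the Interval objects themselves.

```r
library(lubridate)

# The six rows from the question, parsed as month/day/year.
starts <- mdy(c("1/8/2015", "1/8/2015", "1/13/2015", "1/7/2015", "1/12/2015", "1/8/2015"))
ends   <- mdy(c("1/9/2015", "1/9/2015", "1/15/2015", "1/17/2015", "1/22/2015", "1/16/2015"))
ints   <- interval(starts, ends)

# Character form of each interval, e.g. "2015-01-08 UTC--2015-01-09 UTC".
labels <- format(ints)

# Flag repeats on the strings, then keep the first occurrence of each interval.
keep <- !duplicated(labels)
ints[keep]
```

Only the second row repeats an earlier start and end, so five intervals should remain, including 2015-01-12 UTC--2015-01-22 UTC.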

Update: the unique function behaves inconsistently on single-column and multi-column data frames.

> dfTest
  x                       Interval
1 1 2015-01-08 UTC--2015-01-09 UTC
2 1 2015-01-08 UTC--2015-01-09 UTC
3 1 2015-01-13 UTC--2015-01-15 UTC
4 1 2015-01-07 UTC--2015-01-17 UTC
5 1 2015-01-12 UTC--2015-01-22 UTC
6 1 2015-01-08 UTC--2015-01-16 UTC
> unique(dfTest)
  x                       Interval
1 1 2015-01-08 UTC--2015-01-09 UTC
3 1 2015-01-13 UTC--2015-01-15 UTC
4 1 2015-01-07 UTC--2015-01-17 UTC
5 1 2015-01-12 UTC--2015-01-22 UTC
6 1 2015-01-08 UTC--2015-01-16 UTC
> dfTest1
                        Interval
1 2015-01-08 UTC--2015-01-09 UTC
2 2015-01-08 UTC--2015-01-09 UTC
3 2015-01-13 UTC--2015-01-15 UTC
4 2015-01-07 UTC--2015-01-17 UTC
5 2015-01-12 UTC--2015-01-22 UTC
6 2015-01-08 UTC--2015-01-16 UTC
> unique(dfTest1)
                        Interval
1 2015-01-08 UTC--2015-01-09 UTC
3 2015-01-13 UTC--2015-01-15 UTC
4 2015-01-07 UTC--2015-01-17 UTC
6 2015-01-08 UTC--2015-01-16 UTC

The definitions of the two methods explain the difference.

> getAnywhere("unique.data.frame")
A single object matching ‘unique.data.frame’ was found
It was found in the following places
  package:base
  registered S3 method for unique from namespace base
  namespace:base
with value

function (x, incomparables = FALSE, fromLast = FALSE, ...)
{
    if (!identical(incomparables, FALSE))
        .NotYetUsed("incomparables != FALSE")
    x[!duplicated(x, fromLast = fromLast, ...), , drop = FALSE]
}
<bytecode: 0x10c2ab0a0>
<environment: namespace:base>
> getAnywhere("duplicated.data.frame")
A single object matching ‘duplicated.data.frame’ was found
It was found in the following places
  package:base
  registered S3 method for duplicated from namespace base
  namespace:base
with value

function (x, incomparables = FALSE, fromLast = FALSE, ...)
{
    if (!identical(incomparables, FALSE))
        .NotYetUsed("incomparables != FALSE")
    if (length(x) != 1L)
        duplicated(do.call("paste", c(x, sep = "\r")), fromLast = fromLast)
    else duplicated(x[[1L]], fromLast = fromLast, ...)
}
<bytecode: 0x10c33a4b0>
<environment: namespace:base>
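
To make the two branches of the definition printed above concrete, here is a small base R sketch that applies each branch by hand. The data frame d and its columns are toy values of my own, not taken from the question. It works in the opposite direction to the Interval case: two doubles that differ as values but paste to the same text.

```r
# Toy data: two doubles that print alike but are not the same value.
d <- data.frame(x = 1, v = c(0.1 + 0.2, 0.3))

# Multi-column branch: each row is pasted into one string, then the strings are compared.
key <- do.call("paste", c(d, sep = "\r"))
duplicated(key)

# Single-column branch: duplicated() is called on the column itself.
duplicated(d$v)
```

The pasted keys should come out identical, so the first call flags the second row, while the second call flags nothing. For the Interval column the roles appear to be reversed: the pasted text of rows 4 and 5 differs, so the multi-column data frame keeps both, whereas the single-column branch hands the Interval vector itself to duplicated().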

So what is happening here in lubridate::unique.Interval?
