Example data adapted from here: https://www.reddit.com/r/rstats/comments/4j2efe/help_counting_unique_days_in_r_with_overlap_and/
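Create the intervals
# dplyr supplies %>% and transmute(); lubridate supplies mdy() and interval()
library(dplyr)
library(lubridate)
df = read.table(text = "Start End
1/8/2015 1/9/2015
1/8/2015 1/9/2015
1/13/2015 1/15/2015
1/7/2015 1/17/2015
1/12/2015 1/22/2015
1/8/2015 1/16/2015" , header = TRUE)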
Find the unique intervals. What happened to this interval? 2015-01-12 UTC--2015-01-22 UTC has disappeared. Is this the expected behavior?
df %>% transmute(Start = mdy(Start), End = mdy(End), Interval = interval(Start, End))
Start End Interval
1 2015-01-08 2015-01-09 2015-01-08 UTC--2015-01-09 UTC
2 2015-01-08 2015-01-09 2015-01-08 UTC--2015-01-09 UTC
3 2015-01-13 2015-01-15 2015-01-13 UTC--2015-01-15 UTC
4 2015-01-07 2015-01-17 2015-01-07 UTC--2015-01-17 UTC
5 2015-01-12 2015-01-22 2015-01-12 UTC--2015-01-22 UTC
6 2015-01-08 2015-01-16 2015-01-08 UTC--2015-01-16 UTC
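Answer 0 (score: 2)
2015-01-12 UTC--2015-01-22 UTC is dropped because it is treated as a duplicate of 2015-01-07 UTC--2015-01-17 UTC. The two are not identical objects, but they compare as equal under the == operator. Both intervals span exactly 10 days, which is consistent with the comparison being made on the interval's length rather than on its start and end dates.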
> intervalDf
Start End Interval
1 2015-01-08 2015-01-09 2015-01-08 UTC--2015-01-09 UTC
2 2015-01-08 2015-01-09 2015-01-08 UTC--2015-01-09 UTC
3 2015-01-13 2015-01-15 2015-01-13 UTC--2015-01-15 UTC
4 2015-01-07 2015-01-17 2015-01-07 UTC--2015-01-17 UTC
5 2015-01-12 2015-01-22 2015-01-12 UTC--2015-01-22 UTC
6 2015-01-08 2015-01-16 2015-01-08 UTC--2015-01-16 UTC
> intervalDf[4,3]
[1] 2015-01-07 UTC--2015-01-17 UTC
> intervalDf[5,3]
[1] 2015-01-12 UTC--2015-01-22 UTC
> intervalDf[4,3] == intervalDf[5,3]
[1] TRUE
However,
> identical(intervalDf[4,3], intervalDf[5,3])
[1] FALSE
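This may also mean that unique uses == as its comparison function. If you want to keep both intervals, you can convert the Interval column to character and then apply unique.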
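The character-based workaround can be sketched as follows. This is a minimal illustration rather than the only way to do it: the names `ivals`, `key`, and `unique_rows` are made up for the example, and the key is built from `int_start()` and `int_end()` so that it explicitly encodes both endpoints of each interval.

```r
library(lubridate)

# Same six rows as in the question, parsed as dates
df <- data.frame(
  Start = mdy(c("1/8/2015", "1/8/2015", "1/13/2015",
                "1/7/2015", "1/12/2015", "1/8/2015")),
  End   = mdy(c("1/9/2015", "1/9/2015", "1/15/2015",
                "1/17/2015", "1/22/2015", "1/16/2015"))
)

ivals <- interval(df$Start, df$End)

# Character key that contains both the start and the end of each interval,
# so two intervals of equal length but different dates get different keys
key <- paste(int_start(ivals), int_end(ivals), sep = "--")

# Keep the first occurrence of each distinct key
unique_rows <- df[!duplicated(key), ]
print(unique_rows)
```

With this key only the exact repeat in row 2 is dropped, so the interval starting on 2015-01-12 is retained.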
Update:
The unique function behaves inconsistently on single-column and multi-column data frames.
> dfTest
x Interval
1 1 2015-01-08 UTC--2015-01-09 UTC
2 1 2015-01-08 UTC--2015-01-09 UTC
3 1 2015-01-13 UTC--2015-01-15 UTC
4 1 2015-01-07 UTC--2015-01-17 UTC
5 1 2015-01-12 UTC--2015-01-22 UTC
6 1 2015-01-08 UTC--2015-01-16 UTC
> unique(dfTest)
x Interval
1 1 2015-01-08 UTC--2015-01-09 UTC
3 1 2015-01-13 UTC--2015-01-15 UTC
4 1 2015-01-07 UTC--2015-01-17 UTC
5 1 2015-01-12 UTC--2015-01-22 UTC
6 1 2015-01-08 UTC--2015-01-16 UTC
> dfTest1
Interval
1 2015-01-08 UTC--2015-01-09 UTC
2 2015-01-08 UTC--2015-01-09 UTC
3 2015-01-13 UTC--2015-01-15 UTC
4 2015-01-07 UTC--2015-01-17 UTC
5 2015-01-12 UTC--2015-01-22 UTC
6 2015-01-08 UTC--2015-01-16 UTC
> unique(dfTest1)
Interval
1 2015-01-08 UTC--2015-01-09 UTC
3 2015-01-13 UTC--2015-01-15 UTC
4 2015-01-07 UTC--2015-01-17 UTC
6 2015-01-08 UTC--2015-01-16 UTC
The definitions of the two methods explain the difference.
> getAnywhere("unique.data.frame")
A single object matching ‘unique.data.frame’ was found
It was found in the following places
  package:base
  registered S3 method for unique from namespace base
  namespace:base
with value

function (x, incomparables = FALSE, fromLast = FALSE, ...)
{
    if (!identical(incomparables, FALSE))
        .NotYetUsed("incomparables != FALSE")
    x[!duplicated(x, fromLast = fromLast, ...), , drop = FALSE]
}
<bytecode: 0x10c2ab0a0>
<environment: namespace:base>
> getAnywhere("duplicated.data.frame")
A single object matching ‘duplicated.data.frame’ was found
It was found in the following places
  package:base
  registered S3 method for duplicated from namespace base
  namespace:base
with value

function (x, incomparables = FALSE, fromLast = FALSE, ...)
{
    if (!identical(incomparables, FALSE))
        .NotYetUsed("incomparables != FALSE")
    if (length(x) != 1L)
        duplicated(do.call("paste", c(x, sep = "\r")), fromLast = fromLast)
    else duplicated(x[[1L]], fromLast = fromLast, ...)
}
<bytecode: 0x10c33a4b0>
<environment: namespace:base>
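To make the two branches concrete, here is a small base-R sketch that reproduces by hand what each branch of the duplicated.data.frame version printed above computes. The data frame `df2` and the variable names are invented for illustration, and newer R releases may implement the multi-column branch differently, although the result for plain columns like these should be the same.

```r
# Illustrative data frame with ordinary columns
df2 <- data.frame(x = c(1, 1, 1), y = c("a", "a", "b"))

# Multi-column branch: every row is pasted into one string and the
# strings are compared, so only the text representation matters
row_keys <- do.call("paste", c(df2, sep = "\r"))
multi <- duplicated(row_keys)

# Single-column branch: duplicated() is called on the column itself,
# so the column's own notion of a duplicate applies
single <- duplicated(df2["y"][[1L]])

print(multi)
print(single)
print(identical(multi, duplicated(df2)))
print(identical(single, duplicated(df2["y"])))
```

For an Interval column, the pasted string contains both endpoints, which would explain why row 5 survives in unique(dfTest) but not in unique(dfTest1), where the intervals themselves are compared.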