I have a data.table in R, and I want to create a new column that finds the interval each price falls into, for each year/month.
Reproducible example:
library(data.table)
set.seed(100)
DT <- data.table(year=2000:2009, month=1:10, price=runif(5*26^2)*100)
intervals <- list(year=2000:2009, month=1:10, interval = sort(round(runif(9)*100)))
intervals <- replicate(10, (sample(10:100,100, replace=T)))
intervals <- t(apply(intervals, 1, sort))
intervals.dt <- data.table(intervals)
intervals.dt[, c("year", "month") := list(rep(2000:2009, each=10), 1:10)]
setkey(intervals.dt, year, month)
setkey(DT, year, month)
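Here is what I tried: I merged the DT and intervals.dt data.tables, then created an intervalsstring column that pastes all the V* columns into a single string column (not very elegant, I admit), and finally used it in findInterval(). But the solution does not work for every row (!). So, after: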
DT <- merge(DT, intervals.dt)
DT <- DT[, intervalsstring := paste(V1, V2, V3, V4, V5, V6, V7, V8, V9, V10)]
DT <- DT[, c("V1", "V2", "V3", "V4", "V5", "V6", "V7", "V8", "V9", "V10") := NULL]
DT[, interval := findInterval(price, strsplit(intervalsstring, " ")[[1]])]
I get:
> DT
year month price intervalsstring interval
1: 2000 1 30.776611 12 21 36 46 48 51 63 72 91 95 2
2: 2000 1 62.499648 12 21 36 46 48 51 63 72 91 95 6
3: 2000 1 53.581115 12 21 36 46 48 51 63 72 91 95 6
4: 2000 1 48.830599 12 21 36 46 48 51 63 72 91 95 5
5: 2000 1 33.066053 12 21 36 46 48 51 63 72 91 95 2
---
3376: 2009 10 33.635924 12 40 45 48 50 65 75 90 96 97 2
3377: 2009 10 38.993769 12 40 45 48 50 65 75 90 96 97 3
3378: 2009 10 75.065820 12 40 45 48 50 65 75 90 96 97 8
3379: 2009 10 6.277403 12 40 45 48 50 65 75 90 96 97 0
3380: 2009 10 64.189162 12 40 45 48 50 65 75 90 96 97 7
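This is correct for the first rows, but not for the last (or other) rows.
For example, in row 3380 the price ~64.19 should be in the 5th interval, not the 7th. I think my mistake is that, in my last command, findInterval() relies only on the first row of intervalsstring.
Thanks!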
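The suspicion about the first row can be checked on a toy example. This is only a sketch: the vectors `breaks_string` and `price` and their values are invented for illustration and are not taken from the data above. It shows that `strsplit()` returns one element per input string, so `[[1]]` keeps only the first row's breakpoints.

```r
# Hypothetical toy values: two rows, each with its own space-separated breakpoints.
breaks_string <- c("10 20 30", "40 50 60")
price <- c(25, 55)

# strsplit() returns a list with one character vector per input string.
parts <- strsplit(breaks_string, " ")

# [[1]] keeps only the first row's breakpoints, so every price is
# compared against 10, 20, 30.
first_only <- findInterval(price, as.numeric(parts[[1]]))

# Pairing each price with the breakpoints of its own row instead.
per_row <- mapply(function(p, b) findInterval(p, as.numeric(b)), price, parts)

print(first_only)  # 2 3
print(per_row)     # 2 2
```

The second price, 55, lands in interval 3 when measured against the first row's breakpoints, but in interval 2 against its own.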
Answer 0 (score: 3)
You have to use the argument by = year to apply the function to each subset:
DT[, interval := findInterval(price, intervals[as.character(year), ]), by = year]
year price interval
1: 2000 30.776611 4
2: 2001 25.767250 1
3: 2002 55.232243 4
4: 2003 5.638315 0
5: 2004 46.854928 2
---
3376: 2005 97.497761 10
3377: 2006 50.141227 5
3378: 2007 50.186270 7
3379: 2008 99.229338 10
3380: 2009 64.189162 8
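To see what `by = year` does in that call, a minimal sketch on toy data may help. It assumes, as the call implies, that `intervals` is a matrix with one sorted row of breakpoints per year and the years as row names. The names `breaks_by_year` and `toy` and all values are invented for illustration.

```r
library(data.table)

# Hypothetical breakpoints: one sorted row per year, years as row names.
breaks_by_year <- rbind("2000" = c(10, 20, 30),
                        "2001" = c(40, 50, 60))

toy <- data.table(year  = c(2000, 2000, 2001, 2001),
                  price = c(25, 5, 55, 65))

# Within each group, `year` is a single value, so as.character(year)
# selects exactly one row of the matrix as the breakpoint vector.
toy[, interval := findInterval(price, breaks_by_year[as.character(year), ]),
    by = year]

print(toy$interval)  # 2 0 2 3
```

Without `by`, `as.character(year)` would select one matrix row per data row, and `findInterval()` would not receive a single sorted breakpoint vector.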
Update (based on the edited question):
# Start from DT and intervals.dt as built in the reproducible example.
# Inside intervals.dt[...], bare year and month in i resolve to intervals.dt's
# own columns, so intervals.dt[J(year[1], month[1]), ...] always returns its
# first row. Merging first and reading the breakpoints from .SD avoids that.
DT <- merge(DT, intervals.dt)
DT[, interval := findInterval(price, unlist(.SD[1L])),
   by = c("year", "month"), .SDcols = paste0("V", 1:10)]
year month price V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 interval
1: 2000 1 30.776611 12 21 36 46 48 51 63 72 91 95 2
2: 2000 1 62.499648 12 21 36 46 48 51 63 72 91 95 6
3: 2000 1 53.581115 12 21 36 46 48 51 63 72 91 95 6
4: 2000 1 48.830599 12 21 36 46 48 51 63 72 91 95 5
5: 2000 1 33.066053 12 21 36 46 48 51 63 72 91 95 2
---
3376: 2009 10 33.635924 12 40 45 48 50 65 75 90 96 97 1
3377: 2009 10 38.993769 12 40 45 48 50 65 75 90 96 97 1
3378: 2009 10 75.065820 12 40 45 48 50 65 75 90 96 97 7
3379: 2009 10 6.277403 12 40 45 48 50 65 75 90 96 97 0
3380: 2009 10 64.189162 12 40 45 48 50 65 75 90 96 97 5