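Sorry for the large data dump, but I could not reproduce this on the subsets of the data I tried. I have copy-pasted the dput of the data (165 observations, so nothing crazy) to this Gist.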
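I am trying to plot the data in DT by sport, as follows:
1. gini plotted as a scatterplot, with color varying by sport
2. five_yr_ma plotted as a line, with colors matching those in 1
This should be simple, and I have done similar things before. Here is what should work: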
# empty plot with proper axes
DT[ , plot(
  NA, ylim = range(gini), xlim = range(season),
  xlab = "Season", ylab = "Gini",
  main = "Comparison of Gini Coefficient Across Sports")]
# pick colors for each sport
cols <- c(NHL = "black", NBA = "red")
DT[ , {
  # add points to current plot
  points(season, gini, col = cols[.BY$sport])
  # add lines to current plot
  lines(season, five_yr_ma, col = cols[.BY$sport], lwd = 3)},
  by = sport]
But this gives me the following output and error:
# Empty data.table (0 rows) of 1 col: sport
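The error, raised from plot.xy(), says that the x and y lengths differ.
This is strange. If we skip the grouping and just do it manually, it works perfectly: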
all_sports[sport == "NBA", {
  points(season, gini, col = "red")
  lines(season, five_yr_ma, col = "red", lwd = 3)}]
all_sports[sport == "NHL", {
  points(season, gini, col = "black")
  lines(season, five_yr_ma, col = "black", lwd = 3)}]
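Moreover, even with grouping, it is not clear why plot.xy would have received arguments of different lengths. If we make the following adjustment to force R to record the inputs before they are passed on, nothing appears to be wrong: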
all_sports[ , {
  cat("\n\nPlotting for sport: ", .BY$sport)
  points(x1 <- season, y1 <- gini, col = cols[.BY$sport])
  lines(x2 <- season, y2 <- five_yr_ma, col = cols[.BY$sport], lwd = 3)
  cat("\npoints/season: ", length(x1),
      "\npoints/gini: ", length(y1),
      "\nlines/season: ", length(x2),
      "\nlines/five_yr_ma: ", length(y2))},
  by = sport]
With output:
# Plotting for sport: NHL
# points/season: 98
# points/gini: 98
# lines/season: 98
# lines/five_yr_ma: 98
# Plotting for sport: NBA
# points/season: 67
# points/gini: 67
# lines/season: 67
# lines/five_yr_ma: 67
What could be going on?
Since this does not seem to happen on every machine, here is my sessionInfo():
R version 3.2.4 (2016-03-10)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 14.04.3 LTS
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] data.table_1.9.7
loaded via a namespace (and not attached):
[1] rsconnect_0.4.1.11 tools_3.2.4
Answer 0 (score: 2)
Indeed, as @Arun pointed out, this appears to be a resurfacing of the (as yet unresolved) bug behind this question:
Values of the wrong group are used when using plot() within a data.table() in RStudio
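As @Arun found there, RStudio's native graphics device seems to get tripped up by the changing pointers used for the different subgroups created when j is evaluated in the presence of by. This suggests the workaround of simply copying everything taken from .SD each time, e.g.: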
points(copy(season), copy(gini),
col = cols[.BY$sport])
lines(copy(season), copy(five_yr_ma),
col = cols[.BY$sport], lwd = 3)
Or
x <- copy(.SD)
# the argument to points() must be col, not cols
with(x, {points(season, gini, col = cols[.BY$sport]);
         lines(copy(season), copy(five_yr_ma),
               col = cols[.BY$sport], lwd = 3)})
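Both of these work for me. (Since the subgroups are so small, there is no computational efficiency concern here; we can copy away without noticeably affecting performance.)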
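For anyone who wants to try the workaround without downloading the Gist, here is a minimal sketch on made-up data. The table toy, its values, and the running mean standing in for five_yr_ma are all hypothetical; only the column names and the copy() pattern come from the question and the workaround above. It assumes the data.table package is installed.

```r
library(data.table)

# Hypothetical stand-in data; the real table is the dput in the question's Gist
set.seed(1)
toy <- data.table(
  sport  = rep(c("NHL", "NBA"), times = c(6, 4)),
  season = c(2001:2006, 2003:2006),
  gini   = runif(10, 0.3, 0.6)
)
# A simple running mean per sport, used only as a placeholder for five_yr_ma
toy[ , five_yr_ma := cumsum(gini) / seq_len(.N), by = sport]

cols <- c(NHL = "black", NBA = "red")

# empty plot with proper axes
toy[ , plot(NA, xlim = range(season), ylim = range(gini),
            xlab = "Season", ylab = "Gini")]

# copy each group's columns before handing them to the graphics device
toy[ , {
  points(copy(season), copy(gini), col = cols[.BY$sport])
  lines(copy(season), copy(five_yr_ma), col = cols[.BY$sport], lwd = 3)
  NULL
}, by = sport]
```

The idea behind the copy() calls is that each plotting call then receives its own vectors rather than the per-group memory that data.table reuses from one group to the next. Outside RStudio's graphics device the copies should simply be harmless.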
This is #1524 on the data.table GitHub page, and I have filed this bug report with RStudio support; I will update this answer if a fix is pushed.