Viewing a data.table by typing its name throws an error, while str and glimpse work

Time: 2017-11-09 08:15:14

Tags: r data.table

This recent error has me stumped. I am working with a data.table named log08t, and when I view it by typing its name at the command line, it throws this error:

log08t

  Error in dimnames(x) <- dn :
    length of 'dimnames' [1] not equal to array extent
  In addition: Warning message:
  In cbind(time = c("2017-11-08 12:38:09", "2017-11-08 12:38:09",  :
    number of rows of result is not a multiple of vector length (arg 1)

When I inspect its structure with str, it looks like this:

str(log08t)

 Classes ‘data.table’ and 'data.frame': 5389 obs. of  19 variables:
 $ time        : POSIXct, format: "2017-11-08 12:38:09" "2017-11-08 12:38:09" "2017-11-08 12:38:09" "2017-11-08 12:38:09" ...
 $ type        : chr  "API-XML" "API-XML" "MySQL" "MySQL" ...
 $ id          : num  40192 40193 4131 4131 4131 ...
 $ gap         :Class 'difftime'  atomic [1:5389] 2.59e+01 0.00 2.71e-01 2.12e-02 3.05e-04 ...
  .. ..- attr(*, "units")= chr "secs"
 $ bunch2      : num  24 24 24 24 24 24 24 24 24 24 ...
 $ service_name: chr  "GetMyTodaysSessions" "GetMyCurrentSession" "SELECT" "SELECT" ...
 $ table       : chr  NA NA NA "class_sessions" ...
 $ user_id     : chr  NA NA NA NA ...
 $ code        : chr  NA NA NA NA ...
 $ from        : chr  NA NA NA NA ...
 $ to          : chr  NA NA NA NA ...
 $ input_string: chr  "Service : GetMyTodaysSessions; UserId : 499" "Service : GetMyCurrentSession; UserId : 499" NA NA ...
 $ contents    : chr  "5299; 2017-11-08 07:57:41; 2017-11-08 08:27:41; 6; Sanjay; 499; 17; 6th grade Physics section A; 12; Room 12A; "| __truncated__ NA "select current_timestamp" "select term.class_session_id from class_sessions as term inn..." ...
 $ break_cat   : chr  "block13" "block13" "block14" "block14" ...
 $ break_serv  : chr  "batch1" "batch2" "batch1" "batch1" ...
 $ shftime     : POSIXct, format: "2017-11-08 12:37:43" "2017-11-08 12:38:09" "2017-11-08 12:38:09" "2017-11-08 12:38:09" ...
 $ bunch       : int  23 24 24 24 24 24 24 24 24 24 ...
 $ datetext    : chr  "2017-11-08 12:38:09" "2017-11-08 12:38:09" "2017-11-08 12:38:09" "2017-11-08 12:38:09" ...
 $ timesec     :Formal class 'Period' [package "lubridate"] with 6 slots
  .. ..@ .Data : num  1.51e+09 1.51e+09 1.51e+09 1.51e+09 1.51e+09 ...
  .. ..@ year  : num  0 0 0 0 0 0 0 0 0 0 ...
  .. ..@ month : num  0 0 0 0 0 0 0 0 0 0 ...
  .. ..@ day   : num  0 0 0 0 0 0 0 0 0 0 ...
  .. ..@ hour  : num  0 0 0 0 0 0 0 0 0 0 ...
  .. ..@ minute: num  0 0 0 0 0 0 0 0 0 0 ...
 - attr(*, "sorted")= chr "time"
 - attr(*, ".internal.selfref")=<externalptr>

I can see its dimensions:

dim(log08t)
# [1] 5389   19

I can count its rows and list its column names:

> nrow(log08t)
# [1] 5389
> NROW(log08t$time)
# [1] 5389
> NROW(log08t$timesec)
# [1] 5389
> names(log08t)
# [1] "time"         "type"         "id"           "gap"          "bunch2"       "service_name" "table"        "user_id"      "code"         "from"         "to"           "input_string"
# [13] "contents"     "break_cat"    "break_serv"   "shftime"      "bunch"        "datetext"     "timesec" 

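Any attempt to view it in full, or to view a subset with all columns and only a few rows, throws the same error.

Selecting a subset of the columns works, however: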

log08t[,.(type,time)][1:10]

       type                time
 1: API-XML 2017-11-08 12:38:09
 2: API-XML 2017-11-08 12:38:09
 3:   MySQL 2017-11-08 12:38:09
 4:   MySQL 2017-11-08 12:38:09
 5:   MySQL 2017-11-08 12:38:09
 6:   MySQL 2017-11-08 12:38:09
 7:   MySQL 2017-11-08 12:38:09
 8:   MySQL 2017-11-08 12:38:09
 9:   MySQL 2017-11-08 12:38:09
10:   MySQL 2017-11-08 12:38:09

I am sure the culprit is the last column, timesec: the error started after I added this column. See here:

log08t[,.(type,time,timesec)]
# Error in dimnames(x) <- dn : 
#  length of 'dimnames' [1] not equal to array extent
# In addition: Warning message:
# In cbind(type = c("API-XML", "API-XML", "MySQL", "MySQL", "MySQL",  :
#  number of rows of result is not a multiple of vector length (arg 1)
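
One quick way to test that suspicion is to check which columns are S4 objects, since the str output reports timesec as a formal (S4) lubridate Period while every other column is a plain vector. The sketch below uses a hypothetical toy list, `cols`, as a stand-in for the columns of log08t; the values are made up for illustration.

```r
library(lubridate)

# Hypothetical stand-in for the columns of log08t:
# one plain numeric vector and one lubridate Period.
cols <- list(
  id      = c(40192, 40193, 4131),
  timesec = seconds(c(10, 20, 30))
)

# isS4() is TRUE for formal-class objects such as a Period.
s4_flags <- vapply(cols, isS4, logical(1))
s4_cols  <- names(cols)[s4_flags]
print(s4_cols)

# On the real table the same check would be (not run here):
# names(log08t)[vapply(log08t, isS4, logical(1))]
```

If only timesec is flagged, that at least narrows the problem to how the S4 column is formatted or subset for printing, though it does not by itself prove the cause.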

When I remove the column, the table prints fine:

> log08t[,timesec:=NULL]

> log08t
                     time    type    id               gap bunch2        service_name             table user_id code from to                                input_string
   1: 2017-11-08 12:38:09 API-XML 40192 2.586546e+01 secs     24 GetMyTodaysSessions                NA      NA   NA   NA NA Service : GetMyTodaysSessions; UserId : 499
   2: 2017-11-08 12:38:09 API-XML 40193 0.000000e+00 secs     24 GetMyCurrentSession                NA      NA   NA   NA NA Service : GetMyCurrentSession; UserId : 499
   3: 2017-11-08 12:38:09   MySQL  4131 2.713320e-01 secs     24              SELECT                NA      NA   NA   NA NA                                          NA
   4: 2017-11-08 12:38:09   MySQL  4131 2.119088e-02 secs     24              SELECT    class_sessions      NA   NA   NA NA                                          NA
   5: 2017-11-08 12:38:09   MySQL  4131 3.051758e-04 secs     24              SELECT student_class_map      NA   NA   NA NA                                          NA
  ---                                                                                                                                                                  
5385: 2017-11-08 13:14:25   MySQL  4355 1.583099e-03 secs    129              SELECT          tbl_auth      NA   NA   NA NA                                          NA
5386: 2017-11-08 13:14:25   MySQL  4355 3.561974e-04 secs    129              SELECT           schools      NA   NA   NA NA                                          NA
5387: 2017-11-08 13:14:25   MySQL  4355 4.777908e-04 secs    129              SELECT             seats      NA   NA   NA NA                                          NA
5388: 2017-11-08 13:14:25   MySQL  4355 3.828907e-02 secs    129              SELECT student_class_map      NA   NA   NA NA                                          NA
5389: 2017-11-08 13:14:25   MySQL  4355 4.160404e-04 secs    129              SELECT                NA      NA   NA   NA NA                                          NA

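I would like to know what is wrong with the last column. And even if something is wrong with it, why can't data.table substitute NA or NULL for the values and carry on?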
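
A possible workaround is to store the value as a plain numeric column instead of a Period. This is only a sketch under an assumption: that timesec was meant to hold seconds since the epoch. The .Data values of about 1.51e+09 in the str output are consistent with that, but the code that created the column is not shown. The table `dt` below is a hypothetical toy stand-in for log08t.

```r
library(data.table)

# Hypothetical toy table standing in for log08t.
dt <- data.table(
  time = as.POSIXct("2017-11-08 12:38:09", tz = "UTC") + 0:4,
  type = c("API-XML", "API-XML", "MySQL", "MySQL", "MySQL")
)

# Assumption: timesec is meant to be seconds since the epoch.
# as.numeric() on a POSIXct vector returns that as a plain double vector,
# so the column carries no S4 slots.
dt[, timesec := as.numeric(time)]

print(dt)
```

A plain double column prints and subsets like any other numeric column. If a duration rather than a timestamp is needed, a difftime column such as the existing gap column is another option that already prints without error in this table.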

0 answers:

No answers yet