Return by reference | data.table - how to avoid using copy() at function return

Time: 2018-02-03 10:15:40

Tags: r data.table

I'm fairly new to data.table, having previously used dplyr. In the function below, the DT object is a reference to the result of the tryCatch statement, and when returned it does not behave the way I would like.

From reading "Understanding exactly when a data.table is a reference to (vs a copy of) another data.table", my understanding is that omitting copy() at the return statement returns a reference to the result of the tryCatch statement, which in turn (if successful) returns the modified data.table object.

Using copy() at the end of the function is unnecessary overhead. How do I return the tryCatch result without calling copy()? Without copy(), the function returns an object that references the tryCatch result (as I understand it), which is not what I want.

Code

library(data.table)  # needed for the unqualified setnames(), copy() and `:=` calls

Load_Data <- function(path) {

  col.names = c('Ticker', 'Date', 'Time',
                'Open', 'High', 'Low', 'Close', 'Volume')

  DT <- tryCatch({

    DT.try = data.table::fread(path)

    # rename the columns, then convert Date from YYYYMMDD to Date class
    setnames(DT.try, col.names)
    DT.try[, `:=` (Date = as.Date(as.character(Date), format = '%Y%m%d'))]

    }, warning = function(w) {

      print(w); cat('Warning on reading file: ', path)
      # return despite warning
      return(DT.try)

    }, error = function(e) {

      print(e); cat('Error on reading file: ', path)
      return(NA)
    }
  )
  return(copy(DT)) # how do I avoid using copy()? 
  }

# Without copy(DT) in the return statement, this happens (console output):
> a <- Load_Data('data.example.csv')
> a # nothing is printed here; is a being copied + loaded into memory?
> a # a is NOW printed

      Ticker       Date  Time     Open     High       Low     Close Volume
   1:    AAK 2005-09-29 00:00 100.0189 100.7159  98.62490  98.62490  17791
   2:    AAK 2005-09-30 00:00  98.9734  99.6704  98.27640  99.67040  35438
   3:    AAK 2005-10-03 00:00  99.3219 100.3674  97.57941  97.57941   6600
   4:    AAK 2005-10-04 00:00  98.2764  98.2764  97.92791  98.27640  31564
   5:    AAK 2005-10-05 00:00  98.2764  99.3219  98.27640  99.32190   3730

data.example.csv | original data read into Load_Data()

  <TICKER> <DTYYYYMMDD> <TIME>   <OPEN>   <HIGH>    <LOW>  <CLOSE> <VOL>
1:      AAK     20050929  00:00 100.0189 100.7159 98.62490 98.62490 17791
2:      AAK     20050930  00:00  98.9734  99.6704 98.27640 99.67040 35438
3:      AAK     20051003  00:00  99.3219 100.3674 97.57941 97.57941  6600
4:      AAK     20051004  00:00  98.2764  98.2764 97.92791 98.27640 31564
5:      AAK     20051005  00:00  98.2764  99.3219 98.27640 99.32190  3730
6:      AAK     20051006  00:00  99.3219  99.3219 98.27640 98.27640 10187
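
For what it's worth, the symptom in the console output above (the first `a` prints nothing, the second prints the table) looks like the behaviour described in the data.table FAQ for functions whose last evaluated expression is a `:=` call: `:=` returns its result invisibly, and the first print at the console after such a function can be suppressed. Here the value of the tryCatch block is the value of the `:=` line. The FAQ's suggested remedy is a trailing `DT[]`, which, as far as I understand, does not copy the table. Below is a minimal sketch of that idea; `load_invisible()` and `load_visible()` are hypothetical stand-ins for Load_Data() that build a small table in memory instead of reading a file.

```r
library(data.table)

# Hypothetical stand-in for Load_Data(): the last expression is a `:=` call,
# so its value (the modified table) is returned invisibly.
load_invisible <- function() {
  DT <- data.table(Ticker = "AAK", Date = c(20050929L, 20050930L))
  DT[, Date := as.Date(as.character(Date), format = "%Y%m%d")]
}

# Same function, but ending with DT[] as the data.table FAQ suggests,
# so the result prints normally the first time. No copy() is called.
load_visible <- function() {
  DT <- data.table(Ticker = "AAK", Date = c(20050929L, 20050930L))
  DT[, Date := as.Date(as.character(Date), format = "%Y%m%d")]
  DT[]
}

a <- load_invisible()  # at the console, the first `a` may print nothing
b <- load_visible()    # at the console, `b` should print right away
```

Applied to the function in the question, this would mean replacing `return(copy(DT))` with `return(DT[])`, assuming DT is a data.table at that point (the error handler returns NA, for which `DT[]` is harmless but does nothing special).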

0 answers:

No answers