Question

I'm new to Stack Overflow and an R beginner.

I want to calculate returns for a large data set that looks like this:

Date        C1  C2  C3
31.01.1985  NA  47  NA
28.02.1985  NA  45  NA
29.03.1985  130 56  NA
30.04.1985  140 67  NA
31.05.1985  150 48  93
28.06.1985  160 79  96
31.07.1985  160 56  94
30.08.1985  160 77  93
30.09.1985  160 66  93
31.10.1985  160 44  93
29.11.1985  160 55  93

It is in data.table format, let's say Prices: the columns are companies and the values are prices. The actual data set has many more columns and rows. I want to build a new data.table that holds the monthly returns. I know you can use the diff() function to do this, but how do I build a new data.table with so many columns without a for loop?

I thought of applying diff() to Prices[, names(Prices) != "Date"], but for some reason the output is not the returns I expected.

Thanks.

Answer 1

The reason you get that output is that Prices[, names(Prices) != "Date"] returns a logical vector:

> Prices[, names(Prices) != "Date"]
[1] FALSE  TRUE  TRUE  TRUE

Because logicals can be used in arithmetic, diff also works on a logical vector: FALSE is treated as 0 and TRUE as 1. So you are essentially computing diff(c(0,1,1,1)).
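The coercion described above can be checked in base R without the data set. This is a minimal sketch; the vector `nms` simply repeats the column names from the example and is an assumption made so the snippet runs on its own.

```r
# Column names as in the example data (retyped here so the snippet is standalone)
nms <- c("Date", "C1", "C2", "C3")

# The comparison yields a logical vector, not a selection of columns
keep <- nms != "Date"      # FALSE TRUE TRUE TRUE

# diff() coerces FALSE/TRUE to 0/1 before subtracting neighbours
d_logical <- diff(keep)
d_numeric <- diff(c(0, 1, 1, 1))

print(d_logical)
print(d_numeric)
```

Both calls give the same three values, which is why the result has nothing to do with prices.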

The solution you are after:

cols <- setdiff(names(Prices),"Date")

# option 1:
Prices[, paste0(cols,"_return") := lapply(.SD, function(x) (x - shift(x, fill = NA))/shift(x, fill = NA)), .SDcols = cols][]

# option 2:
Prices[, paste0(cols,"_return") := lapply(.SD, function(x) c(NA,diff(x))/shift(x, fill = NA)), .SDcols = cols][]

which gives:

> Prices
          Date  C1 C2 C3  C1_return   C2_return   C3_return
 1: 1985-01-31  NA 47 NA         NA          NA          NA
 2: 1985-02-28  NA 45 NA         NA -0.04255319          NA
 3: 1985-03-29 130 56 NA         NA  0.24444444          NA
 4: 1985-04-30 140 67 NA 0.07692308  0.19642857          NA
 5: 1985-05-31 150 48 93 0.07142857 -0.28358209          NA
 6: 1985-06-28 160 79 96 0.06666667  0.64583333  0.03225806
 7: 1985-07-31 160 56 94 0.00000000 -0.29113924 -0.02083333
 8: 1985-08-30 160 77 93 0.00000000  0.37500000 -0.01063830
 9: 1985-09-30 160 66 93 0.00000000 -0.14285714  0.00000000
10: 1985-10-31 160 44 93 0.00000000 -0.33333333  0.00000000
11: 1985-11-29 160 55 93 0.00000000  0.25000000  0.00000000
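To see what the anonymous function passed to lapply does for a single column, here is a minimal sketch on the first four non-missing C1 prices. The helper name `simple_return` and the vector `p` are illustrative and not part of the answer.

```r
library(data.table)

# Hypothetical helper: same formula as option 1, (current - previous) / previous.
# shift(x) lags x by one position and pads the front with NA by default.
simple_return <- function(x) (x - shift(x)) / shift(x)

p <- c(130, 140, 150, 160)   # C1 prices from 29.03.1985 to 28.06.1985
r <- simple_return(p)
print(r)
```

The first element is NA because there is no previous price; the rest match the C1_return values in rows 4 to 6 of the output.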

If you want to create a new data.table instead, you can use either of these two options:

# option 1:
Returns <- Prices[, c(list(Date = Date), lapply(.SD, function(x) (x - shift(x, fill = NA))/shift(x, fill = NA))), .SDcols = cols]

# option 2:
Returns <- copy(Prices)
Returns[, (cols) := lapply(.SD, function(x) (x - shift(x, fill = NA))/shift(x, fill = NA)), .SDcols = cols]

Data used:

Prices <- fread("Date        C1  C2  C3
31.01.1985  NA  47  NA
28.02.1985  NA  45  NA
29.03.1985  130 56  NA
30.04.1985  140 67  NA
31.05.1985  150 48  93
28.06.1985  160 79  96
31.07.1985  160 56  94
30.08.1985  160 77  93
30.09.1985  160 66  93
31.10.1985  160 44  93
29.11.1985  160 55  93")[, Date := as.Date(Date, "%d.%m.%Y")]

Answer 2

I would write a function that handles a single column of values:

# percentage change from each value to the next one; the last element is NA
pc.change <- function(x) {
  (c(x[2:length(x)], NA) - x) * 100 / x
}

and then apply it to the matrix of all value columns:

d <- read.table(text = "Date        C1  C2  C3
31.01.1985  NA  47  NA
28.02.1985  NA  45  NA
29.03.1985  130 56  NA
30.04.1985  140 67  NA
31.05.1985  150 48  93
28.06.1985  160 79  96
31.07.1985  160 56  94
30.08.1985  160 77  93
30.09.1985  160 66  93
31.10.1985  160 44  93
29.11.1985  160 55  93", header = TRUE)

apply(as.matrix(d[,2:4]), 2, pc.change)

This gives me:

            C1         C2        C3
 [1,]       NA  -4.255319        NA
 [2,]       NA  24.444444        NA
 [3,] 7.692308  19.642857        NA
 [4,] 7.142857 -28.358209        NA
 [5,] 6.666667  64.583333  3.225806
 [6,] 0.000000 -29.113924 -2.083333
 [7,] 0.000000  37.500000 -1.063830
 [8,] 0.000000 -14.285714  0.000000
 [9,] 0.000000 -33.333333  0.000000
[10,] 0.000000  25.000000  0.000000
[11,]       NA         NA        NA
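Note that this output is shifted one row up relative to Answer 1: row 1 holds the change from 31.01.1985 to 28.02.1985, and the last row is NA. If the change should sit on the row of the later date instead, a lagged variant could look like the sketch below. The name `pc.change.lag` is hypothetical and not part of this answer.

```r
# Hypothetical variant: percentage change from the previous value to the current one,
# so the first element is NA and each change lines up with the later date.
pc.change.lag <- function(x) {
  prev <- c(NA, x[-length(x)])   # previous value, NA for the first element
  (x - prev) * 100 / prev
}

x <- c(47, 45, 56)               # first three C2 prices
res <- pc.change.lag(x)
print(res)
```

Apart from the alignment, the values are the same as those produced by pc.change.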

This can then be converted to a data.table if needed.
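One possible way to do that conversion is sketched below on a three-row excerpt of the data. The object names `m` and `Returns` are assumptions, and the excerpt is retyped here so that the snippet runs on its own.

```r
library(data.table)

pc.change <- function(x) (c(x[2:length(x)], NA) - x) * 100 / x

# Three-row excerpt of the example data (C3 omitted for brevity)
d <- data.frame(Date = c("31.01.1985", "28.02.1985", "29.03.1985"),
                C1   = c(NA, NA, 130),
                C2   = c(47, 45, 56))

# Matrix of percentage changes, one column per company
m <- apply(as.matrix(d[, -1]), 2, pc.change)

# Convert the matrix, then add the dates back and move them to the front
Returns <- as.data.table(m)
Returns[, Date := as.Date(d$Date, format = "%d.%m.%Y")]
setcolorder(Returns, "Date")
print(Returns)
```

as.data.table keeps the matrix column names, so the company columns stay labelled C1 and C2.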

How to calculate returns in a data.table?

2 answers: