How to create a data frame row by row from given data

Time: 2015-12-16 09:23:49

Tags: r dataframe

I have a data frame that looks like this:

  Customer     Mon1     Mon2     Mon3    Mon4     Mon5     Mon6     Mon7     Mon8     Mon9    Mon10    Mon11    Mon12
1  BS:100021  83.6140  76.7849  71.8369 66.8452  66.9263  53.5129  48.0321  44.5750  34.4080  40.8653  40.6369  24.8658
2  BS:100022  -0.5097   0.9198  -1.6027 -0.7160 -40.7443 -40.8863 -40.7049 -40.4623 -48.8805 -44.8879 -48.3559 -39.8656
3  BS:100025 106.4243 102.3998 100.9183 99.1006  98.1092  95.8125  94.2770  95.9911  92.1445  94.0984  87.7465  86.1946
4  BS:100037  37.5871  37.5888  37.5905 37.5924  37.5941  37.5957  37.5977  37.5993  37.5797  50.8395  50.8416  37.6064
5  BS:100050   0.0000   0.0000   0.0000  0.0000   0.0000   0.0000   0.0000   0.0000   0.0000   0.0000   0.0000   0.0000
6  BS:100056 780.4214  88.0918  88.9721 88.8143  90.9508  97.9842  96.6309 101.7312  84.7743  76.5239 133.6655  86.3668
7  BS:100063  15.1694  15.1993  17.6528 19.6854  23.9929  27.5048  18.1503  19.8184  17.3152  17.3084  18.4588  24.0067
8  BS:100079   0.0292   0.0827   0.3120  0.1206   1.6245   2.3239   2.5857   0.1718   0.4340   0.6849   3.2916   2.2456
9  BS:100089   0.0000   0.0000   0.0000  0.0000   0.0000   0.0000   0.0000   0.0000   0.0000   0.0000   0.0000   0.0000
10 BS:100091   0.1324   2.8137   0.9854  0.2405   0.2811   0.1312   0.0174   2.4304   0.7994   0.5884   0.2618   0.1233

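Now I want to create another data frame holding the next 12 months for each customer, but I don't know how to build it from the forecasted values. For example, I tried: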

make_new_data <- function(x){
  require(forecast)
  ts_object <- as.numeric(x[-1])
  forecasted_data <- data.frame(naive(ts_object,12))[,1]
  new_data <- c(x[1],as.character(forecasted_data))
  return(new_data)
}
z <- apply(test_data_entry,1,make_new_data)

This obviously doesn't work: c(x[1], as.character(forecasted_data)) does not produce a row like

1 BS:100021 83.614 76.7849 71.8369 66.8452 66.9263 53.5129 48.0321 44.575 34.408 40.8653 40.6369 24.8658

How do I get this to work, and secondly, is there a faster way to do it?

1 answer:

Answer 0: (score: 2)

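Your data is in wide format; the fast (and I would also say better) way is to work in long format. My approach uses data.table.

First, you need to convert the data from wide to long with the melt function from the reshape2 package. Given what your data looks like, it should go something like this (assuming your data is called test_data_entry):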

library(data.table)
library(reshape2)
library(forecast)  # provides naive(), used below
dt.wide <- as.data.table(test_data_entry)
dt.long <- melt(dt.wide, id.vars = "Customer")
# doing some minor changes
dt <- dt.long[, .(Customer, 
       Month = as.numeric(gsub("Mon", "", variable)),
       value)] # replace the MonX with X etc.

dt[, forecasted.value := as.data.frame(naive(value, 12))[,1], 
   by = Customer]
dt[order(Customer)]
#    Customer  Month   value forecasted.value
# 1: BS:100021     1 83.6140          24.8658
# 2: BS:100021     2 76.7849          24.8658
# 3: BS:100021     3 71.8369          24.8658
# 4: BS:100021     4 66.8452          24.8658
# 5: BS:100021     5 66.9263          24.8658
# ---                                         
# 116: BS:100091     8  2.4304           0.1233
# 117: BS:100091     9  0.7994           0.1233
# 118: BS:100091    10  0.5884           0.1233
# 119: BS:100091    11  0.2618           0.1233
# 120: BS:100091    12  0.1233           0.1233
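
If the goal is still one row per customer with one column per future month, the long table can be cast back to wide. The following is only a sketch on a made-up toy table: dt_demo, its values and the 3-month horizon are assumptions for illustration, standing in for dt and its 12 months above.

```r
library(data.table)

# Toy long-format table standing in for `dt` above (values are made up)
dt_demo <- data.table(
  Customer = rep(c("A", "B"), each = 3),
  Month = rep(1:3, times = 2),
  forecasted.value = rep(c(24.8658, 0.1233), each = 3)
)

# Cast long -> wide: one row per customer, one column per forecast month
wide <- dcast(dt_demo, Customer ~ Month, value.var = "forecasted.value")

# dcast names the new columns "1", "2", "3"; restore the MonX style
setnames(wide, as.character(1:3), paste0("Mon", 1:3))
wide
```

With the real data the same two calls would presumably be applied to dt with 1:12 in place of 1:3.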

Alternatively, sticking with your approach:

You're right, as.character is what messes things up. This should fix it:

library(forecast)

make_new_data <- function(x){
  ts_object <- as.numeric(x[-1])
  forecasted_data <- data.frame(naive(ts_object, 12))[, 1]

  new_data <- data.frame(as.character(x[1]), t(forecasted_data))
  names(new_data) <- c("Customer", paste0("Mon", 1:12))

  return(new_data)
}
z <- data.table::rbindlist(apply(test_data_entry,1,make_new_data))
z
# Customer     Mon1     Mon2     Mon3     Mon4     Mon5     Mon6     Mon7     Mon8     Mon9
# 1: BS:100021  24.8658  24.8658  24.8658  24.8658  24.8658  24.8658  24.8658  24.8658  24.8658
# 2: BS:100022 -39.8656 -39.8656 -39.8656 -39.8656 -39.8656 -39.8656 -39.8656 -39.8656 -39.8656
# 3: BS:100025  86.1946  86.1946  86.1946  86.1946  86.1946  86.1946  86.1946  86.1946  86.1946
# 4: BS:100037  37.6064  37.6064  37.6064  37.6064  37.6064  37.6064  37.6064  37.6064  37.6064
# 5: BS:100050   0.0000   0.0000   0.0000   0.0000   0.0000   0.0000   0.0000   0.0000   0.0000
# 6: BS:100056  86.3668  86.3668  86.3668  86.3668  86.3668  86.3668  86.3668  86.3668  86.3668
# 7: BS:100063  24.0067  24.0067  24.0067  24.0067  24.0067  24.0067  24.0067  24.0067  24.0067
# 8: BS:100079   2.2456   2.2456   2.2456   2.2456   2.2456   2.2456   2.2456   2.2456   2.2456
# 9: BS:100089   0.0000   0.0000   0.0000   0.0000   0.0000   0.0000   0.0000   0.0000   0.0000
# 10: BS:100091   0.1233   0.1233   0.1233   0.1233   0.1233   0.1233   0.1233   0.1233   0.1233
# Mon10    Mon11    Mon12
# 1:  24.8658  24.8658  24.8658
# 2: -39.8656 -39.8656 -39.8656
# 3:  86.1946  86.1946  86.1946
# 4:  37.6064  37.6064  37.6064
# 5:   0.0000   0.0000   0.0000
# 6:  86.3668  86.3668  86.3668
# 7:  24.0067  24.0067  24.0067
# 8:   2.2456   2.2456   2.2456
# 9:   0.0000   0.0000   0.0000
# 10:   0.1233   0.1233   0.1233
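
For context on why the original attempt returned something unexpected: apply() converts the data frame to a matrix before calling the function, so with a character column present every row arrives as a character vector, and the results are bound as columns. The sketch below shows this on a made-up two-row data frame (df, h and future are hypothetical names, not from the question). It also shows a base R shortcut that relies on the point forecasts of naive() being the last observed value repeated, so it only applies if the point forecasts are all that is needed.

```r
# Toy data frame (made-up values) with the same layout as the question's data
df <- data.frame(
  Customer = c("A", "B"),
  Mon1 = c(1.5, 10),
  Mon2 = c(2.5, 20),
  stringsAsFactors = FALSE
)

# apply() works on as.matrix(df); the character column forces every
# cell to character, and each row's result becomes a COLUMN of the output
m <- apply(df, 1, function(x) x)
is.character(m)  # TRUE
dim(m)           # 3 2

# Shortcut without apply(): repeat each customer's last observation
h <- 3                       # forecast horizon (12 in the question)
last_obs <- df[[ncol(df)]]   # last month column
future <- data.frame(
  Customer = df$Customer,
  matrix(rep(last_obs, times = h), ncol = h,
         dimnames = list(NULL, paste0("Mon", seq_len(h)))),
  stringsAsFactors = FALSE
)
future
```

Because this avoids both the row-wise apply() and the model fitting, it should also be fast, but it is not a replacement for naive() when prediction intervals or other forecast methods are wanted.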