R: Smoothing data frame values by column

Time: 2018-03-06 14:45:20

Tags: r dataframe smoothing

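I have a data frame (df) that I want to turn into a line chart. As shown below, the chart has a lot of spikes, so I decided to smooth the values of each variable.

Is there a way to replace every cell of df with a 20-row average without using a loop?

Sample data: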

df = structure(list(Date = c(0, 10, 20, 30, 40, 50, 60, 70, 80, 90, 
100, 110, 120, 130, 140, 150, 160, 170, 180, 190), `0_3` = c(2.96069423175089, 
2.98934234468417, 3.0166710770045, 2.93318848451928, 2.9029582526956, 
2.93122886133033, 2.95467584624211, 2.92056074766355, 2.9673590504451, 
2.99909118448955, 3.0678648899907, 3.08758664146188, 3.16639741518578, 
3.1981536432575, 3.23886639676113, 3.32871012482663, 3.2554847841472, 
3.33575054387237, 3.25720703856234, 3.28495034377387), `0_6` = c(2.65441551812149, 
2.70340525084481, 2.75205080709182, 2.71591526344378, 2.76472214542438, 
2.73393461104848, 2.75387263339071, 2.77453271028037, 2.7299703264095, 
2.66585883065738, 2.69600247908274, 2.67800882167612, 2.7140549273021, 
2.63765248928454, 2.69905533063428, 2.66990291262136, 2.689313517339, 
2.75562001450326, 2.77049794084613, 2.78838808250573)), .Names = c("Date", 
"0_3", "0_6"), row.names = c(NA, 20L), class = "data.frame")

So far I have only managed to smooth it with a loop:

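smooth_factor = 5
smooth_df = df[smooth_factor:nrow(df),]
for (i in rownames(smooth_df)) {
  k = as.numeric(i)  # position of this row in the original df
  for (j in colnames(smooth_df)[2:ncol(smooth_df)]){
  # The first column contains Date that should not be smoothed
    # Index smooth_df by row name (i) and average the last smooth_factor rows of df
    smooth_df[i,j] = mean(df[(k-smooth_factor+1):k,j])
  }
}
# Keep only the dates that belong to the retained rows
smooth_df$Date = df$Date[smooth_factor:nrow(df)]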

If I apply this method to a larger data set, this is what it looks like:

Noisy data with a lot of spikes

becomes

Data set smoothed by 20 steps

1 answer:

Answer 0 (score: 2)

You are looking for apply (base R) and rollmean (from the zoo package):

 library(zoo)
 npoints <- 5
 apply(df,2,function(x){rollmean(x,npoints)})

      Date      0_3      0_6
 [1,]   20 2.960571 2.718102
 [2,]   30 2.954678 2.734006
 [3,]   40 2.947745 2.744099
 [4,]   50 2.928522 2.748595
 [5,]   60 2.935357 2.751406
 [6,]   70 2.954583 2.731634
 [7,]   80 2.981910 2.724047
 [8,]   90 3.008493 2.708875
 [9,]  100 3.057660 2.696779
[10,]  110 3.103819 2.678316
[11,]  120 3.151774 2.684955
[12,]  130 3.203943 2.679735
[13,]  140 3.237522 2.681996
[14,]  150 3.271393 2.690309
[15,]  160 3.283204 2.716878
[16,]  170 3.292421 2.734744
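
Note that apply returns a matrix here, and that the Date column is averaged along with the others (the default window is centered). The following is only a minimal sketch, assuming the goal is to leave Date untouched, keep a data frame, and use a trailing window like the loop in the question. The data frame `demo` and its columns `a` and `b` are made up for illustration and are not part of the original data.

```r
library(zoo)

# Hypothetical small data frame standing in for df
demo <- data.frame(Date = seq(0, 90, by = 10),
                   a = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10),
                   b = c(10, 8, 6, 4, 2, 0, 2, 4, 6, 8))
npoints <- 5

# Smooth every column except Date
value_cols <- setdiff(names(demo), "Date")
smoothed <- demo
smoothed[value_cols] <- lapply(demo[value_cols], function(x) {
  # align = "right" averages the current row and the npoints - 1 rows before it;
  # fill = NA keeps the original length so the result fits back into the data frame
  rollmean(x, npoints, fill = NA, align = "right")
})

# Drop the leading rows that do not have a full window
smoothed <- smoothed[complete.cases(smoothed), ]
```

With `fill = NA` the smoothed columns stay the same length as Date, so each average remains paired with the date of the last row in its window.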

With data.table (assuming df has first been converted to a data.table, e.g. with setDT(df)), it would be:

df[, lapply(.SD,function(x){rollmean(x,npoints)}),.SDcols = names(df)]
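
As a variation on the data.table call, here is a hedged sketch that smooths only the value columns and uses data.table's own `frollmean` instead of zoo's `rollmean`. This swaps in a different function from the one in the answer. The table `dt` and its columns `a` and `b` are hypothetical and only serve as an illustration.

```r
library(data.table)

# Hypothetical small table standing in for df
dt <- data.table(Date = seq(0, 90, by = 10),
                 a = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10),
                 b = c(10, 8, 6, 4, 2, 0, 2, 4, 6, 8))
npoints <- 5

# Every column except Date gets smoothed
value_cols <- setdiff(names(dt), "Date")

# frollmean uses a right-aligned window by default, so the first
# npoints - 1 results in each column are NA
dt[, (value_cols) := lapply(.SD, frollmean, n = npoints), .SDcols = value_cols]

# Drop the rows without a full window
smoothed_dt <- na.omit(dt)
```

Restricting `.SDcols` to the value columns is what keeps Date out of the averaging. The call in the answer passes `names(df)`, so it smooths Date as well.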