Running code by group with a rolling condition on large data

Time: 2018-11-30 10:24:34

Tags: r data.table

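I have this code to find heat waves (consecutive days, by date, on which both the maximum and minimum temperatures are greater than or equal to the 90th percentile) and then get the mean temperature for each heat wave.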

library(data.table)
setDT(df1)
# flag days on which both TMAX and TMIN are at or above their 90th percentile
df1[, hotday := +(TMAX >= quantile(TMAX, .90, na.rm = TRUE, type = 6) &
                  TMIN >= quantile(TMIN, .90, na.rm = TRUE, type = 6))
    ][, length := with(rle(hotday), rep(lengths, lengths))  # run length, so that only consecutive days can be selected
    ][hotday == 0, length := 0
    ][!!hotday, Highest_Mean := max(MEAN), rleid(length)][]  # highest MEAN temperature within each consecutive group

Now, this code works for a df (df1) that has a single station:

    head(df1)
      YEAR MONTH DAY     Date MEAN TMAX TMIN
    1 1965     1   1 1/1/1965   NA 27.0 17.0
    2 1965     1   2 1/2/1965 24.0 28.0 20.7
    3 1965     1   3 1/3/1965 19.9 23.7 16.2
    4 1965     1   4 1/4/1965 18.0 23.4 12.0
    5 1965     1   5 1/5/1965 19.7 24.0 14.0
    6 1965     1   6 1/6/1965 18.6 24.0 13.0



I would like to run it on a df (df2) that has about 1005 stations, like this:

    head(df2) # an example 

   x    y   date            Tmaxq90 Tmax    Tminq85  Tmin
34.000  33.000  5/25/1998   295.887 296.857 295.016 296.765
34.000  33.000  5/26/1998   295.887 296.778 295.016 296.702
34.000  33.000  5/27/1998   295.887 296.442 295.016 297.233
34.000  33.000  5/28/1998   295.887 293.971 295.016 296.923
34.000  33.000  5/29/1998   295.887 294.018 295.016 293.871
34.000  33.000  5/30/1998   295.887 293.910 295.016 293.746
34.000  33.000  5/31/1998   295.887 298.767 295.016 300.565
35.125  33.000  5/1/1989    301.084 302.898 298.897 299.553
35.125  33.000  5/2/1989    301.084 302.903 298.897 299.801
35.125  33.000  5/3/1989    301.084 299.393 298.897 297.521
35.125  33.000  5/4/1989    301.084 301.485 298.897 299.998
35.125  33.000  5/5/1989    301.084 295.539 298.897 295.085
35.125  33.000  5/6/1989    301.084 292.740 298.897 292.282
35.125  33.000  5/7/1989    301.084 292.150 298.897 291.397
35.125  33.000  5/8/1989    301.084 293.541 298.897 292.617
35.125  33.000  5/9/1989    301.084 294.249 298.897 293.766
35.125  33.000  5/10/1989   301.084 293.966 298.897 293.470
35.125  33.000  5/11/1989   301.084 292.951 298.897 291.870
35.125  33.000  5/12/1989   301.084 294.441 298.897 293.631
35.125  33.000  5/13/1989   301.084 296.407 298.897 295.729
35.125  33.000  5/14/1989   301.084 303.836 298.897 299.863
35.125  33.000  5/15/1989   301.084 303.290 298.897 302.021
35.125  33.000  5/16/1989   301.084 305.929 298.897 302.519
35.125  33.000  5/17/1989   301.084 303.316 298.897 301.235
35.125  33.000  5/18/1989   301.084 299.501 298.897 298.803
35.125  33.000  5/19/1989   301.084 302.325 298.897 299.509
35.125  33.000  5/20/1989   301.084 302.769 298.897 302.178
35.125  33.000  5/21/1989   301.084 303.988 298.897 301.407
35.125  33.000  5/22/1989   301.084 300.546 298.897 299.280
35.125  33.000  5/23/1989   301.084 295.673 298.897 295.154
35.125  33.000  5/24/1989   301.084 296.452 298.897 295.916
35.125  33.000  5/25/1989   301.084 295.904 298.897 295.585
35.125  33.000  5/26/1989   301.084 295.532 298.897 294.625
35.250  33.000  5/23/1990   299.237 296.108 296.897 299.145
35.250  33.000  5/24/1990   299.237 296.298 296.897 299.140
35.250  33.000  5/25/1990   299.237 298.298 296.897 297.466
35.250  33.000  5/26/1990   299.237 300.516 296.897 299.670
35.250  33.000  5/27/1990   299.237 303.569 296.897 301.019
35.250  33.000  5/28/1990   299.237 301.090 296.897 300.419
35.250  33.000  5/29/1990   299.237 299.757 296.897 299.138
35.250  33.000  5/30/1990   299.237 300.233 296.897 299.204
35.250  33.000  5/31/1990   299.237 301.268 296.897 300.429
35.250  33.000  5/1/1991    299.237 291.746 296.897 291.408
35.250  33.000  5/2/1991    299.237 292.045 296.897 290.981
35.250  33.000  5/3/1991    299.237 293.270 296.897 292.417
35.250  33.000  5/4/1991    299.237 296.360 296.897 295.466
35.250  33.000  5/5/1991    299.237 300.263 296.897 298.036
35.250  33.000  5/6/1991    299.237 301.099 296.897 298.810
35.250  33.000  5/7/1991    299.237 298.764 296.897 297.550
35.250  33.000  5/8/1991    299.237 304.438 296.897 301.194
35.250  33.000  5/9/1991    299.237 295.781 296.897 299.455
35.250  33.000  5/10/1991   299.237 296.701 296.897 300.393
35.250  33.000  5/11/1991   299.237 297.779 296.897 299.184
35.500  33.000  5/12/1991   299.237 300.330 297.897 299.254
35.500  33.000  5/13/1991   299.237 299.714 297.897 299.074
35.500  33.000  5/14/1991   299.237 299.751 297.897 298.759
35.500  33.000  5/15/1991   299.237 304.703 297.897 301.440
35.500  33.000  5/16/1991   299.237 293.104 297.897 292.175
35.500  33.000  5/17/1991   299.237 293.706 297.897 293.228
35.500  33.000  5/18/1991   299.237 294.243 297.897 293.336
35.500  33.000  5/19/1991   299.237 296.941 297.897 296.350
35.500  33.000  5/20/1991   299.237 296.482 297.897 295.638
35.500  33.000  5/21/1991   299.237 292.741 297.897 292.402
35.500  33.000  5/22/1991   299.237 293.129 297.897 292.516
35.500  33.000  5/23/1991   299.237 294.424 297.897 293.436
35.500  33.000  5/24/1991   299.237 295.830 297.897 294.658
35.500  33.000  5/25/1991   299.237 298.135 297.897 296.773
35.500  33.000  5/26/1991   299.237 295.076 297.897 294.366
35.500  33.000  5/27/1991   299.237 292.145 297.897 291.631
35.500  33.000  5/28/1991   299.237 292.971 297.897 292.263

A sample of the data is provided here as a CSV file.

Each station in df2 has its own x and y (lon and lat) and 10357 days of records. Is there a way to run the code for each station?

1 Answer:

Answer 0: (score: 1)

In general, it is easier to help if you provide a minimal example (one we can use to reproduce your problem).

For this kind of problem, the dplyr package can help. Assuming you have a station_id variable that uniquely identifies each station, start with something like:

library(dplyr)
df2 %>%
   group_by(station_id) %>%
   ...  ## dplyr version of your old code
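
As a data.table alternative that stays closer to the code in the question, here is a minimal sketch. It assumes that the x/y pair identifies a station (df2 as shown has no station_id column), that the Tmaxq90 and Tminq85 columns already hold each station's thresholds, and that adjacent rows within a station count as consecutive days, as the original rle approach does. The toy values are made up for illustration, and `run` is a hypothetical helper column.

```r
library(data.table)

# Toy table with the same column names as df2 (values are invented)
df2 <- data.table(
  x       = rep(c(34, 35.125), each = 5),
  y       = 33,
  date    = rep(sprintf("5/%d/1998", 1:5), 2),
  Tmaxq90 = rep(c(295, 301), each = 5),
  Tmax    = c(296, 297, 290, 296, 296, 302, 300, 302, 303, 304),
  Tminq85 = rep(c(294, 298), each = 5),
  Tmin    = c(295, 296, 289, 295, 295, 299, 297, 299, 300, 301)
)

# Parse the dates and sort within each station so that runs follow time order
df2[, date := as.IDate(date, format = "%m/%d/%Y")]
setorder(df2, x, y, date)

# Hot day: both Tmax and Tmin are at or above the thresholds stored in the row
df2[, hotday := +(Tmax >= Tmaxq90 & Tmin >= Tminq85)]

# Run ids restart for every station, so a run never spans two stations
df2[, run := rleid(hotday), by = .(x, y)]

# Length of each hot run; rows that are not hot get 0
df2[, length := .N * hotday, by = .(x, y, run)]

print(df2)
```

The sample of df2 has no MEAN column, so the per-wave summary is left out here. Something like `df2[hotday == 1, Highest := max(Tmax), by = .(x, y, run)]` would presumably follow the same pattern.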