Is there a way to compute a sum instead of a count using cut or any other R function?

Time: 2015-12-02 07:20:45

Tags: r

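By following the link below, I was able to use cut to get the frequency count of a column.

reference link

Using the link above, I was able to get the frequency output.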

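But I need the sum of the column values within each hour, not the count. For example, I have a column value_column with readings taken at different times during the day. How can I sum these values per hour and show the result in a separate column?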

var1                 freq
2015-10-01 10:00:00  10

Expected output

value_column  date_time
14            10/1/2015 10:00
10            10/1/2015 10:02
16            10/1/2015 10:03
9             10/1/2015 10:04
1             10/1/2015 10:05
5             10/1/2015 10:06
13            10/1/2015 10:07
21            10/1/2015 10:08
18            10/1/2015 10:09
16            10/1/2015 10:10

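Thanks in advance.

3 Answers:

Answer 0 (score: 3)

We can convert the 'date_time' column to the POSIXct class, use format to replace the minutes part with 00, group by that variable, and get the sum of 'value_column' with summarise.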

 library(dplyr)
 df1 %>%
     group_by(date_time = format(as.POSIXct(date_time, 
                           format='%m/%d/%Y %H:%M'), '%m/%d/%Y %H:00')) %>% 
     summarise(sum_value_column = sum(value_column))
#            date_time sum_value_column
#            (chr)            (int)
#1 10/01/2015 10:00              123

Data

df1 <- structure(list(value_column = c(14L, 10L, 16L, 9L, 1L, 
5L, 13L, 
21L, 18L, 16L), date_time = c("10/1/2015 10:00", "10/1/2015 10:02", 
"10/1/2015 10:03", "10/1/2015 10:04", "10/1/2015 10:05",
"10/1/2015 10:06", 
"10/1/2015 10:07", "10/1/2015 10:08", "10/1/2015 10:09",
"10/1/2015 10:10")), .Names = c("value_column", "date_time"), 
 class = "data.frame", row.names = c(NA, -10L))
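
Since the question asks specifically about cut, here is a minimal base R sketch of the same idea: cut the parsed timestamps into hourly bins, then use aggregate to sum within each bin instead of counting. The hour column name and the tz = "UTC" setting are assumptions made for this illustration; the data is the sample from above.

```r
# Sample data copied from the answer above
df1 <- data.frame(
  value_column = c(14L, 10L, 16L, 9L, 1L, 5L, 13L, 21L, 18L, 16L),
  date_time = c("10/1/2015 10:00", "10/1/2015 10:02", "10/1/2015 10:03",
                "10/1/2015 10:04", "10/1/2015 10:05", "10/1/2015 10:06",
                "10/1/2015 10:07", "10/1/2015 10:08", "10/1/2015 10:09",
                "10/1/2015 10:10"),
  stringsAsFactors = FALSE
)

# Parse the timestamps; tz = "UTC" is an assumption to keep results reproducible
dt <- as.POSIXct(df1$date_time, format = "%m/%d/%Y %H:%M", tz = "UTC")

# cut() assigns each timestamp to the hourly bin it falls in
df1$hour <- cut(dt, breaks = "hour")

# Sum value_column within each hourly bin instead of counting rows
hourly_sum <- aggregate(value_column ~ hour, data = df1, FUN = sum)
print(hourly_sum)
```

With the sample data every row falls in the same hour, so the result has a single row whose sum is 123, matching the dplyr output.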

Answer 1 (score: 3)

For SQL users, assuming the input is a data frame named data:

library(sqldf)

sqldf("select substr(date_time, 1, instr(date_time, ':')) || '00' date_time, 
              sum(value_column)
       from data
       group by substr(date_time, 1, instr(date_time, ':')) || '00'")

Alternatively, we can break the complex expression into a nested select statement, as follows:

sqldf("select date_time, 
              sum(value_column)
       from (select substr(date_time, 1, instr(date_time, ':')) || '00' date_time,
                    value_column
             from data)
       group by date_time")
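
To see what the substr/instr expression in these queries does to each timestamp, the same string manipulation can be sketched in plain R. This is only an illustration of the truncation step, not part of the sqldf solution; the variable names and the third timestamp (11:45) are made up for the example.

```r
# Two timestamps from the question plus one hypothetical value in another hour
date_time <- c("10/1/2015 10:00", "10/1/2015 10:02", "10/1/2015 11:45")

# regexpr() returns the position of the first ':' (the role instr() plays in the SQL)
colon_pos <- regexpr(":", date_time, fixed = TRUE)

# Keep everything up to and including the colon, then append "00"
hour_label <- paste0(substr(date_time, 1, colon_pos), "00")
print(hour_label)
```

Rows that share an hour end up with the same label, which is what makes the group by collapse them into one sum.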

Answer 2 (score: 1)

I would probably try:

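library(stringr)

# Make sure date_time is character (it may have been read in as a factor)
df1$date_time <- as.character(df1$date_time)

# Split "10/1/2015 10:00" into a date part and a time part
parts <- str_split_fixed(df1$date_time, " ", 2)
df1$date <- as.Date(parts[, 1], "%m/%d/%Y")
df1$time <- parts[, 2]

# Keep only the hour so that rows within the same hour are grouped together
df1$hour <- sub(":.*", "", df1$time)

total_table <- aggregate(df1$value_column,
                         by = list(date = df1$date, hour = df1$hour), FUN = sum)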

This may be a bit verbose, but it keeps the date and the time as separate columns that I can use for further analysis.
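
One thing worth checking with this split-and-parse approach is the format string passed to as.Date. The sample data appears to be month/day/year (the frequency output in the question shows 2015-10-01), so a day/month format parses without error but gives a different date. A small sketch, with variable names chosen only for illustration:

```r
x <- "10/1/2015"

# Day/month/year reads this as 10 January 2015
as_dmy <- as.Date(x, format = "%d/%m/%Y")

# Month/day/year reads it as 1 October 2015
as_mdy <- as.Date(x, format = "%m/%d/%Y")

print(c(dmy = format(as_dmy), mdy = format(as_mdy)))
```

Neither call raises an error, so the mismatch only shows up when the grouped dates are inspected.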