Sum rows within a group, starting when a specific value occurs

Time: 2015-07-24 19:09:08

Tags: r data-manipulation

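I want to accumulate the values of one column until the end of the group, but start adding only once a specific value appears in another column. I am only interested in the first occurrence of that value within a group, so if the value appears again in the same group, the running sum should simply keep going. I know this sounds like a rather odd problem, so hopefully the example tables make it clear.

I currently have the following data frame: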

> df = data.frame(group = c(1,1,1,1,2,2,2,2,2,3,3,3,4,4,4),numToAdd = c(1,1,3,2,4,2,1,3,2,1,2,1,2,3,2),occurs = c(0,0,1,0,0,1,0,0,0,0,1,1,0,0,0))

> df
   group numToAdd occurs
1      1        1      0
2      1        1      0
3      1        3      1
4      1        2      0
5      2        4      0
6      2        2      1
7      2        1      0
8      2        3      0
9      2        2      0
10     3        1      0
11     3        2      1
12     3        1      1
13     4        2      0
14     4        3      0
15     4        2      0

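So, once a 1 appears in occurs within a group, I want the cumulative sum of the numToAdd column until a new group starts. The result would look like this: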

> finalDF = data.frame(group = c(1,1,1,1,2,2,2,2,2,3,3,3,4,4,4),numToAdd =    c(1,1,3,2,4,2,1,3,2,1,2,1,2,3,2),occurs = c(0,0,1,0,0,1,0,0,0,0,1,1,0,0,0),added = c(0,0,3,5,0,2,3,6,8,0,2,3,0,0,0))

> finalDF
   group numToAdd occurs added
1      1        1      0     0
2      1        1      0     0
3      1        3      1     3
4      1        2      0     5
5      2        4      0     0
6      2        2      1     2
7      2        1      0     3
8      2        3      0     6
9      2        2      0     8
10     3        1      0     0
11     3        2      1     2
12     3        1      1     3
13     4        2      0     0
14     4        3      0     0
15     4        2      0     0

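So the added column stays 0 until a 1 appears in the group, then accumulates the values of numToAdd until a new group begins, at which point added resets to 0. In group 3 the value 1 is found a second time and the running sum simply continues. In group 4 the value 1 never appears, so added stays 0.

I have played around with dplyr but could not get it to work. The attempt below only outputs the overall sum, not the running total for each row.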

library(dplyr)
df = 
  df  %>%
  mutate(added=ifelse(occurs == 1,cumsum(numToAdd),0)) %>%
  group_by(group) 
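
A likely reason the attempt above falls short is that mutate() runs before group_by(), so cumsum() spans the whole data frame, and ifelse() keeps a value only on rows where occurs is exactly 1. The sketch below mimics that computation in base R on the question's data; the name ungrouped is introduced only for illustration.

```r
# Same data as in the question
df <- data.frame(
  group    = c(1,1,1,1,2,2,2,2,2,3,3,3,4,4,4),
  numToAdd = c(1,1,3,2,4,2,1,3,2,1,2,1,2,3,2),
  occurs   = c(0,0,1,0,0,1,0,0,0,0,1,1,0,0,0)
)

# Base R equivalent of the ungrouped mutate(): cumsum() runs over all 15 rows,
# and every row with occurs == 0 is forced back to 0
ungrouped <- ifelse(df$occurs == 1, cumsum(df$numToAdd), 0)

# Non-zero values appear only on rows 3, 6, 11 and 12, and they carry
# the whole-frame running total rather than a per-group one
print(ungrouped)
```

Moving group_by() ahead of mutate() would not be enough on its own, because the ifelse() condition would still zero out every row after the first 1.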

4 answers:

Answer 0: (score: 3)

Try

 df %>% 
    group_by(group) %>%
    mutate(added= cumsum(numToAdd*cummax(occurs)))
 #      group numToAdd occurs added
 # 1      1        1      0     0
 # 2      1        1      0     0
 # 3      1        3      1     3
 # 4      1        2      0     5
 # 5      2        4      0     0
 # 6      2        2      1     2
 # 7      2        1      0     3
 # 8      2        3      0     6
 # 9      2        2      0     8
 # 10     3        1      0     0
 # 11     3        2      1     2
 # 12     3        1      1     3
 # 13     4        2      0     0
 # 14     4        3      0     0
 # 15     4        2      0     0

Or using data.table

 library(data.table)#v1.9.5+
 i1 <-setDT(df)[, .I[(rleid(occurs) + (occurs>0))>1], group]$V1
 df[, added:=0][i1, added:=cumsum(numToAdd), by = group]
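
To see what the rleid() condition selects, here is a small sketch that emulates run ids with base R's rle(), so it runs without data.table. It assumes rleid() numbers consecutive runs 1, 2, 3, ...; the helper run_ids is a hypothetical name, not part of data.table.

```r
# Hypothetical helper: run-length ids built from base rle()
run_ids <- function(x) with(rle(x), rep(seq_along(lengths), lengths))

# A group where the first 1 appears on the third row
occurs_a <- c(0, 0, 1, 0)
keep_a <- (run_ids(occurs_a) + (occurs_a > 0)) > 1

# A group that starts with a 1: adding (occurs > 0) lifts the first run above 1
occurs_b <- c(1, 0)
keep_b <- (run_ids(occurs_b) + (occurs_b > 0)) > 1

# A group with no 1 at all: a single run with id 1, so nothing is kept
occurs_c <- c(0, 0, 0)
keep_c <- (run_ids(occurs_c) + (occurs_c > 0)) > 1

print(keep_a)
print(keep_b)
print(keep_c)
```

Rows flagged TRUE are the ones from the first 1 onward, which is where the answer then applies cumsum(numToAdd) by group.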

Or an option similar to the dplyr one
 setDT(df)[,added := cumsum(numToAdd * cummax(occurs)) , by = group]
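
The cumsum(numToAdd * cummax(occurs)) expression works because cummax() turns the 0/1 flag into a gate that switches on at the first 1 and never switches off again. A minimal sketch on the rows of group 1 alone, using plain vectors (gate is an illustrative name):

```r
# Rows of group 1 from the question
numToAdd <- c(1, 1, 3, 2)
occurs   <- c(0, 0, 1, 0)

# Running maximum of the flag: stays 0 until the first 1, then stays 1
gate <- cummax(occurs)

# Values before the first 1 are multiplied by 0, so they never enter the sum
added <- cumsum(numToAdd * gate)

print(gate)
print(added)
```

This assumes occurs only ever holds 0 or 1; with other values the product would scale numToAdd instead of gating it.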

Answer 1: (score: 2)

You can use split-apply-combine in base R, for example:

df$added <- unlist(lapply(split(df, df$group), function(x) {
  y <- rep(0, nrow(x))
  pos <- cumsum(x$occurs) > 0
  y[pos] <- cumsum(x$numToAdd[pos])
  y
}))
df
#    group numToAdd occurs added
# 1      1        1      0     0
# 2      1        1      0     0
# 3      1        3      1     3
# 4      1        2      0     5
# 5      2        4      0     0
# 6      2        2      1     2
# 7      2        1      0     3
# 8      2        3      0     6
# 9      2        2      0     8
# 10     3        1      0     0
# 11     3        2      1     2
# 12     3        1      1     3
# 13     4        2      0     0
# 14     4        3      0     0
# 15     4        2      0     0
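
One caveat with unlist(lapply(split(...))): split() returns the pieces in order of the group levels, so the result lines up with df only when the rows are already sorted by group, as they are in the question. If that cannot be assumed, a possible variation is to put the pieces back with unsplit(). The sketch below uses a small, deliberately unsorted data frame made up for illustration; running_after_first is a hypothetical name for the same function body as above.

```r
# Made-up data with interleaved groups (not from the question)
df2 <- data.frame(
  group    = c(2, 1, 2, 1, 1),
  numToAdd = c(4, 1, 2, 3, 2),
  occurs   = c(0, 0, 1, 1, 0)
)

# Same per-group logic as in the answer above
running_after_first <- function(x) {
  y <- rep(0, nrow(x))
  pos <- cumsum(x$occurs) > 0          # TRUE from the first 1 onward
  y[pos] <- cumsum(x$numToAdd[pos])    # running sum over those rows only
  y
}

pieces <- lapply(split(df2, df2$group), running_after_first)

# unsplit() writes each piece back to the rows it came from
df2$added <- unsplit(pieces, df2$group)
print(df2)
```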

Answer 2: (score: 2)

Adding another base R method:

(the code block for this answer was not preserved)
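
Since this answer's own code is unavailable, the following is only one possible base R approach and not necessarily what the answer contained. It is a minimal sketch that assumes occurs holds only 0 and 1, and uses ave() to apply cummax() and then cumsum() within each group.

```r
# Same data as in the question
df <- data.frame(
  group    = c(1,1,1,1,2,2,2,2,2,3,3,3,4,4,4),
  numToAdd = c(1,1,3,2,4,2,1,3,2,1,2,1,2,3,2),
  occurs   = c(0,0,1,0,0,1,0,0,0,0,1,1,0,0,0)
)

# Per-group gate: 0 before the first 1 in the group, 1 from then on
gate <- ave(df$occurs, df$group, FUN = cummax)

# Per-group running sum of the gated values
df$added <- ave(df$numToAdd * gate, df$group, FUN = cumsum)
print(df)
```

Because ave() returns its result in the original row order, this form does not depend on the rows being sorted by group.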

Answer 3: (score: 1)

Another base R option:

(the code block for this answer was not preserved)