Question

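I have a simple question: how can I apply which.max to epnum together with the condition id == 'B13639J2'?

I want to select the row number of the maximum epnum. I need to retrieve id == 'B13639J2' because I need to make some manual changes to the variables.

I want the maximum epnum among these rows:

            id epnum start
95528 B13639J2     1     0
95529 B13639J2     2   860
95530 B13639J2     3  1110
95531 B13639J2     4  1155
95532 B13639J2     5  1440

using something like

dta[which(dta$id == 'B13639J2' & which.max(dta$epnum)), ]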

dta = structure(list(id = c("B13639J1", "B13639J1", "B13639J1", "B13639J1", 
"B13639J1", "B13639J1", "B13639J1", "B13639J1", "B13639J2", "B13639J2", 
"B13639J2", "B13639J2", "B13639J2"), epnum = c(4, 5, 6, 7, 8, 
9, 10, 11, 1, 2, 3, 4, 5), start = c(420, 425, 435, 540, 570, 
1000, 1310, 1325, 0, 860, 1110, 1155, 1440)), .Names = c("id", 
"epnum", "start"), row.names = 95520:95532, class = "data.frame")

I would like to know how to do something like the which / which.max call above.

Finally, I need to delete the row that was found.

Thanks.

Data

The data is the dta data frame defined above.

Answer 1

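If we are using numeric indexing (which / which.max), one option is slice from dplyr. A double slice is needed here: we first subset on 'id', i.e. 'B13639J2', and then subset again for the max value of 'epnum'.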

 library(dplyr)
 slice(dta, which(id=='B13639J2')) %>%
                   slice(which.max(epnum))
 #        id epnum start
 #1 B13639J2     5  1440
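
For comparison, here is a minimal base R sketch of the same two-step idea, using the dta object from the question. It also shows why the expression in the question does not isolate a single row: which.max(dta$epnum) returns one position, and `&` coerces that non-zero number to TRUE, so only the id condition remains. The names attempt, sub and best are illustrative and not part of the original post.

```r
dta = structure(list(id = c("B13639J1", "B13639J1", "B13639J1", "B13639J1", 
"B13639J1", "B13639J1", "B13639J1", "B13639J1", "B13639J2", "B13639J2", 
"B13639J2", "B13639J2", "B13639J2"), epnum = c(4, 5, 6, 7, 8, 
9, 10, 11, 1, 2, 3, 4, 5), start = c(420, 425, 435, 540, 570, 
1000, 1310, 1325, 0, 860, 1110, 1155, 1440)), .Names = c("id", 
"epnum", "start"), row.names = 95520:95532, class = "data.frame")

# The attempt from the question: which.max() yields a single index, which `&`
# treats as TRUE, so every row of the id comes back instead of one.
attempt <- dta[which(dta$id == "B13639J2" & which.max(dta$epnum)), ]
nrow(attempt)

# Two-step version: subset on id first, then take the max epnum within it.
sub  <- dta[dta$id == "B13639J2", ]
best <- sub[which.max(sub$epnum), ]
best
```

Unlike slice, this keeps the original row name, which may help when the row has to be edited by hand afterwards.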

Or we group by 'id', arrange 'epnum' in descending order, and filter the first row of the group with the specified 'id'.

  dta1 <- dta %>% 
             group_by(id) %>% 
             arrange(desc(epnum)) %>%
             filter(id=='B13639J2', row_number()==1L)

If we want to remove this row from the dataset, one option is an anti_join with the original dataset.

  anti_join(dta, dta1)

Or this can be done by changing the filter condition.

  dta %>%
      group_by(id) %>% 
      arrange(desc(epnum)) %>%
      filter(!(id=='B13639J2' & row_number()==1L))
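
As a rough base R counterpart to the removal step, the sketch below chains which and which.max so that the position found inside the id subset is mapped back to a row of the full data frame. The names idx, target and dta2 are illustrative, and the sketch assumes the id occurs at least once in dta.

```r
dta = structure(list(id = c("B13639J1", "B13639J1", "B13639J1", "B13639J1", 
"B13639J1", "B13639J1", "B13639J1", "B13639J1", "B13639J2", "B13639J2", 
"B13639J2", "B13639J2", "B13639J2"), epnum = c(4, 5, 6, 7, 8, 
9, 10, 11, 1, 2, 3, 4, 5), start = c(420, 425, 435, 540, 570, 
1000, 1310, 1325, 0, 860, 1110, 1155, 1440)), .Names = c("id", 
"epnum", "start"), row.names = 95520:95532, class = "data.frame")

# Positions (in the full data frame) of the rows belonging to the id.
idx <- which(dta$id == "B13639J2")

# which.max() works within the subset; indexing idx maps it back to dta.
target <- idx[which.max(dta$epnum[idx])]

# Drop that single row; row order and row names of the rest are unchanged.
dta2 <- dta[-target, ]
dta2
```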

Answer 2

A roundabout base R way of doing this: temporarily set a copy of all epnum values outside your desired group to NA, then run which.max and drop the resulting row with a negative index:

dta[-which.max(replace(dta$epnum, dta$id != "B13639J2", NA)),]

#            id epnum start
#95520 B13639J1     4   420
#95521 B13639J1     5   425
#95522 B13639J1     6   435
#95523 B13639J1     7   540
#95524 B13639J1     8   570
#95525 B13639J1     9  1000
#95526 B13639J1    10  1310
#95527 B13639J1    11  1325
#95528 B13639J2     1     0
#95529 B13639J2     2   860
#95530 B13639J2     3  1110
#95531 B13639J2     4  1155

This works because which.max automatically skips all NA and NaN values:

which.max(c(NA,1,NaN,2,3))
#[1] 5

This doesn't change the row order of the dataset or drop any rownames info, and runs quite quickly (about 3s to process a 10M row file over here).
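
One edge case may be worth guarding against: if the id does not occur at all, every value becomes NA, which.max returns integer(0), and a negative index of length zero selects no rows instead of all of them. The sketch below wraps the same idea in a hypothetical helper, drop_max_row (not part of the original answer), that returns the data unchanged in that case.

```r
dta = structure(list(id = c("B13639J1", "B13639J1", "B13639J1", "B13639J1", 
"B13639J1", "B13639J1", "B13639J1", "B13639J1", "B13639J2", "B13639J2", 
"B13639J2", "B13639J2", "B13639J2"), epnum = c(4, 5, 6, 7, 8, 
9, 10, 11, 1, 2, 3, 4, 5), start = c(420, 425, 435, 540, 570, 
1000, 1310, 1325, 0, 860, 1110, 1155, 1440)), .Names = c("id", 
"epnum", "start"), row.names = 95520:95532, class = "data.frame")

# Hypothetical helper: drop the row with the largest epnum for one id.
drop_max_row <- function(df, target_id) {
  i <- which.max(replace(df$epnum, df$id != target_id, NA))
  # No matching id: which.max() gives integer(0), so return df untouched.
  if (length(i) == 0L) return(df)
  df[-i, , drop = FALSE]
}

nrow(drop_max_row(dta, "B13639J2"))   # one row removed
nrow(drop_max_row(dta, "not-an-id"))  # nothing removed
nrow(dta[-integer(0), ])              # the unguarded pitfall: zero rows
```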

Answer 3

Let me jump in with another possible solution. Let me know what you think.

First, I create the max of epnum for each id:

library(dplyr)
dta = dta %>% group_by(id) %>% mutate(max = max(epnum))  # n() would count the rows per id, not return the largest epnum

Then I apply a condition to it.
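
The code for the condition step is not shown above. One possible way to finish the idea, given here as a base R sketch under assumptions rather than as the answer's own code: compute the per-id maximum with ave() and keep every row except the one where the id matches and epnum equals that maximum. The names max_epnum, keep and result are hypothetical.

```r
dta = structure(list(id = c("B13639J1", "B13639J1", "B13639J1", "B13639J1", 
"B13639J1", "B13639J1", "B13639J1", "B13639J1", "B13639J2", "B13639J2", 
"B13639J2", "B13639J2", "B13639J2"), epnum = c(4, 5, 6, 7, 8, 
9, 10, 11, 1, 2, 3, 4, 5), start = c(420, 425, 435, 540, 570, 
1000, 1310, 1325, 0, 860, 1110, 1155, 1440)), .Names = c("id", 
"epnum", "start"), row.names = 95520:95532, class = "data.frame")

# Largest epnum within each id, repeated for every row of that id.
max_epnum <- ave(dta$epnum, dta$id, FUN = max)

# Assumed condition: drop the row of the target id that holds its maximum.
keep <- !(dta$id == "B13639J2" & dta$epnum == max_epnum)
result <- dta[keep, ]
result
```

If several rows of the id tie for the maximum, this condition removes all of them, whereas which.max only picks the first.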

R - combining which and which.max
