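I have a simple question: how can I use the maximum epnum and id == 'B13639J2' together?
I want to select the row number with the largest epnum for that id.
I need to retrieve the row for id == 'B13639J2' because I need to make some manual changes to the variable.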
row id epnum start
95528 B13639J2 1 0
95529 B13639J2 2 860
95530 B13639J2 3 1110
95531 B13639J2 4 1155
95532 B13639J2 5 1440
I am wondering how to do something like this:
dta[which(dta$id == 'B13639J2' & which.max(dta$epnum)), ]
Finally, I need to delete the row that was found.
Thanks.
Data
dta = structure(list(id = c("B13639J1", "B13639J1", "B13639J1", "B13639J1",
"B13639J1", "B13639J1", "B13639J1", "B13639J1", "B13639J2", "B13639J2",
"B13639J2", "B13639J2", "B13639J2"), epnum = c(4, 5, 6, 7, 8,
9, 10, 11, 1, 2, 3, 4, 5), start = c(420, 425, 435, 540, 570,
1000, 1310, 1325, 0, 860, 1110, 1155, 1440)), .Names = c("id",
"epnum", "start"), row.names = 95520:95532, class = "data.frame")
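Answer 0 (score: 8)
If we use a numeric index (which / which.max), one option is slice from dplyr. A double slice is needed here: we first subset on 'id', i.e. 'B13639J2', and then subset again for the row with the max value of 'epnum'.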
library(dplyr)
slice(dta, which(id=='B13639J2')) %>%
slice(which.max(epnum))
# id epnum start
#1 B13639J2 5 1440
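For comparison, here is a minimal base R sketch of the same two-step idea, which also returns the row position asked for in the question. The attempt in the question does not work because which.max(dta$epnum) returns a single position over the whole column, and & treats that non-zero number as TRUE, so every 'B13639J2' row is kept. The shortened dta and the names attempt, idx, pos and dta_removed are illustrative assumptions, not part of the original post.

```r
# Shortened copy of the question's data (illustrative subset, row names kept)
dta <- data.frame(
  id    = c("B13639J1", "B13639J1", "B13639J2", "B13639J2", "B13639J2"),
  epnum = c(10, 11, 3, 4, 5),
  start = c(1310, 1325, 1110, 1155, 1440),
  row.names = c(95526:95527, 95530:95532)
)

# The question's attempt: which.max() gives one position for the whole
# column, and `&` coerces that number to TRUE, so all J2 rows come back
attempt <- which(dta$id == "B13639J2" & which.max(dta$epnum))

# Two-step version: positions of the id first, then the max within them
idx <- which(dta$id == "B13639J2")
pos <- idx[which.max(dta$epnum[idx])]

dta[pos, ]                   # the row with the largest epnum for that id
rownames(dta)[pos]           # its row name
dta_removed <- dta[-pos, ]   # the data without that row
```

Because pos is a position in the full data frame, it can be reused both for manual edits (dta[pos, "start"] <- ...) and for the deletion.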
Or we group by 'id', arrange 'epnum' in descending order, and filter the first row of the group with the specified 'id'.
dta1 <- dta %>%
group_by(id) %>%
arrange(desc(epnum)) %>%
filter(id=='B13639J2', row_number()==1L)
If we want to remove this row from the dataset, one option is an anti_join with the original dataset.
anti_join(dta, dta1)
Or this can be done by changing the filter condition:
dta %>%
group_by(id) %>%
arrange(desc(epnum)) %>%
filter(!(id=='B13639J2' & row_number()==1L))
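In dplyr 1.0.0 or later, the arrange/row_number pattern can probably be shortened with slice_max. The following is a sketch under that assumption, not the answer's own code; the shortened dta and the names top and rest are illustrative.

```r
library(dplyr)

# Shortened copy of the question's data (illustrative subset)
dta <- data.frame(
  id    = c("B13639J1", "B13639J1", "B13639J2", "B13639J2", "B13639J2"),
  epnum = c(10, 11, 3, 4, 5),
  start = c(1310, 1325, 1110, 1155, 1440)
)

# Row with the largest epnum for the chosen id; with_ties = FALSE
# guarantees a single row even if the maximum is duplicated
top <- dta %>%
  filter(id == "B13639J2") %>%
  slice_max(epnum, n = 1, with_ties = FALSE)

# Remove that row from the full data, naming the join columns explicitly
rest <- anti_join(dta, top, by = c("id", "epnum", "start"))
```

Spelling out by avoids the message dplyr prints when it has to guess the join columns.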
Answer 1 (score: 2)
A roundabout base R way of doing this. Temporarily set a copy of all epnum values not in your desired group to NA, then run which.max and drop (-) the resulting row:
dta[-which.max(replace(dta$epnum, dta$id != "B13639J2", NA)),]
# id epnum start
#95520 B13639J1 4 420
#95521 B13639J1 5 425
#95522 B13639J1 6 435
#95523 B13639J1 7 540
#95524 B13639J1 8 570
#95525 B13639J1 9 1000
#95526 B13639J1 10 1310
#95527 B13639J1 11 1325
#95528 B13639J2 1 0
#95529 B13639J2 2 860
#95530 B13639J2 3 1110
#95531 B13639J2 4 1155
This is due to which.max skipping all NA or NaN values automatically:
which.max(c(NA,1,NaN,2,3))
#[1] 5
This doesn't change the row order of the dataset or drop any rownames
info, and runs quite quickly (about 3s to process a 10M row file over here).
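One edge case may be worth guarding against: if no row matches the id, every value becomes NA, which.max returns integer(0), and indexing with -integer(0) selects zero rows rather than all of them. A minimal sketch of a guarded version follows; drop_max_row is a hypothetical helper name and the shortened dta is illustrative.

```r
# Shortened copy of the question's data (illustrative subset)
dta <- data.frame(
  id    = c("B13639J1", "B13639J1", "B13639J2", "B13639J2", "B13639J2"),
  epnum = c(10, 11, 3, 4, 5),
  start = c(1310, 1325, 1110, 1155, 1440)
)

# Hypothetical helper: drop the max-epnum row for one id,
# returning the data unchanged when the id is absent
drop_max_row <- function(df, target_id) {
  i <- which.max(replace(df$epnum, df$id != target_id, NA))
  if (length(i) == 0L) return(df)   # no match: nothing to drop
  df[-i, , drop = FALSE]
}

with_match    <- drop_max_row(dta, "B13639J2")
without_match <- drop_max_row(dta, "no-such-id")
```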
Answer 2 (score: 0)
Let me jump in with another possible solution. Let me know what you think.
First, I create a max epnum variable for each id:
dta = dta %>%
group_by(id) %>%
mutate(max = n())
Then I filter with a negated (!) condition.
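The filtering code for this answer did not survive, so the following is only a guess at how the step might look. Note that n() counts the rows per id, which equals the largest epnum only when epnum runs 1..n without gaps (true for 'B13639J2' here, not for 'B13639J1'); the sketch uses max(epnum) instead. The column name max_epnum, the object names and the shortened dta are assumptions.

```r
library(dplyr)

# Shortened copy of the question's data (illustrative subset)
dta <- data.frame(
  id    = c("B13639J1", "B13639J1", "B13639J2", "B13639J2", "B13639J2"),
  epnum = c(10, 11, 3, 4, 5),
  start = c(1310, 1325, 1110, 1155, 1440)
)

# Attach the per-id maximum of epnum as a helper column
dta_flagged <- dta %>%
  group_by(id) %>%
  mutate(max_epnum = max(epnum)) %>%
  ungroup()

# Keep everything except the max-epnum row of the chosen id,
# then drop the helper column again
dta_kept <- dta_flagged %>%
  filter(!(id == "B13639J2" & epnum == max_epnum)) %>%
  select(-max_epnum)
```

If the maximum epnum is tied within the id, this drops every tied row, unlike the which.max approaches, which drop only the first.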