I'm using distinct to remove duplicates from a combined dataset, but I'm losing data because distinct keeps only the first entry.
Example data frame "a":
SiteID PYear Habitat num.1
000901W 2011 W NA
001101W 2007 W NA
001801W 2005 W NA
002001W 2017 W NA
002401F 2006 F NA
002401F 2016 F NA
004001F 2006 F NA
004001W 2006 W NA
004101W 2007 W NA
004101W 2007 W 16
004701F 2017 F NA
006201F 2008 F NA
006501F 2009 F NA
006601W 2007 W 2
006601W 2007 W NA
006803F 2009 F NA
007310F 2018 F NA
007602W 2017 W NA
008103W 2011 W NA
008203F 2007 F 1
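I'd like to know how to remove duplicates based on SiteID and num.1 without losing the duplicates that have a numeric value in the num.1 column. For example, 004101W and 006601W each have multiple entries in the data frame, and for those I want to keep the row with the integer rather than the row with NA.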
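To make the problem concrete, here is a minimal sketch using a few rows copied from a (only the two duplicated sites mentioned above). Plain distinct() with .keep_all = TRUE keeps whichever row appears first for each SiteID, so for 004101W the NA row wins and the value 16 is lost. The names a_small and naive are illustrative only.

```r
library(dplyr)

# A few rows copied from the example data frame `a` (illustrative subset)
a_small <- data.frame(
  SiteID  = c("004101W", "004101W", "006601W", "006601W"),
  PYear   = c(2007L, 2007L, 2007L, 2007L),
  Habitat = c("W", "W", "W", "W"),
  num.1   = c(NA, 16L, 2L, NA),
  stringsAsFactors = FALSE
)

# distinct() keeps the first row it sees for each SiteID
naive <- a_small %>%
  distinct(SiteID, .keep_all = TRUE)

print(naive)
# 004101W keeps the NA row (16 is dropped); 006601W keeps 2 only because
# that row happens to come first.
```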
Answer
(Thanks for updating with more representative sample data!)
a now has 20 rows with 17 distinct SiteID values. Three of these SiteIDs have multiple rows:
library(tidyverse)
a %>%
add_count(SiteID) %>%
filter(n > 1)
## A tibble: 6 x 5
# SiteID PYear Habitat num.1 n
# <chr> <int> <chr> <int> <int>
#1 002401F 2006 F NA 2 # Both have NA for num.1
#2 002401F 2016 F NA 2 # ""
#3 004101W 2007 W NA 2 # Drop
#4 004101W 2007 W 16 2 # Keep this one
#5 006601W 2007 W 2 2 # Keep this one
#6 006601W 2007 W NA 2 # Drop
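If we want to prioritize rows that do not have NA in num.1, we can arrange by num.1 within each SiteID. arrange always sorts NA values to the end, so for each SiteID the NA rows come last, and distinct, which keeps the first row it sees for each SiteID, then keeps a row with a non-NA num.1 whenever one exists.
(If you would rather keep the rows in their original order within each SiteID instead of sorting them by the value of num.1, while still moving NA values of num.1 to the end, the commented-out line below is an alternative. The term is.na(num.1) evaluates to TRUE for NA values and FALSE for non-NA values, and since FALSE sorts before TRUE, the NA rows end up last.)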
a %>%
arrange(SiteID, num.1) %>%
#arrange(SiteID, is.na(num.1)) %>% # Alternative to preserve orig order
distinct(SiteID, .keep_all = TRUE)
SiteID PYear Habitat num.1
1 000901W 2011 W NA
2 001101W 2007 W NA
3 001801W 2005 W NA
4 002001W 2017 W NA
5 002401F 2006 F NA # Kept first appearing row, since both NA num.1
6 004001F 2006 F NA
7 004001W 2006 W NA
8 004101W 2007 W 16 # Kept non-NA row
9 004701F 2017 F NA
10 006201F 2008 F NA
11 006501F 2009 F NA
12 006601W 2007 W 2 # Kept non-NA row
13 006803F 2009 F NA
14 007310F 2018 F NA
15 007602W 2017 W NA
16 008103W 2011 W NA
17 008203F 2007 F 1
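If you would rather not depend on the tidyverse, the same idea can be sketched in base R (a swapped-in base R equivalent, not part of the original answer). order() with is.na(num.1) as the second key puts FALSE (non-NA) before TRUE (NA) within each SiteID, and !duplicated() then keeps the first row per site, which is the same "keep first" rule distinct() uses. The data is a subset of the rows of a; a_small, ordered and deduped are illustrative names.

```r
# A subset of the rows in `a`, including all three duplicated SiteIDs
a_small <- data.frame(
  SiteID  = c("002401F", "002401F", "004101W", "004101W",
              "006601W", "006601W", "008203F"),
  PYear   = c(2006L, 2016L, 2007L, 2007L, 2007L, 2007L, 2007L),
  Habitat = c("F", "F", "W", "W", "W", "W", "F"),
  num.1   = c(NA, NA, NA, 16L, 2L, NA, 1L),
  stringsAsFactors = FALSE
)

# Sort by SiteID, then by is.na(num.1): FALSE sorts before TRUE, so non-NA
# rows move ahead of NA rows within each site. order() leaves remaining ties
# in their original order, so 002401F keeps its 2006 row first.
ordered <- a_small[order(a_small$SiteID, is.na(a_small$num.1)), ]

# Keep only the first row for each SiteID
deduped <- ordered[!duplicated(ordered$SiteID), ]
rownames(deduped) <- NULL
print(deduped)
```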
# The example data frame a used above
a <- read.table(header = TRUE, stringsAsFactors = FALSE,
text = " SiteID PYear Habitat num.1
000901W 2011 W NA
001101W 2007 W NA
001801W 2005 W NA
002001W 2017 W NA
002401F 2006 F NA
002401F 2016 F NA
004001F 2006 F NA
004001W 2006 W NA
004101W 2007 W NA
004101W 2007 W 16
004701F 2017 F NA
006201F 2008 F NA
006501F 2009 F NA
006601W 2007 W 2
006601W 2007 W NA
006803F 2009 F NA
007310F 2018 F NA
007602W 2017 W NA
008103W 2011 W NA
008203F 2007 F 1")
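Another option, again only a sketch and not taken from the answer above, avoids sorting altogether. Within each SiteID group, which.max(!is.na(num.1)) returns the position of the first non-NA value, or 1 when every value is NA (which.max() returns the first maximum, and for an all-FALSE vector that is position 1); slice() then keeps just that row. The data is a subset of a, and a_small and picked are illustrative names.

```r
library(dplyr)

# A subset of the rows in `a`, including all three duplicated SiteIDs
a_small <- data.frame(
  SiteID  = c("002401F", "002401F", "004101W", "004101W",
              "006601W", "006601W", "008203F"),
  PYear   = c(2006L, 2016L, 2007L, 2007L, 2007L, 2007L, 2007L),
  Habitat = c("F", "F", "W", "W", "W", "W", "F"),
  num.1   = c(NA, NA, NA, 16L, 2L, NA, 1L),
  stringsAsFactors = FALSE
)

picked <- a_small %>%
  group_by(SiteID) %>%
  # Position of the first non-NA num.1 in the group, or 1 if all are NA
  slice(which.max(!is.na(num.1))) %>%
  ungroup()

print(picked)
```

Because nothing is sorted by num.1, the row chosen for each site is simply the first non-NA one in the original order, which may matter if a site can have several different non-NA values.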