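Here is the link to the transactions. This is a case of voided transactions on restaurant checks.
I want R to check whether an item carries the flag "U", and then remove both the U row and one matching row for the same item that is not flagged U.
I have highlighted the items to be removed in yellow.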
chk_num  dtl_name         Duration  Guest  void_type  Item_ttl
9707     Americano        45        1                 18
9707     Americano        45        1                 18
9707     Breakfast Tea    45        1                 18
9707     Breakfast Tea    45        1      U          -18
9707     Café Latte       45        1                 21
9707     Camomille Tea    45        1      U          -18
9707     Camomille Tea    45        1                 18
9707     Earl Grey Tea    45        1      U          -18
9707     Earl Grey Tea    45        1                 18
9707     Fresh Mint Tea   45        1      U          -18
9707     Fresh Mint Tea   45        1                 18
9707     Green Tea        45        1                 18
9707     Green Tea        45        1      U          -18
9707     Green Tea        45        1                 18
9707     Lemon Tea        45        1                 18
9707     Lemon Tea        45        1      U          -18
9707     Orange Juice     45        1                 24
9707     Pepper Mint Tea  45        1                 18
9707     Pepper Mint Tea  45        1      U          -18
Answer 0 (score: 1)
An alternative solution using the data.table package:
# load the 'data.table'-package & convert 'DF' to a data.table
library(data.table)
setDT(DF)
# add a rownumber
DF[ , rn := .I][]
# create a subset with only the 'U'-rows and make 'Item_ttl' positive
DF_U <- DF[void_type == "U"][, Item_ttl := Item_ttl * -1][]
# create an index of rownumbers to be removed by:
# - extracting 'rn' from 'DF_U'
# - joining DF_U with DF
# select only the first matching row in the join
# and then extract 'rn'
# - concatenate these two vectors into one
ix <- c(DF_U$rn, DF[DF_U, on = .(chk_num,dtl_name,Duration,Guest,Item_ttl), mult = "first"]$rn)
You can now get the desired end result with:
DF[!ix]
which gives:
   chk_num     dtl_name Duration Guest void_type Item_ttl rn
1:    9707    Americano       45     1      <NA>       18  1
2:    9707    Americano       45     1      <NA>       18  2
3:    9707   Café-Latte       45     1      <NA>       21  5
4:    9707    Green-Tea       45     1      <NA>       18 14
5:    9707 Orange-Juice       45     1      <NA>       24 17
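One caveat with the join above: `mult = "first"` returns the same first matching row for every U row of an item, so an item that is voided twice would lose only one non-void row. The following is a minimal sketch of one way around that, not part of the original answer. It assumes that the k-th void should cancel the k-th earlier-listed sale of the same item and amount. The sample data and the helper columns `is_void`, `amt`, `n_void` and `k` are made up for illustration.

```r
library(data.table)

# Hypothetical sample: Green-Tea is sold three times and voided twice
DF <- data.table(
  chk_num   = 9707L,
  dtl_name  = c("Americano", "Green-Tea", "Green-Tea",
                "Green-Tea", "Green-Tea", "Green-Tea"),
  Duration  = 45L,
  Guest     = 1L,
  void_type = c(NA, NA, "U", NA, "U", NA),
  Item_ttl  = c(18L, 18L, -18L, 18L, -18L, 18L)
)

# TRUE for void rows; an NA void_type counts as "not a void"
DF[, is_void := !is.na(void_type) & void_type == "U"]

# absolute amount, so a void (-18) and its sale (18) share one key
DF[, amt := abs(Item_ttl)]

# number of voids per item/amount
DF[, n_void := sum(is_void), by = .(chk_num, dtl_name, Duration, Guest, amt)]

# running number within the voids and within the non-voids of each item/amount
DF[, k := rowid(chk_num, dtl_name, Duration, Guest, amt, is_void)]

# keep only non-void rows that are not cancelled by one of the voids
res <- DF[!is_void & k > n_void]
res[, c("is_void", "amt", "n_void", "k") := NULL]
```

With this sample, `res` keeps the Americano row and the last Green-Tea row. A void without any matching sale is simply dropped, which is also an assumption about the desired behaviour.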
Answer 1 (score: 0)
I'm pretty sure there is a better way to do this.
Data:
library(data.table)  # fread(), setDF()
library(dplyr)       # %>%, group_by(), do()
df1 <-
data.table::fread("chk_num dtl_name Duration Guest void_type Item_ttl
9707 Americano 45 1 NA 18
9707 Americano 45 1 NA 18
9707 Breakfast-Tea 45 1 NA 18
9707 Breakfast-Tea 45 1 U -18
9707 Café-Latte 45 1 NA 21
9707 Camomille-Tea 45 1 U -18
9707 Camomille-Tea 45 1 NA 18
9707 Earl-Grey-Tea 45 1 U -18
9707 Earl-Grey-Tea 45 1 NA 18
9707 Fresh-Mint-Tea 45 1 U -18
9707 Fresh-Mint-Tea 45 1 NA 18
9707 Green-Tea 45 1 NA 18
9707 Green-Tea 45 1 U -18
9707 Green-Tea 45 1 NA 18
9707 Lemon-Tea 45 1 NA 18
9707 Lemon-Tea 45 1 U -18
9707 Orange-Juice 45 1 NA 24
9707 Pepper-Mint-Tea 45 1 NA 18
9707 Pepper-Mint-Tea 45 1 U -18") %>% setDF
Code:
fun1 <- function(x) {
  while ("U" %in% x$void_type) {
    # first remaining void row in this group
    flagU <- min(which(x$void_type == "U"))
    # first row whose amount is the opposite of the void's amount
    delFlagU <- min(which(x$Item_ttl == -x$Item_ttl[flagU]))
    x <- x[-c(flagU, delFlagU), ]
    if (!("U" %in% x$void_type)) {return(x)}
  }
  return(x)
}
df1 %>% dplyr::group_by(dtl_name, Duration, Guest) %>% dplyr::do(.,fun1(.))
Result:
# A tibble: 5 x 6
# Groups: dtl_name, Duration, Guest [4]
# chk_num dtl_name Duration Guest void_type Item_ttl
# <int> <chr> <int> <int> <chr> <int>
#1 9707 Americano 45 1 <NA> 18
#2 9707 Americano 45 1 <NA> 18
#3 9707 Café-Latte 45 1 <NA> 21
#4 9707 Green-Tea 45 1 <NA> 18
#5 9707 Orange-Juice 45 1 <NA> 24
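Please note:
If you have a U flag without a corresponding "pair", you will get stuck in an infinite while loop.
You might want to extend my answer to handle that case.
Since you know the business logic and I know nothing about it, you may want to adjust the grouping dplyr::group_by(dtl_name, Duration, Guest). I am especially unsure about Duration.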
If you are more of a data.table person:
data.table::setDT(df1)[, fun1(.SD), by = .(dtl_name, Duration, Guest)]
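The note above warns that a U row without a partner breaks the loop in fun1. A possible guarded variant is sketched below. `fun1_safe` is a hypothetical name, and dropping an unmatched void row on its own is an assumption about the desired behaviour rather than something stated in the question. The example uses only base R and a small made-up data frame.

```r
# Hypothetical variant of fun1 that always terminates:
# every iteration removes at least one "U" row.
fun1_safe <- function(x) {
  repeat {
    void_rows <- which(x$void_type == "U")   # which() ignores NA
    if (length(void_rows) == 0L) break
    flagU <- void_rows[1L]
    # candidate partners: non-void rows with the opposite amount
    partner <- which(!(x$void_type %in% "U") &
                       x$Item_ttl == -x$Item_ttl[flagU])
    if (length(partner) == 0L) {
      # unmatched void: drop only the void row (assumption)
      x <- x[-flagU, ]
    } else {
      x <- x[-c(flagU, partner[1L]), ]
    }
  }
  x
}

# Made-up data: Lemon-Tea has a void with no matching sale
df <- data.frame(
  dtl_name  = c("Americano", "Green-Tea", "Green-Tea", "Lemon-Tea"),
  void_type = c(NA, NA, "U", "U"),
  Item_ttl  = c(18L, 18L, -18L, -18L),
  stringsAsFactors = FALSE
)

# apply per item with base R split()/lapply()
res <- do.call(rbind, lapply(split(df, df$dtl_name), fun1_safe))
```

Here only the Americano row survives: the Green-Tea sale is cancelled by its void, and the unmatched Lemon-Tea void is removed on its own. `fun1_safe` can be passed to `dplyr::do()` or used with `.SD` in the same way as fun1.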