I have a dataset like the one below:
ID. Invoice. Date of Invoice. paid or not.
1 1 10/31/2019 yes
1 1 10/31/2019 yes
1 2 11/30/2019 no
1 3 12/31/2019 no
2 1 09/30/2019 no
2 2 10/30/2019 no
2 3 11/30/2019 yes
3 1 7/31/2019 no
3 2 9/30/2019 yes
3 3 12/31/2019 no
4 1 7/31/2019 yes
4 2 9/30/2019 no
4 3 12/31/2019 yes
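I want to know whether customers are willing to pay. If a customer has paid a newer invoice while an older invoice is still unpaid, I give that customer a good score. So for the data above, customers 2, 3, and 4 are rated "good" and customer 1 is rated "bad".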
The final data should therefore have one more column whose values are good and bad:
ID. Invoice. Date of Invoice. paid or not. good or bad.
1 1 10/31/2019 yes bad
1 1 10/31/2019 yes bad
1 2 11/30/2019 no bad
1 3 12/31/2019 no bad
2 1 09/30/2019 no good
2 2 10/30/2019 no good
2 3 11/30/2019 yes good
3 1 7/31/2019 no good
3 2 9/30/2019 yes good
3 3 12/31/2019 no good
4 1 7/31/2019 yes good
4 2 9/30/2019 no good
4 3 12/31/2019 yes good
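Read literally, the rule above says a customer is "good" when at least one paid invoice is dated later than at least one unpaid invoice. A minimal base R sketch of that reading follows. The shortened column names (Date, Paid, Score) and the helper score_customer are assumptions made for illustration and are not part of the original data.

```r
# Sample data from the question; column names are simplified for this sketch (assumption)
invoices <- data.frame(
  ID      = c(1, 1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4),
  Invoice = c(1, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3),
  Date    = c("10/31/2019", "10/31/2019", "11/30/2019", "12/31/2019",
              "09/30/2019", "10/30/2019", "11/30/2019",
              "7/31/2019", "9/30/2019", "12/31/2019",
              "7/31/2019", "9/30/2019", "12/31/2019"),
  Paid    = c("yes", "yes", "no", "no",
              "no", "no", "yes",
              "no", "yes", "no",
              "yes", "no", "yes"),
  stringsAsFactors = FALSE
)
invoices$Date <- as.Date(invoices$Date, format = "%m/%d/%Y")

# Hypothetical helper: "good" if some paid invoice is dated after some unpaid invoice
score_customer <- function(dates, paid) {
  unpaid_dates <- dates[paid == "no"]
  paid_dates   <- dates[paid == "yes"]
  if (length(unpaid_dates) > 0 && length(paid_dates) > 0 &&
      max(paid_dates) > min(unpaid_dates)) "good" else "bad"
}

# One score per customer, then copied back onto every row of that customer
by_id  <- split(invoices, invoices$ID)
scores <- vapply(by_id, function(d) score_customer(d$Date, d$Paid), character(1))
invoices$Score <- unname(scores[as.character(invoices$ID)])
```

Under this reading a customer who has paid every invoice has no unpaid older invoice and would be scored "bad"; whether that is the intended outcome is not stated in the question.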
Answer 0 (score: 2)
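The logic is not entirely clear. Perhaps, after sorting by date and grouping by "ID", we can check whether "yes" appears in any row other than the first one: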
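library(dplyr)
library(lubridate)
df1 %>%
  mutate(Date_of_Invoice = mdy(Date_of_Invoice)) %>%  # parse month/day/year strings as dates
  arrange(ID, Date_of_Invoice) %>%                     # oldest invoice first within each ID
  group_by(ID) %>%
  # "good" if any invoice after the first (oldest) one is paid, otherwise "bad"
  mutate(flag = c('bad', 'good')[1 + any(paid_or_not[-1] == "yes")])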
# A tibble: 9 x 5
# Groups: ID [3]
# ID Invoice Date_of_Invoice paid_or_not flag
# <int> <int> <date> <chr> <chr>
#1 1 1 2019-09-30 no good
#2 1 2 2019-10-30 no good
#3 1 3 2019-11-30 yes good
#4 2 1 2019-10-31 yes bad
#5 2 2 2019-11-30 no bad
#6 2 3 2019-12-31 no bad
#7 3 1 2019-07-31 no good
#8 3 2 2019-09-30 yes good
#9 3 3 2019-12-31 no good
Data
df1 <- structure(list(ID = c(1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L), Invoice = c(1L,
2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L), Date_of_Invoice = c("09/30/2019",
"10/30/2019", "11/30/2019", "10/31/2019", "11/30/2019", "12/31/2019",
"7/31/2019", "9/30/2019", "12/31/2019"), paid_or_not = c("no",
"no", "yes", "yes", "no", "no", "no", "yes", "no")), class = "data.frame", row.names = c(NA,
-9L))
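The indexing expression inside the mutate() call of this answer may be easier to follow in isolation. The sketch below wraps it in a helper named flag_for, which is a hypothetical name introduced here for illustration. The logical result of any() is coerced to 0 or 1, so adding 1 selects either the first or the second element of c("bad", "good").

```r
# Hypothetical helper wrapping the expression used in the answer
flag_for <- function(paid_or_not) {
  # paid_or_not[-1] drops the first (oldest) invoice of the group
  c("bad", "good")[1 + any(paid_or_not[-1] == "yes")]
}

flag_for(c("yes", "no", "no"))   # only the oldest invoice is paid: "bad"
flag_for(c("no", "no", "yes"))   # a later invoice is paid: "good"
flag_for("yes")                  # single invoice, nothing after the first row: "bad"
```

Note that this check looks only at whether a later invoice is paid; it does not verify that an older invoice is still unpaid.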
Answer 1 (score: 2)
Assuming your Date of Invoice. column is already sorted within each ID, here is a base R solution using ave:
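# For each ID, label the group "bad" if its "yes" is in the first row, otherwise "good"
df$`good or bad.` <- ave(df$`paid or not.`, df$ID., FUN = function(v) ifelse(which(v == "yes") == 1, "bad", "good"))
which gives
> df
ID. Invoice. Date of Invoice. paid or not. good or bad.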
1 1 1 09/30/2019 no good
2 1 2 10/30/2019 no good
3 1 3 11/30/2019 yes good
4 2 1 10/31/2019 yes bad
5 2 2 11/30/2019 no bad
6 2 3 12/31/2019 no bad
7 3 1 7/31/2019 no good
8 3 2 9/30/2019 yes good
9 3 3 12/31/2019 no good
Data
df <- structure(list(ID. = c(1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L), Invoice. = c(1L,
2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L), `Date of Invoice.` = c("09/30/2019",
"10/30/2019", "11/30/2019", "10/31/2019", "11/30/2019", "12/31/2019",
"7/31/2019", "9/30/2019", "12/31/2019"), `paid or not.` = c("no",
"no", "yes", "yes", "no", "no", "no", "yes", "no")), class = "data.frame", row.names = c(NA,
-9L))
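The ave call in this answer appears to rely on each ID having exactly one "yes" row, because which() returns one position per match. As a hedged variant, the sketch below uses any() so that groups with several or no paid invoices still yield a single label. The shortened column names (ID, paid, flag) are assumptions for illustration, and the rule applied is "some invoice after the first one is paid", which is one possible reading of the question.

```r
# Same sample values as the answer's data, with simplified column names (assumption)
df <- data.frame(
  ID   = c(1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L),
  paid = c("no", "no", "yes", "yes", "no", "no", "no", "yes", "no"),
  stringsAsFactors = FALSE
)

# ave() applies FUN to the paid values of each ID and writes the result back row by row
df$flag <- ave(df$paid, df$ID, FUN = function(v) {
  # one label per group: "good" if any row after the first is "yes"
  label <- if (any(v[-1] == "yes")) "good" else "bad"
  rep(label, length(v))
})
```

As with the original, this assumes the rows are already ordered by invoice date within each ID.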