My data looks like this:
library(dplyr)
library(data.table)
df <- data.frame(
customernumber = c("111", "111", "111", "111", "111","222", "222", "222", "222", "222", "222", "222"),
ordernumber = c("1", "1", "1", "2", "2", "1", "1", "1", "1", "2", "2", "3"),
article = c("JeansA", "JeansA", "ShirtA", "JeansA", "JeansB", "ShirtA", "ShirtB", "ShirtB", "JeansA", "JeansB", "ShirtA", "JeansB"),
size = c("40", "42", "40", "42", "44", "36", "36", "40", "40", "38", "44", "36"),
returned = c("1", "1", "0", "0", "1", "1", "1", "0", "0", "0", "0", "0")
)
Output:
customernumber ordernumber article size returned
1 111 1 JeansA 40 1
2 111 1 JeansA 42 1
3 111 1 ShirtA 40 0
4 111 2 JeansA 42 0
5 111 2 JeansB 44 1
6 222 1 ShirtA 36 1
7 222 1 ShirtB 36 1
8 222 1 ShirtB 40 0
9 222 1 JeansA 40 0
10 222 2 JeansB 38 0
11 222 2 ShirtA 44 0
12 222 3 JeansB 36 0
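Now I want to flag, for each customer, every order in which an article that was returned is ordered again in the next order in a different size. Articles that were merely exchanged this way cannot really be treated as new orders. So the final result should look like this:
Result: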
customernumber ordernumber article size returned changed
1 111 1 JeansA 40 1 0
2 111 1 JeansA 42 1 0
3 111 1 ShirtA 40 0 0
4 111 2 JeansA 42 0 1
5 111 2 JeansB 44 1 0
6 222 1 ShirtA 36 1 0
7 222 1 ShirtB 36 1 0
8 222 1 ShirtB 40 0 0
9 222 1 JeansA 40 0 0
10 222 2 JeansB 38 0 0
11 222 2 ShirtA 44 0 1
12 222 3 JeansB 36 0 0
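I thought I could solve this by introducing a lagged variable with dplyr (or data.table), but I only manage to lag a variable within the same group; I cannot carry it over to the next group. This is what I have: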
df %>%
group_by(customernumber, ordernumber, article) %>%
mutate(lag_size = lag(size, order_by = article))
Or:
df <- data.table(df)
setorder(df, customernumber, ordernumber, article)
df[,lag_size := shift(size), by = .(customernumber, ordernumber, article)]
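I would rather not use a for loop (I am not even sure it would solve the problem), because the data set is very large and a loop would take ages. Beyond that I am out of ideas, so any help is appreciated.
Thanks!
Addition:
I stumbled upon another issue with this case. I only want to flag articles that were ordered in a different size in the follow-up order; an article that is ordered again in the same size should not be flagged. So the criteria for the variable are:
Order n: returned == 1; order n + 1: same article, different size -> changed == 1 (otherwise changed == 0)
Here is the updated example: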
df <- data.frame(
customernumber = c("111", "111", "111", "111", "111", "111","222", "222", "222", "222", "222", "222", "222"),
ordernumber = c("1", "1", "1", "2", "2", "2", "1", "1", "1", "1", "2", "2", "3"),
article = c("JeansA", "JeansA", "ShirtA", "JeansA", "JeansA", "JeansB", "ShirtA", "ShirtB", "ShirtB", "JeansA", "JeansB", "ShirtA", "JeansB"),
size = c("40", "42", "40", "40", "44", "44", "36", "36", "40", "40", "38", "44", "36"),
returned = c("1", "1", "0", "0", "1", "1", "1", "1", "0", "0", "0", "0", "0")
)
Output:
customernumber ordernumber article size returned
1 111 1 JeansA 40 1
2 111 1 JeansA 42 1
3 111 1 ShirtA 40 0
4 111 2 JeansA 40 0
5 111 2 JeansA 44 1
6 111 2 JeansB 44 1
7 222 1 ShirtA 36 1
8 222 1 ShirtB 36 1
9 222 1 ShirtB 40 0
10 222 1 JeansA 40 0
11 222 2 JeansB 38 0
12 222 2 ShirtA 44 0
13 222 3 JeansB 36 0
Result:
customernumber ordernumber article size returned changed
1 111 1 JeansA 40 1 0
2 111 1 JeansA 42 1 0
3 111 1 ShirtA 40 0 0
4 111 2 JeansA 40 0 0
5 111 2 JeansA 44 1 1
6 111 2 JeansB 44 1 0
7 222 1 ShirtA 36 1 0
8 222 1 ShirtB 36 1 0
9 222 1 ShirtB 40 0 0
10 222 1 JeansA 40 0 0
11 222 2 JeansB 38 0 0
12 222 2 ShirtA 44 0 1
13 222 3 JeansB 36 0 0
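Sorry for the confusion; I made a mistake in my first example and filled in the changed variable incorrectly. If you are still willing to help, I would really appreciate it.
Thanks!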
Answer 0 (score: 2)
New answer:
library(data.table)
setDT(df)
df[, changed := 0
][df[df, on = .(customernumber, ordernumber < ordernumber, article), nomatch = 0
][size != i.size & returned == 1, .SD[!i.size %in% size], by = .(customernumber, ordernumber, article)
][, .(customernumber, ordernumber, article, size = i.size)][, unique(.SD)]
, on = .(customernumber, ordernumber, article, size), changed := 1][]
which gives:
customernumber ordernumber article size returned changed
1: 111 1 JeansA 40 1 0
2: 111 1 JeansA 42 1 0
3: 111 1 ShirtA 40 0 0
4: 111 2 JeansA 40 0 0
5: 111 2 JeansA 44 1 1
6: 111 2 JeansB 44 1 0
7: 222 1 ShirtA 36 1 0
8: 222 1 ShirtB 36 1 0
9: 222 1 ShirtB 40 0 0
10: 222 1 JeansA 40 0 0
11: 222 2 JeansB 38 0 0
12: 222 2 ShirtA 44 0 1
13: 222 3 JeansB 36 0 0
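The chained join above is dense, so the same rule can also be spelled out step by step. The following is a minimal sketch, not a rewrite of the answer's code. It assumes order numbers are consecutive integers per customer and only looks at the immediately preceding order, whereas the join above uses ordernumber < ordernumber and therefore matches any earlier order. The names ret, prev_returned and same_size are made up for illustration.

```r
library(data.table)

# Updated example data from the question
df <- data.table(
  customernumber = c("111", "111", "111", "111", "111", "111",
                     "222", "222", "222", "222", "222", "222", "222"),
  ordernumber = c("1", "1", "1", "2", "2", "2", "1", "1", "1", "1", "2", "2", "3"),
  article = c("JeansA", "JeansA", "ShirtA", "JeansA", "JeansA", "JeansB",
              "ShirtA", "ShirtB", "ShirtB", "JeansA", "JeansB", "ShirtA", "JeansB"),
  size = c("40", "42", "40", "40", "44", "44", "36", "36", "40", "40", "38", "44", "36"),
  returned = c("1", "1", "0", "0", "1", "1", "1", "1", "0", "0", "0", "0", "0")
)

# Assumption: order numbers are consecutive integers, so "previous order" is n - 1
df[, ordernumber := as.integer(ordernumber)]

# Returned sizes, relabelled with the number of the following order (hypothetical helper table)
ret <- unique(df[returned == "1",
                 .(customernumber, ordernumber = ordernumber + 1L, article, size)])

# Step 1: was the same article returned in the previous order?
df[, prev_returned := FALSE]
df[unique(ret[, .(customernumber, ordernumber, article)]),
   on = .(customernumber, ordernumber, article),
   prev_returned := TRUE]

# Step 2: was exactly this size among the returned sizes?
df[, same_size := FALSE]
df[ret, on = .(customernumber, ordernumber, article, size), same_size := TRUE]

# Flag rows where the article was returned before and now comes in another size
df[, changed := as.integer(prev_returned & !same_size)]
df[, c("prev_returned", "same_size") := NULL]
df[]
```

Because both steps are update joins, no row of df is duplicated or reordered, which matters on a large data set.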
Old answer:
A possible solution:
library(data.table)
setDT(df)
df[df[returned == 0][df[returned == 1]
                     , on = .(customernumber, article)
                     ][ordernumber != i.ordernumber]
   , on = .(customernumber, article, returned)
   , changed := i.returned
   ][, changed := replace(changed, is.na(changed), 0)][]
which gives:
customernumber ordernumber article size returned changed
1: 111 1 JeansA 40 1 0
2: 111 1 JeansA 42 1 0
3: 111 1 ShirtA 40 0 0
4: 111 2 JeansA 42 0 1
5: 111 2 JeansB 44 1 0
6: 222 1 ShirtA 36 1 0
7: 222 1 ShirtB 36 1 0
8: 222 1 ShirtB 40 0 0
9: 222 1 JeansA 40 0 0
10: 222 2 JeansB 38 0 0
11: 222 2 ShirtA 44 0 1
12: 222 3 JeansB 36 0 0
Answer 1 (score: 0)
You are dealing with multiple lag conditions, so we need several lag calls to build that condition. Then we can use case_when to create the changed column.
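The idea can be sketched in dplyr as follows. This is only a sketch under stated assumptions and not the answerer's code: because one order spans several rows, it looks up the previous order with joins instead of lag(), and it assumes order numbers are consecutive integers per customer. The helper names returned_prev, article_prev, prev_returned and same_size are made up for illustration.

```r
library(dplyr)

# Updated example data from the question
df <- data.frame(
  customernumber = c("111", "111", "111", "111", "111", "111",
                     "222", "222", "222", "222", "222", "222", "222"),
  ordernumber = c("1", "1", "1", "2", "2", "2", "1", "1", "1", "1", "2", "2", "3"),
  article = c("JeansA", "JeansA", "ShirtA", "JeansA", "JeansA", "JeansB",
              "ShirtA", "ShirtB", "ShirtB", "JeansA", "JeansB", "ShirtA", "JeansB"),
  size = c("40", "42", "40", "40", "44", "44", "36", "36", "40", "40", "38", "44", "36"),
  returned = c("1", "1", "0", "0", "1", "1", "1", "1", "0", "0", "0", "0", "0"),
  stringsAsFactors = FALSE
) %>%
  # Assumption: consecutive integer order numbers, so "previous order" is n - 1
  mutate(ordernumber = as.integer(ordernumber))

# Returned sizes, shifted onto the following order (hypothetical helper table)
returned_prev <- df %>%
  filter(returned == "1") %>%
  distinct(customernumber, ordernumber, article, size) %>%
  mutate(ordernumber = ordernumber + 1L, same_size = TRUE)

# One row per article that was returned in the previous order
article_prev <- returned_prev %>%
  distinct(customernumber, ordernumber, article) %>%
  mutate(prev_returned = TRUE)

result <- df %>%
  left_join(article_prev, by = c("customernumber", "ordernumber", "article")) %>%
  left_join(returned_prev, by = c("customernumber", "ordernumber", "article", "size")) %>%
  mutate(changed = case_when(
    is.na(prev_returned) ~ 0L,  # article was not returned in the previous order
    !is.na(same_size)    ~ 0L,  # same size as a returned item: not an exchange
    TRUE                 ~ 1L   # returned before, now ordered in another size
  )) %>%
  select(-prev_returned, -same_size)

result
```

Both lookup tables have unique join keys, so the left joins keep df at its original row count and order.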