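I have this data, in which every variable except Product_Code is repeated across rows. I want to create new variables such as Prod_1, Prod_2, ..., that is, spread Product_Code across new columns and drop the duplicated rows.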
         ID       DATE DAYS MONTH Product_Code
1 00003600B 2018-06-30  854     6        83648
2 00003600B 2018-06-30  854     6        40984
3 00003600B 2018-06-30  854     6        14534
4 00003600B 2018-06-30  854     6        18708
5 00003600B 2018-06-30  854     6        18710
I tried the spread and transpose functions, but neither worked.
spread(data = Tickets, key = ID, value = Product_Code)
I also tried transposing, but that did not give the result I want either:
Tickets.t = t(Tickets)
Any ideas on how to do this?
I am hoping for something like this:
       ID       DATE DAYS MONTH PROD_1 PROD_2 PROD_3 PROD_4 PROD_5
00003600B 2018-06-30  854     6  83648  40984  14534  18708  18710
00003600B 2016-02-27  280     2 999195 999154 999339      0      0
00003600B 2015-05-23   77     5 999026 999339 999021  27640 999195
Answer 0 (score: 1)
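Here we need a sequence column. Group by 'ID', 'DATE', 'DAYS', 'MONTH', create a 'PROD' column by concatenating the string "PROD_" with row_number(), and then use that column as the key to spread the 'Product_Code' values.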
library(tidyverse)
Tickets %>%
  group_by(ID, DATE, DAYS, MONTH) %>%
  mutate(PROD = str_c("PROD_", row_number())) %>%
  spread(PROD, Product_Code)
# A tibble: 1 x 9
# Groups: ID, DATE, DAYS, MONTH [1]
# ID DATE DAYS MONTH PROD_1 PROD_2 PROD_3 PROD_4 PROD_5
# <chr> <chr> <int> <int> <int> <int> <int> <int> <int>
#1 00003600B 2018-06-30 854 6 83648 40984 14534 18708 18710
Tickets <- structure(list(ID = c("00003600B", "00003600B", "00003600B",
"00003600B", "00003600B"), DATE = c("2018-06-30", "2018-06-30",
"2018-06-30", "2018-06-30", "2018-06-30"), DAYS = c(854L, 854L,
854L, 854L, 854L), MONTH = c(6L, 6L, 6L, 6L, 6L), Product_Code = c(83648L,
40984L, 14534L, 18708L, 18710L)), class = "data.frame", row.names = c("1",
"2", "3", "4", "5"))
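The expected output in the question pads shorter groups with 0, which the single-date sample above does not exercise. As a minimal sketch, assuming tidyr 1.0.0 or later (where pivot_wider() is available as the newer counterpart to spread()), the same idea can be written as below. The data frame Tickets2 and the result name wide are hypothetical; the rows are copied from the question's input and from the second row of its desired output.

```r
library(dplyr)
library(tidyr)  # pivot_wider() assumes tidyr >= 1.0.0

# Hypothetical sample: five codes for one date, three for another
Tickets2 <- data.frame(
  ID = "00003600B",
  DATE = c(rep("2018-06-30", 5), rep("2016-02-27", 3)),
  DAYS = c(rep(854L, 5), rep(280L, 3)),
  MONTH = c(rep(6L, 5), rep(2L, 3)),
  Product_Code = c(83648L, 40984L, 14534L, 18708L, 18710L,
                   999195L, 999154L, 999339L),
  stringsAsFactors = FALSE
)

wide <- Tickets2 %>%
  group_by(ID, DATE, DAYS, MONTH) %>%
  mutate(PROD = row_number()) %>%   # 1, 2, 3, ... within each group
  ungroup() %>%
  pivot_wider(
    names_from = PROD,
    values_from = Product_Code,
    names_prefix = "PROD_",                  # columns become PROD_1, PROD_2, ...
    values_fill = list(Product_Code = 0L)    # pad shorter groups with 0, not NA
  )

wide
```

The fill value is given as a named list because that form is accepted across tidyr 1.x releases; with spread(), the corresponding argument is fill.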
Answer 1 (score: 1)
Before using spread, you need to add a variable that numbers the products within each group.
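library(tidyverse)
Tickets %>%
  group_by(ID, DATE, DAYS, MONTH) %>%
  # Number the products 1..n within each ID/DATE/DAYS/MONTH group
  mutate(PROD = 1:n()) %>%
  # One column per product number; the new columns are named 1, 2, 3, ...
  spread(key = PROD, value = Product_Code)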
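With a purely numeric key, the new columns come out named 1, 2, 3, ... rather than PROD_1, PROD_2, .... One possible way to get the prefixed names without building the strings by hand is the sep argument of spread(), which names each column as the key name, then sep, then the key value. The sketch below rebuilds the question's sample data so it runs on its own; wide_sep is a hypothetical name.

```r
library(dplyr)
library(tidyr)

# The five sample rows from the question
Tickets <- data.frame(
  ID = "00003600B",
  DATE = "2018-06-30",
  DAYS = 854L,
  MONTH = 6L,
  Product_Code = c(83648L, 40984L, 14534L, 18708L, 18710L),
  stringsAsFactors = FALSE
)

wide_sep <- Tickets %>%
  group_by(ID, DATE, DAYS, MONTH) %>%
  mutate(PROD = row_number()) %>%   # numeric position within each group
  ungroup() %>%
  # sep = "_" builds column names as <key name>_<key value>, e.g. PROD_1
  spread(key = PROD, value = Product_Code, sep = "_")

names(wide_sep)
```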