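I am trying to split my data three ways, based on 3 columns, and then spread each piece for further processing. The code works when I split on 2 columns, but it does not work with 3. This builds on the discussion in How can I spread repeated measures of multiple variables into wide format?
Here is my data: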
structure(list(Zone = c("East", "East", "East", "East", "East",
"East", "East", "West", "West", "West", "West", "West", "West",
"West"), Fiscal.Year = c(2016, 2016, 2016, 2016, 2016, 2016,
2017, 2016, 2016, 2016, 2017, 2017, 2018, 2018), Transaction.ID = c(132,
133, 134, 135, 136, 137, 171, 171, 172, 173, 175, 176, 177, 178
), L.Rev = c(3, 0, 0, 1, 0, 0, 2, 1, 1, 2, 2, 1, 2, 1), L.Qty = c(3,
0, 0, 1, 0, 0, 1, 1, 1, 2, 2, 1, 2, 1), A.Rev = c(0, 0, 0, 1,
1, 1, 0, 0, 0, 0, 0, 1, 0, 0), A.Qty = c(0, 0, 0, 2, 2, 3, 0,
0, 0, 0, 0, 3, 0, 0), I.Rev = c(4, 4, 4, 0, 1, 0, 3, 0, 0, 0,
1, 0, 1, 1), I.Qty = c(2, 2, 2, 0, 1, 0, 3, 0, 0, 0, 1, 0, 1,
1)), .Names = c("Zone", "Fiscal.Year", "Transaction.ID", "L.Rev",
"L.Qty", "A.Rev", "A.Qty", "I.Rev", "I.Qty"), row.names = c(NA,
14L), class = "data.frame")
Here is the code that works:
Input_File %>%
gather(Rev_Qty,Value, L.Rev:I.Qty) %>%
separate(Rev_Qty, into=c("L.A","Rev.Qty")) %>%
split(.,list(.$Zone,.$Rev.Qty)) %>%
#Ideally, I want three-way split--i.e. Fiscal.Year, Zone and Rev.Qty
purrr::map(~unite(.,LAType.Rev.Qty, L.A, Rev.Qty, sep = ".")) %>%
purrr::map(~spread_(.,key_col = "LAType.Rev.Qty", value_col = "Value"))
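This works well, i.e. I get a list of length 4 that I can use for further processing.
However, when I apply a three-way split based on Rev.Qty, Zone and Fiscal.Year, the following code does not work.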
Input_File %>%
gather(Rev_Qty,Value, L.Rev:I.Qty) %>%
separate(Rev_Qty, into=c("L.A","Rev.Qty")) %>%
#Now split the data based on zone, Rev vs. Qty and year--DOESN'T WORK
split(.,list(.$Zone,.$Rev.Qty,.$Fiscal.Year)) %>%
purrr::map(~unite(.,LAType.Rev.Qty, L.A, Rev.Qty, sep = ".")) %>%
purrr::map(~spread_(.,key_col = "LAType.Rev.Qty", value_col = "Value"))
I get the following error:
Error in enc2utf8(col_names(col_labels, sep = sep)) :
argumemt is not a character vector
After debugging, I found that the code runs fine up to unite(). It breaks as soon as I call spread_().
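Expected output: if we run the code up to unite(), we get a list of length 12. After applying spread, the expected output is this same list with the LAType.Rev.Qty and Value columns spread into wide format. I hope this clarifies the expected output.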
Can someone help me with this? I am a beginner and I don't know what is going on.
Answer 0: (score: 2)
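We need drop = TRUE in split to drop the combinations that do not exist in the dataset. With three grouping columns, some combinations (for example, East in 2018) have no rows, so split returns empty data frames for them.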
Input_File %>%
gather(Rev_Qty,Value, L.Rev:I.Qty) %>%
separate(Rev_Qty, into=c("L.A","Rev.Qty")) %>%
split(.,list(.$Zone,.$Rev.Qty,.$Fiscal.Year), drop = TRUE) %>%
purrr::map(~unite(.,LAType.Rev.Qty, L.A, Rev.Qty, sep = ".")) %>%
purrr::map(~spread_(.,key_col = "LAType.Rev.Qty", value_col = "Value"))
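To see what drop = TRUE changes, here is a minimal base R sketch on a made-up toy data frame (the name df and its values are assumptions for illustration, not the question's Input_File). Without drop, split keeps every combination of factor levels, including ones with no rows; those zero-row pieces appear to be what spread_() trips over.

```r
# Toy data: East has no rows for 2018, mirroring the gap in the question's data.
df <- data.frame(
  Zone        = c("East", "East", "West", "West", "West"),
  Fiscal.Year = c(2016, 2017, 2016, 2017, 2018),
  Value       = 1:5
)

# Default (drop = FALSE): one element per level combination, even empty ones.
all_groups <- split(df, list(df$Zone, df$Fiscal.Year))

# drop = TRUE: combinations that never occur in the data are left out.
kept_groups <- split(df, list(df$Zone, df$Fiscal.Year), drop = TRUE)

# Row counts per group; the East.2018 entry is 0 in all_groups.
print(sapply(all_groups, nrow))
print(sapply(kept_groups, nrow))
```

Checking sapply(your_list, nrow) right after split is a quick way to spot empty groups before mapping unite() and spread_() over the list.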