I have an R data frame:
I would like to convert it into something like this:
Customer Month BaseVolume IncrementalVolume TradeSpend
10 Jan 11 1 110
10 Feb 12 2 120
20 Jan 21 7 210
20 Feb 22 8 220
I tried dcast (reshape), but I could not get this result. Please help.
Answer 0: (score: 1)
You could try the following (in your case, the data you mention is df1, and you need to call setDT(df1) before any of the operations shown here):
library(data.table)
dt1 <- structure(list(Customer = c(10L, 10L, 20L, 20L), Month = c("Jan",
"Feb", "Jan", "Feb"), BaseVolume = c(11L, 12L, 21L, 22L), IncrementalVolume = c(1L,
2L, 7L, 8L), TradeSpend = c(110L, 120L, 210L, 220L)), .Names = c("Customer",
"Month", "BaseVolume", "IncrementalVolume", "TradeSpend"), row.names = c(NA,
-4L), class = c("data.table", "data.frame"))
res <- dcast(melt(dt1, id.vars = c("Customer", "Month")), Customer + variable~ Month)
> res
Customer variable Feb Jan
1: 10 BaseVolume 12 11
2: 10 IncrementalVolume 2 1
3: 10 TradeSpend 120 110
4: 20 BaseVolume 22 21
5: 20 IncrementalVolume 8 7
6: 20 TradeSpend 220 210
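The one-liner above can also be read as two separate steps. The following is a minimal sketch, assuming the question's data lives in a plain data.frame called df1 (the name used in the answer text); the intermediate name long is chosen here for illustration only.

```r
library(data.table)

# df1: plain data.frame with the sample data (assumed to match the question)
df1 <- data.frame(
  Customer = c(10L, 10L, 20L, 20L),
  Month = c("Jan", "Feb", "Jan", "Feb"),
  BaseVolume = c(11L, 12L, 21L, 22L),
  IncrementalVolume = c(1L, 2L, 7L, 8L),
  TradeSpend = c(110L, 120L, 210L, 220L),
  stringsAsFactors = FALSE
)

setDT(df1)  # converts df1 to a data.table by reference, no copy

# Step 1: wide -> long, one row per Customer, Month and measure
long <- melt(df1, id.vars = c("Customer", "Month"))

# Step 2: long -> wide, one column per Month
res <- dcast(long, Customer + variable ~ Month, value.var = "value")
res
```

Because Month is a character column at this point, dcast() orders the new columns alphabetically (Feb before Jan).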
If you want the variable names pasted in front of the values in each cell, you can do the following:
update_cols <- which(!names(res) %in% c("Customer", "variable"))
res[, (update_cols):= lapply(.SD, function(x) paste(variable, x)), .SDcols = update_cols][, variable:= NULL]
which gives:
> res
Customer Feb Jan
1: 10 BaseVolume 12 BaseVolume 11
2: 10 IncrementalVolume 2 IncrementalVolume 1
3: 10 TradeSpend 120 TradeSpend 110
4: 20 BaseVolume 22 BaseVolume 21
5: 20 IncrementalVolume 8 IncrementalVolume 7
6: 20 TradeSpend 220 TradeSpend 210
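Answer 1: (score: 1)
Although there is already an answer, I feel it can be improved in some respects to get closer to the expected output, namely the order of the Jan and Feb columns and the way the text is prepared for dcast().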
We first reshape the input data from wide to long format, while making sure that Month appears in the correct order:
molten <- melt(dt1, id.vars = c("Customer", "Month"))
# turn Month into factor with levels in the given order
molten[, Month := forcats::fct_inorder(Month)]
Now, a new text column is created in long format before calling dcast():
molten[, text := paste(variable, value)]
dcast(molten, Customer + variable ~ Month, value.var = "text")[, variable := NULL][]
# Customer Jan Feb
#1: 10 BaseVolume 11 BaseVolume 12
#2: 10 IncrementalVolume 1 IncrementalVolume 2
#3: 10 TradeSpend 110 TradeSpend 120
#4: 20 BaseVolume 21 BaseVolume 22
#5: 20 IncrementalVolume 7 IncrementalVolume 8
#6: 20 TradeSpend 210 TradeSpend 220
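The result is similar to this answer, but with the columns in the expected order.
N.B.: Unfortunately, the approach of also collapsing the rows per Customer does not work, because line breaks are not respected when printing: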
dcast(molten, Customer ~ Month, value.var = "text", paste0, collapse = "\n")
# Customer Jan Feb
#1: 10 BaseVolume 11\nIncrementalVolume 1\nTradeSpend 110 BaseVolume 12\nIncrementalVolume 2\nTradeSpend 120
#2: 20 BaseVolume 21\nIncrementalVolume 7\nTradeSpend 210 BaseVolume 22\nIncrementalVolume 8\nTradeSpend 220
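The line breaks are still stored in the collapsed strings; only the printed table shows them as escape sequences. A small sketch of this, assuming the same sample data and using base R's factor() as a stand-in for forcats::fct_inorder() (the name collapsed is introduced here for illustration):

```r
library(data.table)

dt1 <- data.table(
  Customer = c(10L, 10L, 20L, 20L),
  Month = c("Jan", "Feb", "Jan", "Feb"),
  BaseVolume = c(11L, 12L, 21L, 22L),
  IncrementalVolume = c(1L, 2L, 7L, 8L),
  TradeSpend = c(110L, 120L, 210L, 220L)
)

molten <- melt(dt1, id.vars = c("Customer", "Month"))
# keep the months in order of first appearance (base R equivalent of fct_inorder)
molten[, Month := factor(Month, levels = unique(Month))]
molten[, text := paste(variable, value)]

# collapse all measures of one Customer and Month into a single string
collapsed <- dcast(molten, Customer ~ Month, value.var = "text",
                   fun.aggregate = function(x) paste(x, collapse = "\n"))

# printing the table shows "\n" literally, whereas cat() interprets it
cat(collapsed[Customer == 10, Jan], "\n")
```

So the multi-line layout is only available when the cells are written out individually, for instance with cat(), not in the printed data.table.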
The text column can be kept aligned by padding with spaces on the right (the target width is the character length of the longest string):
molten[, text := paste(variable, value)]
molten[, text := stringr::str_pad(text, max(nchar(text)), "right")]
dcast(molten, Customer + variable ~ Month, value.var = "text")[, variable := NULL][]
# Customer Jan Feb
#1: 10 BaseVolume 11 BaseVolume 12
#2: 10 IncrementalVolume 1 IncrementalVolume 2
#3: 10 TradeSpend 110 TradeSpend 120
#4: 20 BaseVolume 21 BaseVolume 22
#5: 20 IncrementalVolume 7 IncrementalVolume 8
#6: 20 TradeSpend 210 TradeSpend 220
Alternatively, the names and values within the text column can be aligned:
fmt <- stringr::str_interp("%-${n}s %3i", list(n = molten[, max(nchar(levels(variable)))]))
molten[, text := sprintf(fmt, variable, value)]
dcast(molten, Customer + variable ~ Month, value.var = "text")[, variable := NULL][]
# Customer Jan Feb
#1: 10 BaseVolume 11 BaseVolume 12
#2: 10 IncrementalVolume 1 IncrementalVolume 2
#3: 10 TradeSpend 110 TradeSpend 120
#4: 20 BaseVolume 21 BaseVolume 22
#5: 20 IncrementalVolume 7 IncrementalVolume 8
#6: 20 TradeSpend 210 TradeSpend 220
Here, the format used in sprintf() is also created dynamically, using string interpolation:
fmt
#[1] "%-17s %3i"
Note that the character length of the longest level of variable is used here, because melt() turns variable into a factor by default.
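The factor behaviour can be checked directly. The sketch below assumes the same sample data and builds the format string with base R's sprintf() instead of stringr::str_interp(); the names n, molten_chr and n_chr are introduced here for illustration.

```r
library(data.table)

dt1 <- data.table(
  Customer = c(10L, 10L, 20L, 20L),
  Month = c("Jan", "Feb", "Jan", "Feb"),
  BaseVolume = c(11L, 12L, 21L, 22L),
  IncrementalVolume = c(1L, 2L, 7L, 8L),
  TradeSpend = c(110L, 120L, 210L, 220L)
)

molten <- melt(dt1, id.vars = c("Customer", "Month"))
is.factor(molten$variable)  # melt() defaults to variable.factor = TRUE

# nchar() needs character input, hence the detour via levels()
n <- max(nchar(levels(molten$variable)))

# "%%" yields a literal percent sign, "%d" inserts the width
fmt <- sprintf("%%-%ds %%3i", n)

# with variable.factor = FALSE the column stays character
molten_chr <- melt(dt1, id.vars = c("Customer", "Month"),
                   variable.factor = FALSE)
n_chr <- max(nchar(molten_chr$variable))
```

Both routes give the same width; which one is preferable depends on whether the factor is needed later on.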
The answer could have been much simpler, because recent versions of data.table allow several columns to be reshaped at the same time:
molten <- melt(dt1, id.vars = c("Customer", "Month"))
molten[, Month := forcats::fct_inorder(Month)]
dcast(molten, Customer + variable ~ Month, value.var = c("variable", "value"))
# Customer variable variable.1_Jan variable.1_Feb value_Jan value_Feb
#1: 10 BaseVolume BaseVolume BaseVolume 11 12
#2: 10 IncrementalVolume IncrementalVolume IncrementalVolume 1 2
#3: 10 TradeSpend TradeSpend TradeSpend 110 120
#4: 20 BaseVolume BaseVolume BaseVolume 21 22
#5: 20 IncrementalVolume IncrementalVolume IncrementalVolume 7 8
#6: 20 TradeSpend TradeSpend TradeSpend 210 220
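Unfortunately, it lacks an option to easily reorder the columns in alternating order, i.e., all columns belonging to Jan, then those belonging to Feb, and so on.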
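One possible workaround is to reorder the columns after reshaping with setcolorder(). This is only a sketch under assumptions: the helper column label (a character copy of variable, introduced here to avoid the variable.1 name clash) and the names wide, months and alternating are hypothetical and not part of the original answer.

```r
library(data.table)

dt1 <- data.table(
  Customer = c(10L, 10L, 20L, 20L),
  Month = c("Jan", "Feb", "Jan", "Feb"),
  BaseVolume = c(11L, 12L, 21L, 22L),
  IncrementalVolume = c(1L, 2L, 7L, 8L),
  TradeSpend = c(110L, 120L, 210L, 220L)
)

molten <- melt(dt1, id.vars = c("Customer", "Month"))
molten[, Month := factor(Month, levels = unique(Month))]
molten[, label := as.character(variable)]  # hypothetical helper column

wide <- dcast(molten, Customer + variable ~ Month,
              value.var = c("label", "value"))

# outer() builds a 2 x n_months matrix of column names; reading it
# column by column gives label_Jan, value_Jan, label_Feb, value_Feb
months <- levels(molten$Month)
alternating <- as.vector(outer(c("label", "value"), months, paste, sep = "_"))

setcolorder(wide, c("Customer", "variable", alternating))
wide
```

setcolorder() reorders by reference, so no copy of wide is made.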