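I'm new to R, but I'm obsessed with mastering it! I'm working on a project for work and I'm completely stumped. Any help is much appreciated!
I need to convert this data frame...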
Brand UK__Sales__YA UK__Sales__MAT CN__Sales__YA CN__Sales__MAT
1 Snickers 100 110 90 95
2 Twix 50 60 30 35
3 Skittles 75 80 105 130
...into this:
Brand Country Year Sales
1 Snickers UK YA 100
2 Snickers UK MAT 110
3 Snickers CN YA 90
4 Snickers CN MAT 95
5 Twix UK YA 50
6 Twix UK MAT 60
7 Twix CN YA 30
8 Twix CN MAT 35
9 Skittles UK YA 75
10 Skittles UK MAT 80
11 Skittles CN YA 105
12 Skittles CN MAT 130
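As you can tell, I need to break off the first part and the last part of my Sales variable names and turn them into separate stacked columns. My dataset has other countries and other metrics, but I figure that if you can help me solve this, I can do the rest. Thanks!! :-)
Answer 0 (score: 2)
Check out the tidyr package. In fact, all the packages in the tidyverse help with this kind of data-munging work: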
library(tidyr)
library(dplyr)
df %>%
gather(key, Sales, -Brand) %>%
separate(key, c("Country", "delete", "Year"), sep = "__") %>%
select(-delete) %>%
arrange(Brand)
# Brand Country Year Sales
# 1 Skittles UK YA 75
# 2 Skittles UK MAT 80
# 3 Skittles CN YA 105
# 4 Skittles CN MAT 130
# 5 Snickers UK YA 100
# 6 Snickers UK MAT 110
# 7 Snickers CN YA 90
# 8 Snickers CN MAT 95
# 9 Twix UK YA 50
# 10 Twix UK MAT 60
# 11 Twix CN YA 30
# 12 Twix CN MAT 35
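To understand what is going on, run each piped %>% statement separately (for example, look at the output of df %>% gather(key, Sales, -Brand) to see what it does). Then run it piped through separate, and so on.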
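The step-by-step inspection suggested above can be sketched as follows. This is a minimal illustration, assuming the question's data is rebuilt by hand as df; the intermediate names step1, step2 and step3 are hypothetical and are not part of the answer.

```r
library(tidyr)
library(dplyr)

# Rebuild the question's data frame (same values as in the question)
df <- data.frame(
  Brand = c("Snickers", "Twix", "Skittles"),
  UK__Sales__YA = c(100, 50, 75),
  UK__Sales__MAT = c(110, 60, 80),
  CN__Sales__YA = c(90, 30, 105),
  CN__Sales__MAT = c(95, 35, 130),
  stringsAsFactors = FALSE
)

# Step 1: stack every column except Brand into key/Sales pairs
step1 <- df %>% gather(key, Sales, -Brand)
head(step1, 3)

# Step 2: split the key on "__" into three pieces
step2 <- step1 %>% separate(key, c("Country", "delete", "Year"), sep = "__")
head(step2, 3)

# Step 3: drop the constant middle piece and sort by brand
step3 <- step2 %>% select(-delete) %>% arrange(Brand)
head(step3, 3)
```

Printing each intermediate object shows which columns a given verb adds or removes before the next one runs.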
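Answer 1 (score: 2)
1) dplyr/tidyr Using the data shown reproducibly in the Note at the end, gather the data frame from wide to long form, then separate the new column into its parts. Next, spread the new Variable column, using the Value column for the values, so that each variable (only Sales in DF; Price and Sales in DF2) gets its own column, and then sort. If the order does not matter, the last line of code can be omitted.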
library(dplyr)
library(tidyr)
DF %>%
gather(new, Value, -Brand) %>%
separate(new, c("Country", "Variable", "Year"), sep = "__") %>%
spread(Variable, Value) %>%
arrange(Brand, desc(Country), desc(Year))
giving:
Brand Country Year Sales
1 Skittles UK YA 75
2 Skittles UK MAT 80
3 Skittles CN YA 105
4 Skittles CN MAT 130
5 Snickers UK YA 100
6 Snickers UK MAT 110
7 Snickers CN YA 90
8 Snickers CN MAT 95
9 Twix UK YA 50
10 Twix UK MAT 60
11 Twix CN YA 30
12 Twix CN MAT 35
Note that the above also works with DF2, which is also defined in the Note below.
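To illustrate the DF2 case, here is a small sketch that runs the same pipeline on DF2, built exactly as in the Note. The result name res2 is only illustrative. With DF2, the spread step should yield both a Price and a Sales column.

```r
library(dplyr)
library(tidyr)

Lines <- " Brand UK__Sales__YA UK__Sales__MAT CN__Sales__YA CN__Sales__MAT
1 Snickers 100 110 90 95
2 Twix 50 60 30 35
3 Skittles 75 80 105 130"
DF <- read.table(text = Lines)

# same as DF but with additional columns for Price (10 times Sales)
DF2 <- cbind(DF, setNames(10 * DF[2:5], sub("Sales", "Price", names(DF)[2:5])))

res2 <- DF2 %>%
  gather(new, Value, -Brand) %>%
  separate(new, c("Country", "Variable", "Year"), sep = "__") %>%
  spread(Variable, Value) %>%   # one column per distinct Variable value
  arrange(Brand, desc(Country), desc(Year))

names(res2)
```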
1a) This slightly shorter alternative also works with DF, but not with DF2. If the order does not matter, the arrange line can be omitted.
DF %>%
gather(new, Sales, -Brand) %>%
separate(new, c("Country", "Year"), sep = "__Sales__") %>%
arrange(Brand, desc(Country), desc(Year))
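As an aside for readers on tidyr 1.0.0 or later, where gather() and spread() are superseded, the same reshape can probably be written with pivot_longer(). This sketch is a swapped-in alternative, not the answer's own code. It assumes DF as defined in the Note, and the name res is illustrative. The special ".value" entry in names_to tells pivot_longer to use the middle part of each column name (Sales here) as the name of the value column.

```r
library(tidyr)

Lines <- " Brand UK__Sales__YA UK__Sales__MAT CN__Sales__YA CN__Sales__MAT
1 Snickers 100 110 90 95
2 Twix 50 60 30 35
3 Skittles 75 80 105 130"
DF <- read.table(text = Lines)

# Split each column name on "__": first part -> Country,
# middle part -> name of the value column, last part -> Year
res <- pivot_longer(
  DF,
  cols = -Brand,
  names_to = c("Country", ".value", "Year"),
  names_sep = "__"
)
res
```

Because ".value" takes the column name from the data, the same call should also handle a frame with Price columns, such as DF2, without a separate spread step.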
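2) reshape This alternative uses no packages: base R's reshape converts the data from wide to long form. If row names and order do not matter, the rownames(out) <- NULL statement and the ordering lines after it can be omitted. This code also works with DF2.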
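# group the non-Brand column names by their middle part (here just "Sales")
varying <- split(names(DF)[-1], sub(".*__(.*)__.*", "\\1", names(DF)[-1]))
# times keeps the original names in the time column so Country and Year can be parsed below
long <- reshape(DF, direction = "long", idvar = "Brand", varying = varying,
  v.names = names(varying), times = varying[[1]])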
out <- transform(long, Country = sub("__.*", "", time), Year = sub(".*__", "", time),
time = NULL)
rownames(out) <- NULL
o <- with(out, order(Brand, -xtfrm(Country), -xtfrm(Year)))
out <- out[o, ]
out
giving:
Brand Sales Country Year
3 Skittles 75 UK YA
6 Skittles 80 UK MAT
9 Skittles 105 CN YA
12 Skittles 130 CN MAT
1 Snickers 100 UK YA
4 Snickers 110 UK MAT
7 Snickers 90 CN YA
10 Snickers 95 CN MAT
2 Twix 50 UK YA
5 Twix 60 UK MAT
8 Twix 30 CN YA
11 Twix 35 CN MAT
Note
Lines <- " Brand UK__Sales__YA UK__Sales__MAT CN__Sales__YA CN__Sales__MAT
1 Snickers 100 110 90 95
2 Twix 50 60 30 35
3 Skittles 75 80 105 130"
DF <- read.table(text = Lines)
# same as DF but with additional columns for Price
DF2 <- cbind(DF, setNames(10 * DF[2:5], sub("Sales", "Price", names(DF)[2:5])))
Answer 2 (score: 0)
Here is an option with the tidyverse. We gather into 'long' format, then extract the 'Var' column into 'Country' and 'Year':
library(tidyr)
library(dplyr)
gather(df1, Var, Sales, -Brand) %>%
extract(Var, into = c("Country", "Year"), "(\\w+)__\\w+__(\\w+)")
# Brand Country Year Sales
#1 Snickers UK YA 100
#2 Twix UK YA 50
#3 Skittles UK YA 75
#4 Snickers UK MAT 110
#5 Twix UK MAT 60
#6 Skittles UK MAT 80
#7 Snickers CN YA 90
#8 Twix CN YA 30
#9 Skittles CN YA 105
#10 Snickers CN MAT 95
#11 Twix CN MAT 35
#12 Skittles CN MAT 130
The corresponding option with data.table is:
library(data.table)
melt(setDT(df1), id.vars = "Brand", value.name = "Sales")[,
c("Country", "Year") := tstrsplit(variable, "__")[-2]][, variable := NULL][]
Answer 3 (score: 0)
Here is a solution using the package reshape2.
new <- reshape2::melt(data, id.vars = "Brand")
new$Country <- sub("(^[^_]*)_.*$", "\\1", new$variable)
new$Year <- sub("^.*_([[:alpha:]]*$)", "\\1", new$variable)
new <- new[, c(1, 4, 5, 3)]
names(new)[4] <- "Sales"
head(new)
# Brand Country Year Sales
#1 Snickers UK YA 100
#2 Twix UK YA 50
#3 Skittles UK YA 75
#4 Snickers UK MAT 110
#5 Twix UK MAT 60
#6 Skittles UK MAT 80
Data
data <-
structure(list(Brand = c("Snickers", "Twix", "Skittles"), UK__Sales__YA = c(100L,
50L, 75L), UK__Sales__MAT = c(110L, 60L, 80L), CN__Sales__YA = c(90L,
30L, 105L), CN__Sales__MAT = c(95L, 35L, 130L)), .Names = c("Brand",
"UK__Sales__YA", "UK__Sales__MAT", "CN__Sales__YA", "CN__Sales__MAT"
), class = "data.frame", row.names = c("1", "2", "3"))