Split variable names apart, then transpose

Time: 2017-12-23 14:59:56

Tags: r transpose

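I'm new to R, but I'm obsessed with mastering it! I'm working on a project for work and I'm completely stumped. Any help is greatly appreciated!

I need to convert this data frame...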

   Brand       UK__Sales__YA   UK__Sales__MAT  CN__Sales__YA  CN__Sales__MAT
1  Snickers    100             110            90             95
2  Twix        50              60             30             35
3  Skittles    75              80             105            130

...to this

   Brand       Country     Year      Sales
1  Snickers    UK          YA        100
2  Snickers    UK          MAT       110
3  Snickers    CN          YA        90
4  Snickers    CN          MAT       95
5  Twix        UK          YA        50
6  Twix        UK          MAT       60
7  Twix        CN          YA        30
8  Twix        CN          MAT       35
9  Skittles    UK          YA        75
10 Skittles    UK          MAT       80
11 Skittles    CN          YA        105
12 Skittles    CN          MAT       130

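As you can tell, I need to break off the first part and the last part of my Sales variable names and turn them into separate columns of stacked data. I have other countries and other metrics in my data set, but I figure that if you can help me solve this one, I can work out the rest. Thanks!! :-)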

4 answers:

Answer 0 (score: 2)

Check out the tidyr package. In fact, all of the packages in the tidyverse help with this kind of data-munging work:

library(tidyr)
library(dplyr)

df %>%
  gather(key, Sales, -Brand) %>%
  separate(key, c("Country", "delete", "Year"), sep = "__") %>%
  select(-delete) %>%
  arrange(Brand)

#       Brand Country Year Sales
# 1  Skittles      UK   YA    75
# 2  Skittles      UK  MAT    80
# 3  Skittles      CN   YA   105
# 4  Skittles      CN  MAT   130
# 5  Snickers      UK   YA   100
# 6  Snickers      UK  MAT   110
# 7  Snickers      CN   YA    90
# 8  Snickers      CN  MAT    95
# 9      Twix      UK   YA    50
# 10     Twix      UK  MAT    60
# 11     Twix      CN   YA    30
# 12     Twix      CN  MAT    35

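To understand what is going on, run each piped (%>%) statement separately. For example, look at the output of df %>% gather(key, Sales, -Brand) to see what it does, then pipe that result into the separate step to see the next transformation.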
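The step-by-step inspection suggested above can be sketched as follows. The data frame is rebuilt by hand from the question so the block runs on its own; the names step1 and step2 are illustrative and not part of the original answer.

```r
library(tidyr)
library(dplyr)

# Data frame from the question, rebuilt by hand (assumed column types)
df <- data.frame(
  Brand = c("Snickers", "Twix", "Skittles"),
  UK__Sales__YA  = c(100, 50, 75),
  UK__Sales__MAT = c(110, 60, 80),
  CN__Sales__YA  = c(90, 30, 105),
  CN__Sales__MAT = c(95, 35, 130),
  stringsAsFactors = FALSE
)

# Step 1: stack every column except Brand into key/value pairs
step1 <- df %>% gather(key, Sales, -Brand)
head(step1, 3)

# Step 2: split the old column names on the double underscore
step2 <- step1 %>% separate(key, c("Country", "delete", "Year"), sep = "__")
head(step2, 3)
```

After step 1 the key column still holds the full original names such as UK__Sales__YA; step 2 is where they are broken into three pieces, of which the middle one is later dropped.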

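Answer 1 (score: 2)

1) dplyr/tidyr Using the data shown reproducibly in the Note at the end, gather the data frame from wide to long form and then separate the new column into its parts. Then spread the new Variable column, taking its values from the Value column, so that each variable gets its own column (only Sales for DF; Price and Sales for DF2), and finally sort. The last line of code can be omitted if the order does not matter.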

library(dplyr)
library(tidyr)

DF %>% 
  gather(new, Value, -Brand) %>%
  separate(new, c("Country", "Variable", "Year"), sep = "__") %>%
  spread(Variable, Value) %>%
  arrange(Brand, desc(Country), desc(Year))

giving:

      Brand Country Year Sales
1  Skittles      UK   YA    75
2  Skittles      UK  MAT    80
3  Skittles      CN   YA   105
4  Skittles      CN  MAT   130
5  Snickers      UK   YA   100
6  Snickers      UK  MAT   110
7  Snickers      CN   YA    90
8  Snickers      CN  MAT    95
9      Twix      UK   YA    50
10     Twix      UK  MAT    60
11     Twix      CN   YA    30
12     Twix      CN  MAT    35

Note that the above also works with DF2, which is also defined in the Note below.
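As a quick check of that claim, the sketch below runs the same pipeline on DF2, built here by hand so the block is self-contained. The second half is an assumption-labeled alternative rather than part of the original answer: it presumes tidyr 1.0.0 or later, where pivot_longer is available and the special name ".value" turns the middle part of each column name into separate output columns.

```r
library(dplyr)
library(tidyr)

DF <- data.frame(
  Brand = c("Snickers", "Twix", "Skittles"),
  UK__Sales__YA  = c(100, 50, 75),
  UK__Sales__MAT = c(110, 60, 80),
  CN__Sales__YA  = c(90, 30, 105),
  CN__Sales__MAT = c(95, 35, 130),
  stringsAsFactors = FALSE
)
# Same construction as in the Note: Price columns are 10 times the Sales columns
DF2 <- cbind(DF, setNames(10 * DF[2:5], sub("Sales", "Price", names(DF)[2:5])))

# The answer's pipeline, unchanged, applied to DF2
res <- DF2 %>%
  gather(new, Value, -Brand) %>%
  separate(new, c("Country", "Variable", "Year"), sep = "__") %>%
  spread(Variable, Value) %>%
  arrange(Brand, desc(Country), desc(Year))
head(res, 4)

# Assumption: tidyr >= 1.0.0. ".value" creates one column per middle name part.
alt <- DF2 %>%
  pivot_longer(-Brand,
               names_to = c("Country", ".value", "Year"),
               names_sep = "__")
head(alt, 4)
```

Both results have one row per Brand, Country and Year combination, with Price and Sales side by side.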

1a) This slightly shorter alternative also works with DF, but not with DF2. The arrange line can be omitted if the order does not matter.

DF %>% 
  gather(new, Sales, -Brand) %>%
  separate(new, c("Country", "Year"), sep = "__Sales__") %>%
  arrange(Brand, desc(Country), desc(Year))
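The reason 1a) is limited to DF can be seen from the separator alone: "__Sales__" only occurs in the Sales columns, so a Price column name would not be split. A small base R sketch, with two illustrative column names taken from DF2:

```r
nms <- c("UK__Sales__YA", "UK__Price__YA")

# Splitting on the literal separator used in 1a)
pieces <- strsplit(nms, "__Sales__", fixed = TRUE)
pieces

# Number of pieces per name: the Price name stays in one piece
lengths(pieces)
```

Only the first name yields a Country and a Year; the second comes back whole, which is why 1) is the variant to use when several measures are present.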

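2) This alternative uses no packages; it reshapes from wide to long form with reshape. If the row names and the order do not matter, everything after the rownames(out) <- NULL statement (other than printing out) can be omitted. This code also works with DF2.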

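varying <- split(names(DF)[-1], sub(".*__(.*)__.*", "\\1", names(DF)[-1]))
# times = varying[[1]] stores the original column names in the time column;
# the default (1, 2, ...) would leave nothing for the sub() calls below to parse
long <- reshape(DF, dir = "long", idvar = "Brand", varying = varying, 
   v.names = names(varying), times = varying[[1]])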
out <- transform(long, Country = sub("__.*", "", time), Year = sub(".*__", "", time), 
   time = NULL)
rownames(out) <- NULL
o <- with(out, order(Brand, -xtfrm(Country), -xtfrm(Year)))
out <- out[o, ]
out

giving:

      Brand Sales Country Year
3  Skittles    75      UK   YA
6  Skittles    80      UK  MAT
9  Skittles   105      CN   YA
12 Skittles   130      CN  MAT
1  Snickers   100      UK   YA
4  Snickers   110      UK  MAT
7  Snickers    90      CN   YA
10 Snickers    95      CN  MAT
2      Twix    50      UK   YA
5      Twix    60      UK  MAT
8      Twix    30      CN   YA
11     Twix    35      CN  MAT
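To see what the first line of 2) builds, the sketch below applies the same sub and split calls to the column names of DF2, typed out by hand here. It groups the columns by the middle part of their names, which is the list structure that reshape expects in its varying argument.

```r
# Column names of DF2 (everything except Brand), written out by hand
nms <- c("UK__Sales__YA", "UK__Sales__MAT", "CN__Sales__YA", "CN__Sales__MAT",
         "UK__Price__YA", "UK__Price__MAT", "CN__Price__YA", "CN__Price__MAT")

# Keep only the middle part of each name: "Sales" or "Price"
measure <- sub(".*__(.*)__.*", "\\1", nms)

# One list element per measure, each holding that measure's column names
varying <- split(nms, measure)
str(varying)
```

Each list element becomes one value column in the long data frame, and names(varying) supplies the v.names.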

Note

Lines <- "   Brand       UK__Sales__YA   UK__Sales__MAT  CN__Sales__YA  CN__Sales__MAT
1  Snickers    100             110            90             95
2  Twix        50              60             30             35
3  Skittles    75              80             105            130"

DF <- read.table(text = Lines)

# same as DF but with additional columns for Price
DF2 <- cbind(DF, setNames(10 * DF[2:5], sub("Sales", "Price", names(DF)[2:5])))

Answer 2 (score: 0)

Here is one option with the tidyverse. We gather into 'long' format, then extract the 'Var' column into 'Country' and 'Year':

library(tidyr)
library(dplyr)
gather(df1, Var, Sales, -Brand) %>%
    extract(Var, into = c("Country", "Year"), "(\\w+)__\\w+__(\\w+)")
#      Brand Country Year Sales
#1  Snickers      UK   YA   100
#2      Twix      UK   YA    50
#3  Skittles      UK   YA    75
#4  Snickers      UK  MAT   110
#5      Twix      UK  MAT    60
#6  Skittles      UK  MAT    80
#7  Snickers      CN   YA    90
#8      Twix      CN   YA    30
#9  Skittles      CN   YA   105
#10 Snickers      CN  MAT    95
#11     Twix      CN  MAT    35
#12 Skittles      CN  MAT   130

The corresponding option with data.table is

library(data.table)
melt(setDT(df1), id.vars = "Brand", value.name = "Sales")[, 
 c("Country", "Year") := tstrsplit(variable, "__")[-2]][, variable := NULL][]
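The tstrsplit step is the part that separates the names, so it may help to look at it in isolation. This is a minimal sketch on two sample names; the keep argument shown at the end is an alternative to the [-2] indexing used above, not something the original answer uses.

```r
library(data.table)

vars <- c("UK__Sales__YA", "CN__Sales__MAT")

# tstrsplit splits each string and transposes the result:
# one list element per position (country, measure, year)
parts <- tstrsplit(vars, "__", fixed = TRUE)
parts

# Dropping the middle element, as the answer does with [-2]
parts[-2]

# Alternative: ask tstrsplit to keep only positions 1 and 3
kept <- tstrsplit(vars, "__", fixed = TRUE, keep = c(1, 3))
kept
```

Because the result is a list with one vector per position, it can be assigned directly to several new columns with :=.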

Answer 3 (score: 0)

Here is a solution using the reshape2 package.

new <- reshape2::melt(data, id.vars = "Brand")
new$Country <- sub("(^[^_]*)_.*$", "\\1", new$variable)
new$Year <- sub("^.*_([[:alpha:]]*$)", "\\1", new$variable)
new <- new[, c(1, 4, 5, 3)]
names(new)[4] <- "Sales"

head(new)
#     Brand Country Year Sales
#1 Snickers      UK   YA   100
#2     Twix      UK   YA    50
#3 Skittles      UK   YA    75
#4 Snickers      UK  MAT   110
#5     Twix      UK  MAT    60
#6 Skittles      UK  MAT    80
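The two regular expressions do the actual splitting here, so the following base R sketch applies them to a couple of sample names outside of any data frame. The third name, with a numeric last part, is a hypothetical label added only to show a limitation.

```r
vars <- c("UK__Sales__YA", "CN__Sales__MAT")

# Everything before the first underscore
country <- sub("(^[^_]*)_.*$", "\\1", vars)

# The run of letters after the last underscore
year <- sub("^.*_([[:alpha:]]*$)", "\\1", vars)

data.frame(variable = vars, Country = country, Year = year)

# Hypothetical label ending in digits: the pattern does not match,
# so sub() returns the input unchanged
sub("^.*_([[:alpha:]]*$)", "\\1", "UK__Sales__2017")
```

If the real data set has period labels that contain digits, the character class in the second pattern would need to be widened, for example to [[:alnum:]].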

Data

data <-
structure(list(Brand = c("Snickers", "Twix", "Skittles"), UK__Sales__YA = c(100L, 
50L, 75L), UK__Sales__MAT = c(110L, 60L, 80L), CN__Sales__YA = c(90L, 
30L, 105L), CN__Sales__MAT = c(95L, 35L, 130L)), .Names = c("Brand", 
"UK__Sales__YA", "UK__Sales__MAT", "CN__Sales__YA", "CN__Sales__MAT"
), class = "data.frame", row.names = c("1", "2", "3"))