R: Converting dynamic column names into rows in R

Time: 2016-12-13 10:00:18

Tags: r data.table dplyr

Sample data:

df <- data.frame(ProdCode = c("C1","C2"), ProdName = c("Product 1", "Product 2"), Category = c("Categ 1", "Categ 2"), "Jan-16" = c(3,2), "Apr-16" = c(3,""), "Jul-16" = c(5,2), "Oct-16" = c(5,2))

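The value in each cell, for a given month and product, is that product's rating. These values need to end up in the Rating column of the output data frame: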

> df
  ProdCode  ProdName Category Jan.16 Apr.16 Jul.16 Oct.16
1       C1 Product 1  Categ 1      3      3      5      5
2       C2 Product 2  Categ 2      2             2      2

I want the data above in the following format:

ProdCode  Product.Name  Category    Rating.Date   Rating
C1          Product 1   Categ 1     Jan-16         3
C1          Product 1   Categ 1     Apr-16         3
C1          Product 1   Categ 1     Jul-16         5
C1          Product 1   Categ 1     Oct-16         5
C2          Product 2   Categ 2     Jan-16         2
C2          Product 2   Categ 2     Jul-16         2
C2          Product 2   Categ 2     Oct-16         2

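The month columns are dynamic and will keep growing; for example, Jan-17 and later months will be added in the future. I could do this with a for loop, but that does not seem like the idiomatic way to do it in R.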

2 answers:

Answer 0: (score: 0)

We can use gather to reshape the data to 'long' format, filter to drop the blank values in 'Rating', and arrange to sort by 'ProdCode'.

library(tidyr)
library(dplyr)
gather(df, Rating.Date, Rating, Jan.16:Oct.16) %>%
  filter(Rating != "") %>%
  arrange(ProdCode)
#  ProdCode  ProdName Category Rating.Date Rating
#1       C1 Product 1  Categ 1      Jan.16      3
#2       C1 Product 1  Categ 1      Apr.16      3
#3       C1 Product 1  Categ 1      Jul.16      5
#4       C1 Product 1  Categ 1      Oct.16      5
#5       C2 Product 2  Categ 2      Jan.16      2
#6       C2 Product 2  Categ 2      Jul.16      2
#7       C2 Product 2  Categ 2      Oct.16      2
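
Since the question says more month columns will be added later, the hard-coded range Jan.16:Oct.16 would need updating each time. A minimal sketch of one way around that, assuming the three identifier columns stay fixed, is to exclude them and gather everything else (the name long is only illustrative):

```r
library(tidyr)
library(dplyr)

# Same sample data as in the question; data.frame() turns "Jan-16" into "Jan.16".
df <- data.frame(ProdCode = c("C1", "C2"),
                 ProdName = c("Product 1", "Product 2"),
                 Category = c("Categ 1", "Categ 2"),
                 "Jan-16" = c(3, 2), "Apr-16" = c(3, ""),
                 "Jul-16" = c(5, 2), "Oct-16" = c(5, 2),
                 stringsAsFactors = FALSE)

# Gather every column except the identifier columns, so any month
# column added later is picked up without changing the code.
long <- df %>%
  gather(Rating.Date, Rating, -ProdCode, -ProdName, -Category) %>%
  filter(Rating != "") %>%   # drop the blank rating
  arrange(ProdCode)
```

Because Apr.16 contains an empty string, the gathered Rating column is character; as.numeric(long$Rating) can be applied afterwards if numbers are needed.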

Answer 1: (score: 0)

Here are some other options:

Note: you are free to name the last two columns; I have left them at the defaults (variable and value).

Using melt() from reshape2 (on a data.frame)

library(reshape2)
df1 = melt(df, id.vars = c("ProdCode" , "ProdName" ,"Category"), measure.vars = 4:7)
df1 = df1[df1$value != "",]
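
The column indexes 4:7 above are fixed, which does not fit month columns that keep growing. As a rough sketch, assuming the identifier columns are known by name, measure.vars can be left out so that reshape2's melt() treats every remaining column as a measured variable (id_cols is an illustrative name, not part of the original answer):

```r
library(reshape2)

# Same sample data as in the question.
df <- data.frame(ProdCode = c("C1", "C2"),
                 ProdName = c("Product 1", "Product 2"),
                 Category = c("Categ 1", "Categ 2"),
                 "Jan-16" = c(3, 2), "Apr-16" = c(3, ""),
                 "Jul-16" = c(5, 2), "Oct-16" = c(5, 2),
                 stringsAsFactors = FALSE)

id_cols <- c("ProdCode", "ProdName", "Category")

# With measure.vars omitted, all non-id columns are melted.
df1 <- melt(df, id.vars = id_cols,
            variable.name = "Rating.Date", value.name = "Rating")

# Drop the blank rating.
df1 <- df1[df1$Rating != "", ]
```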

Using melt() from data.table

library(data.table)
setDT(df) 
melt(df, id.vars = 1:3, measure.vars = 4:7)[value != ""]  # 1:3 and 4:7 are column indexes; read more in ?melt

#  ProdCode  ProdName Category variable value
#1       C1 Product 1  Categ 1   Jan.16     3
#2       C2 Product 2  Categ 2   Jan.16     2
#3       C1 Product 1  Categ 1   Apr.16     3
#5       C1 Product 1  Categ 1   Jul.16     5
#6       C2 Product 2  Categ 2   Jul.16     2
#7       C1 Product 1  Categ 1   Oct.16     5
#8       C2 Product 2  Categ 2   Oct.16     2

# If you want specific column names:

melt(df, id.vars = 1:3, measure.vars = 4:7, variable.name = "Rating.Date", value.name = "Rating")[Rating != ""]
#   ProdCode  ProdName Category Rating.Date Rating
#1:       C1 Product 1  Categ 1      Jan.16      3
#2:       C2 Product 2  Categ 2      Jan.16      2
#3:       C1 Product 1  Categ 1      Apr.16      3
#4:       C1 Product 1  Categ 1      Jul.16      5
#5:       C2 Product 2  Categ 2      Jul.16      2
#6:       C1 Product 1  Categ 1      Oct.16      5
#7:       C2 Product 2  Categ 2      Oct.16      2
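
The outputs above show Jan.16 rather than the Jan-16 labels asked for in the question, because data.frame() rewrites non-syntactic column names by default. The following is only a sketch of one possible variant, assuming the identifier columns are known by name: it keeps the original labels with check.names = FALSE, works out the month columns dynamically, and returns a numeric Rating. The names id_cols, month_cols and long are illustrative.

```r
library(data.table)

# check.names = FALSE keeps the hyphenated labels such as "Jan-16".
df <- data.frame(ProdCode = c("C1", "C2"),
                 ProdName = c("Product 1", "Product 2"),
                 Category = c("Categ 1", "Categ 2"),
                 "Jan-16" = c(3, 2), "Apr-16" = c(3, ""),
                 "Jul-16" = c(5, 2), "Oct-16" = c(5, 2),
                 check.names = FALSE, stringsAsFactors = FALSE)
setDT(df)

id_cols <- c("ProdCode", "ProdName", "Category")
# Every column that is not an identifier is treated as a month column.
month_cols <- setdiff(names(df), id_cols)

# Give the month columns one common type before melting.
df[, (month_cols) := lapply(.SD, as.character), .SDcols = month_cols]

long <- melt(df, id.vars = id_cols, measure.vars = month_cols,
             variable.name = "Rating.Date", value.name = "Rating",
             variable.factor = FALSE)

long <- long[Rating != ""]             # drop the blank rating
long[, Rating := as.numeric(Rating)]   # ratings back to numbers
setorder(long, ProdCode)               # group rows by product
```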