How do I reshape a CSV table in R?

Time: 2015-10-27 02:23:19

Tags: r reshape

I have this dataset:

   Group Group Group Cat Cat Cat  Betw
1      a               A          5.87
2      b           j   A          0.11
3      c               B       A  2.18
4      d               C   D      5.31
5      e               E   C      0.00
6      f               E         352.10
7      g               E          0.35
8      h               A   B      0.00
9      i     m         F          0.00
10     j               A   D      15.04

I would like to reshape it so that there are only 3 columns: Var1 (either 'Group' or 'Cat'), Var2 (the lowercase or uppercase letter), and Betw.

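So, for example, c, B and A (row 3) would all carry the value 2.1892749 (shown rounded as 2.18 above). The result should start like this: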

   Var1 Var2  Betw
1 Group    a  5.87
2   Cat    A  5.87
3 Group    b  0.11
4 Group    j  0.11
5   Cat    A  0.11
...

How can I do this in R?

3 answers:

Answer 0 (score: 2)

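We can also use data.table. We convert the 'data.frame' to a 'data.table' (setDT(dat)), reshape it to long format with melt, drop the rows where 'Var2' is empty, and remove from 'Var1' the substring that runs from the first . to the end of the string (if there is one).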

library(data.table)#v1.9.6+
melt(setDT(dat), id.var='Betw', variable.name='Var1', 
        value.name='Var2')[Var2!=''][, Var1:= sub('\\..*', '', Var1)][]
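
To make the last step concrete, here is a small sketch of what the sub() pattern does to the column names. The names in nms are an assumption: they are what R would typically produce after de-duplicating the repeated 'Group' and 'Cat' headers on import.

```r
# Assumed column names after R has de-duplicated the repeated headers.
nms <- c("Group", "Group.1", "Group.2", "Cat", "Cat.1", "Cat.2")

# '\\..*' matches a literal dot and everything after it;
# names without a dot are returned unchanged.
stripped <- sub("\\..*", "", nms)
print(stripped)
# "Group" "Group" "Group" "Cat" "Cat" "Cat"
```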

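Answer 1 (score: 1)

I think applying melt directly does not work for you because the column names in your data frame are duplicated. So, along the lines of @akrun's answer, you could use something like this: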

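library(reshape2)  # assumed source of melt(); data.table's melt takes the same arguments
# check.names=TRUE makes the duplicated column names unique (Group, Group.1, ...)
tmp <- data.frame(df, check.names=TRUE)
tmp <- melt(tmp, id="Betw", variable.name="Var1", value.name="Var2")
# strip the ".1", ".2" suffixes again
tmp$Var1 <- gsub("(.*)\\.[0-9]", "\\1", tmp$Var1)
# drop the empty cells
df <- subset(tmp, Var2!="")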

The data frame I used:

df <- data.frame(Group=c("a","b","c","d","e","f","g","h","i","j"),
                 Group=c("","","","","","","","","m",""),
                 Group=c("","j","","","","","","","",""),
                 Cat=c("A","A","B","C","E","E","E","A","F","A"),
                 Cat=c("","","","D","C","","","B","","D"),
                 Cat=c("","","A","","","","","","",""),
                 Betw=c(5.87,0.11,2.18,5.31,0,352.1,0.35,0,0,15.04),
                 check.names = F)
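
As a quick illustration of why the check.names step matters: as far as I know, data.frame(..., check.names = TRUE) de-duplicates names the same way make.names(unique = TRUE) does, so the effect can be sketched on the bare name vector. The variable names dup and fixed are only for illustration.

```r
# The duplicated headers from the question.
dup <- c("Group", "Group", "Group", "Cat", "Cat", "Cat", "Betw")

# make.names(unique = TRUE) appends .1, .2, ... to repeated names.
fixed <- make.names(dup, unique = TRUE)
print(fixed)
# "Group" "Group.1" "Group.2" "Cat" "Cat.1" "Cat.2" "Betw"
```

Once the names are unique, melt can tell the columns apart, and the suffixes are stripped afterwards to recover 'Group' and 'Cat'.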

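Answer 2 (score: 0)

You can use dplyr and tidyr. First we gather to long data, then strip the extra number that was appended to the column names, and then drop the blanks: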

library(dplyr)
library(tidyr)
dat %>% gather(Var1, Var2, -Betw) %>%
        # escape the dot so it matches a literal "." rather than any character
        mutate(Var1 = gsub("\\.[0-9]$", "", Var1)) %>%
        filter(Var2 != "")
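
For comparison, the same three steps (stack the columns, strip the numeric suffix, drop the blanks) can be sketched in base R without any packages. This is only an illustration, not part of the original answer: it uses the first three rows of the question's data stored as character, and the helper names value_cols and long are made up here.

```r
# First three rows of the question's data, with de-duplicated names.
dat <- data.frame(
  Group   = c("a", "b", "c"),
  Group.1 = c("", "", ""),
  Group.2 = c("", "j", ""),
  Cat     = c("A", "A", "B"),
  Cat.1   = c("", "", ""),
  Cat.2   = c("", "", "A"),
  Betw    = c(5.87, 0.11, 2.18),
  stringsAsFactors = FALSE
)

value_cols <- setdiff(names(dat), "Betw")

# Stack one column at a time, remembering the source row.
long <- do.call(rbind, lapply(value_cols, function(col) {
  data.frame(
    row  = seq_len(nrow(dat)),
    Var1 = sub("\\.[0-9]+$", "", col),  # "Group.1" becomes "Group"
    Var2 = dat[[col]],
    Betw = dat$Betw,
    stringsAsFactors = FALSE
  )
}))

# Drop empty cells, then sort by source row so each row's entries stay together.
long <- long[long$Var2 != "", ]
long <- long[order(long$row), c("Var1", "Var2", "Betw")]
rownames(long) <- NULL
print(long)
```

Sorting by the source row reproduces the row-by-row order shown in the question; the melt and gather versions return the result column by column instead.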

Data used:

structure(list(Group = structure(1:10, .Label = c("a", "b", "c", 
"d", "e", "f", "g", "h", "i", "j"), class = "factor"), Group.1 = structure(c(1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 1L), .Label = c("", "m"), class = "factor"), 
    Group.2 = structure(c(1L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
    1L), .Label = c("", "j"), class = "factor"), Cat = structure(c(1L, 
    1L, 2L, 3L, 4L, 4L, 4L, 1L, 5L, 1L), .Label = c("A", "B", 
    "C", "E", "F"), class = "factor"), Cat.1 = structure(c(1L, 
    1L, 1L, 4L, 3L, 1L, 1L, 2L, 1L, 4L), .Label = c("", "B", 
    "C", "D"), class = "factor"), Cat.2 = structure(c(1L, 1L, 
    2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = c("", "A"), class = "factor"), 
    Betw = c(5.87, 0.11, 2.18, 5.31, 0, 352.1, 0.35, 0, 0, 15.04
    )), .Names = c("Group", "Group.1", "Group.2", "Cat", "Cat.1", 
"Cat.2", "Betw"), class = "data.frame", row.names = c("1", "2", 
"3", "4", "5", "6", "7", "8", "9", "10"))