Reformatting a list in R

Time: 2016-07-15 18:29:30

Tags: r list dataframe formatting

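I have this df:

   KEGGnumber                Cor Colors
X1 C00095         -2.623973e-01    RED
X2 C17714, C00044 -2.241113e-01    RED
X3 C00033         -3.066684e-01    RED

and want to reformat it into a two-column data frame in which each Color is matched to a KEGGnumber. It would look like this:

KEGGnumber Colors
C00095     RED
C17714     RED
C00044     RED
C00033     RED

Basically, the new data frame would take the rows of the old one that contain multiple KEGGnumbers and split them into separate rows, while keeping the same Color for each KEGGnumber.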

2 answers:

Answer 0 (score: 1)

This may or may not be a duplicate, but a very similar question can be found here: Splitting a string into new rows in R

Adapting that example to your case is straightforward:

library(splitstackshape)
library(data.table)
# direction = "long" puts each comma-separated KEGGnumber on its own row
df2 <- as.data.frame(cSplit(as.data.frame(df), "KEGGnumber",
                            sep = ",", direction = "long"))

df2
  KEGGnumber        Cor Colors
1     C00095 -0.2623973    RED
2     C17714 -0.2241113    RED
3     C00044 -0.2241113    RED
4     C00033 -0.3066684    RED

Answer 1 (score: 1)

tidyr makes this easy:

library(tidyr)

df %>% separate_rows(KEGGnumber)
##          Cor Colors KEGGnumber
## 1 -0.2623973    RED     C00095
## 2 -0.2241113    RED     C17714
## 3 -0.2241113    RED     C00044
## 4 -0.3066684    RED     C00033

Drop the Cor column if you don't need it.
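One way to drop Cor is base R's `subset()` with `select = -Cor`, applied to the `separate_rows()` result. This is a sketch, assuming the question's data is rebuilt as below with character (not factor) columns; the explicit `sep = ",\\s*"` pattern is an assumption added for clarity, since the default separator also splits on `", "`.

```r
library(tidyr)

# Hypothetical input rebuilt from the question's printed table
df <- data.frame(
  KEGGnumber = c("C00095", "C17714, C00044", "C00033"),
  Cor = c(-0.2623973, -0.2241113, -0.3066684),
  Colors = c("RED", "RED", "RED"),
  stringsAsFactors = FALSE
)

# One KEGGnumber per row: split on a comma plus any following whitespace
long <- separate_rows(df, KEGGnumber, sep = ",\\s*")

# Keep only the two requested columns by dropping Cor
res <- subset(long, select = -Cor)
res
```

Depending on the tidyr version, KEGGnumber may end up as the last column, as in the output above; `subset()` selects by name, so that does not matter here.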

A less pretty base R option:

do.call(rbind, 
        Map(function(x, y){data.frame(KEGGnumber = x, Colors = y)}, 
            strsplit(as.character(df$KEGGnumber), ', '), 
            df$Colors))
##   KEGGnumber Colors
## 1     C00095    RED
## 2     C17714    RED
## 3     C00044    RED
## 4     C00033    RED
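A vectorized base R variant avoids building one small data frame per row: split once, flatten the pieces, and repeat each Colors value as many times as its row produced IDs. A minimal sketch, assuming the same rebuilt `df` (character columns) and R >= 3.2.0 for `lengths()`; the split pattern `",\\s*"` is an assumption that accepts a comma with or without a following space.

```r
# Hypothetical input rebuilt from the question's printed table
df <- data.frame(
  KEGGnumber = c("C00095", "C17714, C00044", "C00033"),
  Cor = c(-0.2623973, -0.2241113, -0.3066684),
  Colors = c("RED", "RED", "RED"),
  stringsAsFactors = FALSE
)

# A list with one character vector of IDs per original row
parts <- strsplit(as.character(df$KEGGnumber), ",\\s*")

# lengths(parts) is c(1, 2, 1), so each color is repeated once per ID
out <- data.frame(
  KEGGnumber = unlist(parts),
  Colors = rep(df$Colors, lengths(parts)),
  stringsAsFactors = FALSE
)
out
```

Because `unlist()` and `rep()` both walk the rows in order, each ID stays aligned with the color of the row it came from.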