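How to turn a column into a header row and split strings into multiple rows

Time: 2018-07-05 19:56:23

Tags: r

I wasn't sure how to word the title, so I did my best. I'll illustrate with an example of my dataset, which we can call my_data: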

tibble::tribble(
  ~Pathway, ~log_value, ~ratio, ~z_score,                ~molecules,
     "GHR",      "N/A",  "N/A",    "N/A", "CD40LG,TGFBR1,MYH9,MMP1",
    "TGFB",      "N/A",  "N/A",    "N/A", "ADAMTS8,PIK3R1,HRAS,SEM",
     "PKA",      "N/A",  "N/A",    "N/A", "PIK3CA,PDGFA,PIK3R1,SPH",
     "PKB",      "N/A",  "N/A",    "N/A", "MAST2,PIK3CA,TGFBR1,BAD",
     "PKC",      "N/A",  "N/A",    "N/A", "TGFBR1,AKAP9,CAMK2A,PHK"
  )

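What I want to do is turn column 1 into a single header row, so that each pathway becomes the name of a column. I also want to split column 5 into multiple rows. This is what I have in mind: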

GHR TGFB PKA PKB PKC
CD40LG ADAMTS8 PIK3CA MAST2 TGFBR1
TGFBR1 PIK3R1 PDGFA PIK3CA AKAP9
MYH9 HRAS PIK3R1 TGFBR1 CAMK2A
MMP1 SEM SPH BAD PHK

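I don't really need columns 2, 3, or 4, so I dropped them with my_data <- my_data[c(1,5)]. I also removed the commas between the names with my_data$molecules <- as.character(gsub(","," ",my_data$molecules)); that step caused me problems, but perhaps you don't need it. So I just want column 1 to become the header row and column 5 to be split into multiple rows, but I'm struggling with it. Does anyone have a suggestion? Thanks in advance.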

3 answers:

Answer 0: (score: 1)

You could use this:

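## Keep only the Pathway and molecules columns
df = df[, c(1, 5)]

## Split on comma and add to dataframe
tmp = strsplit(df$molecules, ",")
df = cbind(df[, -2], do.call(rbind, tmp))

## Transpose the dataframe; the result is a character matrix
## whose first row holds the pathway names
df = t(df)
rownames(df) = NULL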
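The transposed matrix above keeps the pathway names as its first row rather than as column names. As a minimal base R sketch of an alternative (not the answerer's code), assuming every pathway lists the same number of molecules, the split pieces can be named and converted directly into a data frame. The names `parts` and `result` are made up for illustration.

```r
# Sample data from the question, reduced to the two columns that matter
my_data <- data.frame(
  Pathway   = c("GHR", "TGFB", "PKA", "PKB", "PKC"),
  molecules = c("CD40LG,TGFBR1,MYH9,MMP1", "ADAMTS8,PIK3R1,HRAS,SEM",
                "PIK3CA,PDGFA,PIK3R1,SPH", "MAST2,PIK3CA,TGFBR1,BAD",
                "TGFBR1,AKAP9,CAMK2A,PHK"),
  stringsAsFactors = FALSE
)

# Split each comma-separated string into a character vector
parts <- strsplit(my_data$molecules, ",", fixed = TRUE)

# Name each vector after its pathway so the names become column names
names(parts) <- my_data$Pathway

# Assumption: all vectors have equal length, otherwise as.data.frame() fails
result <- as.data.frame(parts, stringsAsFactors = FALSE)
result
```

Here each pathway becomes a column of `result`, in the original row order.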

Answer 1: (score: 1)

Your data, parsed:

df <- tibble::tribble(
      ~Pathway, ~log_value, ~ratio, ~z_score,                ~molecules,
         "GHR",      "N/A",  "N/A",    "N/A", "CD40LG,TGFBR1,MYH9,MMP1",
        "TGFB",      "N/A",  "N/A",    "N/A", "ADAMTS8,PIK3R1,HRAS,SEM",
         "PKA",      "N/A",  "N/A",    "N/A", "PIK3CA,PDGFA,PIK3R1,SPH",
         "PKB",      "N/A",  "N/A",    "N/A", "MAST2,PIK3CA,TGFBR1,BAD",
         "PKC",      "N/A",  "N/A",    "N/A", "TGFBR1,AKAP9,CAMK2A,PHK"
      )

Here is a solution using dplyr and tidyr:

library(dplyr)
library(tidyr)

df %>% select(Pathway, molecules) %>% 
  separate_rows(molecules,sep=",") %>% 
  group_by(Pathway) %>% 
  mutate(id=1:n()) %>% 
  spread(key="Pathway", value="molecules") %>% 
  select(-id)

#> # A tibble: 4 x 5
#>   GHR    PKA    PKB    PKC    TGFB   
#>   <chr>  <chr>  <chr>  <chr>  <chr>  
#> 1 CD40LG PIK3CA MAST2  TGFBR1 ADAMTS8
#> 2 TGFBR1 PDGFA  PIK3CA AKAP9  PIK3R1 
#> 3 MYH9   PIK3R1 TGFBR1 CAMK2A HRAS   
#> 4 MMP1   SPH    BAD    PHK    SEM    

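Here we first select the columns of interest and then separate the comma-delimited strings into rows. The next task is to reshape the data from long format to wide format. For that you need a unique id within each pathway so the rows can be matched up. After you spread the columns, you can drop id. Note that spread orders the new columns alphabetically, which is why the output above is not in the original pathway order.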
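In tidyr 1.0.0 and later, `spread()` is superseded by `pivot_wider()`. The following is a sketch of the same pipeline with that function, assuming a recent tidyr and dplyr are installed; the name `wide` is made up for illustration. By default `pivot_wider()` orders the new columns by first appearance, so the pathways should come out in the order of the original rows.

```r
library(dplyr)
library(tidyr)

df <- tibble::tribble(
  ~Pathway, ~molecules,
  "GHR",  "CD40LG,TGFBR1,MYH9,MMP1",
  "TGFB", "ADAMTS8,PIK3R1,HRAS,SEM",
  "PKA",  "PIK3CA,PDGFA,PIK3R1,SPH",
  "PKB",  "MAST2,PIK3CA,TGFBR1,BAD",
  "PKC",  "TGFBR1,AKAP9,CAMK2A,PHK"
)

wide <- df %>%
  separate_rows(molecules, sep = ",") %>%   # one row per molecule
  group_by(Pathway) %>%
  mutate(id = row_number()) %>%             # position of the molecule within its pathway
  ungroup() %>%
  pivot_wider(names_from = Pathway, values_from = molecules) %>%
  select(-id)                               # the helper id is no longer needed

wide
```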

Answer 2: (score: 1)

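## Read the sample data, treating "N/A" as missing
dat <- read.table(stringsAsFactors = FALSE, header = TRUE, na.strings = "N/A",
                  text = "Pathway log_value ratio z_score molecules
GHR N/A N/A N/A CD40LG,TGFBR1,MYH9,MMP1…
TGFB N/A N/A N/A ADAMTS8,PIK3R1,HRAS,SEM…
PKA N/A N/A N/A PIK3CA,PDGFA,PIK3R1,SPH…
PKB N/A N/A N/A MAST2,PIK3CA,TGFBR1,BAD…
PKC N/A N/A N/A TGFBR1,AKAP9,CAMK2A,PHK…")

## Parse the molecules column as comma-separated text, then transpose
a <- data.frame(t(read.table(text = dat$molecules, sep = ",")), stringsAsFactors = FALSE)

## Use the pathway names as column names
setNames(a, dat$Pathway)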

      GHR    TGFB    PKA    PKB    PKC
V1 CD40LG ADAMTS8 PIK3CA  MAST2 TGFBR1
V2 TGFBR1  PIK3R1  PDGFA PIK3CA  AKAP9
V3   MYH9    HRAS PIK3R1 TGFBR1 CAMK2A
V4  MMP1…    SEM…   SPH…   BAD…   PHK…
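All of the approaches above rely on every pathway having the same number of molecules. If that does not hold for the full data (an assumption, since only a truncated sample is shown), one possible base R sketch is to pad the shorter vectors with NA before building the data frame. The two shortened rows below are hypothetical, cut down from the sample purely to produce unequal lengths, and the names `padded` and `result` are made up for illustration.

```r
# Hypothetical input with unequal numbers of molecules per pathway
pathways  <- c("GHR", "TGFB")
molecules <- c("CD40LG,TGFBR1,MYH9", "ADAMTS8,PIK3R1")

parts <- strsplit(molecules, ",", fixed = TRUE)

# Length of the longest molecule list
n <- max(lengths(parts))

# Pad each vector with NA up to length n
padded <- lapply(parts, function(x) c(x, rep(NA_character_, n - length(x))))

result <- as.data.frame(setNames(padded, pathways), stringsAsFactors = FALSE)
result
```

Pathways with fewer molecules end up with NA in their trailing rows.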