逗号分隔值,不同行

时间:2019-05-01 19:44:57

标签: r split dplyr row tidyverse

我有这样的数据。

structure(list(structureId = c("1JDN", "1DP4", "1XS5", "1SW1", 
"1P99", "1IXH"), structureTitle = c("Crystal Structure of Hormone Receptor", 
"DIMERIZED HORMONE BINDING DOMAIN OF THE ATRIAL NATRIURETIC PEPTIDE RECEPTOR", 
"The Crystal Structure of Lipoprotein Tp32 from Treponema pallidum", 
"Crystal structure of ProX from Archeoglobus fulgidus in complex with proline betaine", 
"1.7A crystal structure of protein PG110 from Staphylococcus aureus", 
"PHOSPHATE-BINDING PROTEIN (PBP) COMPLEXED WITH PHOSPHATE"), 
    chainId = c("A", "A", "A", "A", "A", "A"), ligandId = c("BMA,CL,FUC,MAN,NAG,NDG", 
    "CL,NAG,SO4", "MET", "MSE,PBE,ZN", "GLY,MET", "PO4"), ligandName = c("BETA-D-MANNOSE,CHLORIDE ION,ALPHA-L-FUCOSE,ALPHA-D-MANNOSE,N-ACETYL-D-GLUCOSAMINE,2-(ACETYLAMINO)-2-DEOXY-A-D-GLUCOPYRANOSE", 
    "CHLORIDE ION,N-ACETYL-D-GLUCOSAMINE,SULFATE ION", "METHIONINE", 
    "SELENOMETHIONINE,1,1-DIMETHYL-PROLINIUM,ZINC ION", "GLYCINE,METHIONINE", 
    "PHOSPHATE ION")), row.names = c(NA, -6L), class = c("tbl_df", 
"tbl", "data.frame"))

我想将ligandIdligandName的值分成不同的行。我的意思是每行只有1 ligandIdligandName

我尝试使用separate_rows,但在我的两列中都无法很好地处理。

df %>% separate_rows(ligandId, ligandName, sep = ",")

但是我遇到了这个错误:

> df %>% separate_rows(ligandId, ligandName, sep = ",")
Error: All nested columns must have the same number of elements.
Call `rlang::last_error()` to see a backtrace
> rlang::last_error()
<error>
message: All nested columns must have the same number of elements.
class:   `rlang_error`
backtrace:
  1. tidyr::separate_rows(., ligandId, ligandName, sep = ",")
 10. tidyr:::unnest.data.frame(data, !!!syms(vars), .drop = FALSE)
 12. tidyr::separate_rows(., ligandId, ligandName, sep = ",")
Call `rlang::last_trace()` to see the full backtrace

此外,我尝试了此操作:Split comma-separated strings in a column into separate rows,但未成功。

最后,我想要这样的东西:

1JDN   A   BMA   BETA-D-MANNOSE
1JDN   A   CL    CHLORIDE ION
1JDN   A   FUC   ALPHA-L-FUCOSE
1JDN   A   MAN   ALPHA-D-MANNOSE
1JDN   A   NAG   N-ACETYL-D-GLUCOSAMINE
1JDN   A   NDG   2-(ACETYLAMINO)-2-DEOXY-A-D-GLUCOPYRANOSE
...

1 个答案:

答案 0 :(得分:3)

我们可以使用separate_rows

library(tidyverse)
df1 %>% 
    separate_rows(ligandId, sep=",")

由于“ ligandId”,“ ligandName”每一行的字数不相同,一种选择是将gather转换为“长”格式,然后在“ val”列,最后separate_rows将其返回到“宽”

spread

对于单词数不同的多列,请使用df1 %>% gather(key, val, ligandId, ligandName) %>% separate_rows(val, sep=",") %>% group_by(structureId, key) %>% mutate(rn = row_number()) %>% spread(key, val) %>% select(-rn) # A tibble: 17 x 5 # Groups: structureId [6] # structureId structureTitle chainId ligandId ligandName # <chr> <chr> <chr> <chr> <chr> # 1 1DP4 DIMERIZED HORMONE BINDING DOMAIN OF THE ATRIAL NATRI… A CL CHLORIDE ION # 2 1DP4 DIMERIZED HORMONE BINDING DOMAIN OF THE ATRIAL NATRI… A NAG N-ACETYL-D-GLUCOSAMINE # 3 1DP4 DIMERIZED HORMONE BINDING DOMAIN OF THE ATRIAL NATRI… A SO4 SULFATE ION # 4 1IXH PHOSPHATE-BINDING PROTEIN (PBP) COMPLEXED WITH PHOSP… A PO4 PHOSPHATE ION # 5 1JDN Crystal Structure of Hormone Receptor A BMA BETA-D-MANNOSE # 6 1JDN Crystal Structure of Hormone Receptor A CL CHLORIDE ION # 7 1JDN Crystal Structure of Hormone Receptor A FUC ALPHA-L-FUCOSE # 8 1JDN Crystal Structure of Hormone Receptor A MAN ALPHA-D-MANNOSE # 9 1JDN Crystal Structure of Hormone Receptor A NAG N-ACETYL-D-GLUCOSAMINE #10 1JDN Crystal Structure of Hormone Receptor A NDG 2-(ACETYLAMINO)-2-DEOXY-A… #11 1P99 1.7A crystal structure of protein PG110 from Staphyl… A GLY GLYCINE #12 1P99 1.7A crystal structure of protein PG110 from Staphyl… A MET METHIONINE #13 1SW1 Crystal structure of ProX from Archeoglobus fulgidus… A MSE SELENOMETHIONINE #14 1SW1 Crystal structure of ProX from Archeoglobus fulgidus… A PBE 1 #15 1SW1 Crystal structure of ProX from Archeoglobus fulgidus… A ZN 1-DIMETHYL-PROLINIUM #16 1SW1 Crystal structure of ProX from Archeoglobus fulgidus… A <NA> ZINC ION #17 1XS5 The Crystal Structure of Lipoprotein Tp32 from Trepo… A MET METHIONINE

cSplit