Opening .mol files and compiling information

Time: 2019-01-28 12:53:29

Tags: r rstudio

I am trying to create a program that opens many files (.mol), copies specific information from each of them, and saves it to a spreadsheet (a tab-separated file, '\t').

I have 10000 mol files on my computer, named SN00000001, SN00000002, SN00000003 ... SN00010000.

(Download link => http://bioinf-applied.charite.de/supernatural_new/src/download_mol.php?sn_id=SN00000001)
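Because the IDs follow a fixed pattern, the file names and download URLs can be generated rather than typed by hand. This R sketch assumes every ID is "SN" followed by an 8-digit zero-padded number (as in the examples above), that the download link accepts any of them through sn_id, and that saving each response as <ID>.mol is acceptable. download_mol_files() is a hypothetical helper, not part of any package.

```r
# Build all 10,000 IDs: "SN" + 8-digit zero-padded number (assumed pattern).
sn_ids <- sprintf("SN%08d", 1:10000)

# Download URL for each ID (assumes the sn_id parameter accepts every ID).
base_url <- "http://bioinf-applied.charite.de/supernatural_new/src/download_mol.php?sn_id="
sn_urls <- paste0(base_url, sn_ids)

# Hypothetical downloader: saves each ID as <dest_dir>/<ID>.mol, skips files
# that already exist, and reports failures without stopping the loop.
download_mol_files <- function(ids, dest_dir) {
  dir.create(dest_dir, showWarnings = FALSE, recursive = TRUE)
  for (id in ids) {
    dest <- file.path(dest_dir, paste0(id, ".mol"))
    if (file.exists(dest)) next
    tryCatch(
      download.file(paste0(base_url, id), destfile = dest, quiet = TRUE),
      error = function(e) {
        unlink(dest)  # remove any partial file so a rerun retries it
        message("Failed: ", id, " (", conditionMessage(e), ")")
      }
    )
  }
  invisible(dest_dir)
}

# Example (not run): download_mol_files(sn_ids[1:5], "mol_files")
```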

I have two questions:

  1. I have tried the load.molecules function (rcdk) and ChemmineR (loadsdf), but I have not managed to open the .mol files in R.

  2. Is it possible to open each .mol file and use R to save specific information (e.g. "ID", "Name", "Molecular Formula") into a single spreadsheet?
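One option for both questions that avoids rcdk and ChemmineR entirely is to read the files as plain text, since .mol/.sdf files are text files. This base-R sketch assumes one molecule per file, that the first line holds the molecule name (the MDL header), and that data items follow the SDF convention of a "> <Tag>" line, then the value, then a blank line. The tag names (SNID, Molecular Formula) are assumptions borrowed from the answer code below and should be checked against a real file. read_mol_fields() and compile_mol_dir() are hypothetical helpers.

```r
# Hypothetical helper: read one .mol file as text and pull out the requested
# SDF data items. Missing tags become NA.
read_mol_fields <- function(path, tags) {
  lines <- readLines(path, warn = FALSE)
  # assumed: line 1 of the MDL header is the molecule name
  out <- list(file = basename(path), Name = trimws(lines[1]))
  for (tag in tags) {
    # a data item header looks like "> <SNID>" (extra spacing/text allowed)
    hit <- which(startsWith(lines, ">") &
                 grepl(paste0("<", tag, ">"), lines, fixed = TRUE))
    value <- NA_character_
    if (length(hit) > 0) {
      # collect value lines until the blank line that ends the item
      i <- hit[1] + 1
      vals <- character(0)
      while (i <= length(lines) && nzchar(trimws(lines[i]))) {
        vals <- c(vals, trimws(lines[i]))
        i <- i + 1
      }
      value <- paste(vals, collapse = " ")
    }
    out[[tag]] <- value
  }
  data.frame(out, check.names = FALSE, stringsAsFactors = FALSE)
}

# Hypothetical helper: one row per .mol file in `dir`, written as a TSV.
compile_mol_dir <- function(dir, tags, out_file) {
  files <- list.files(dir, pattern = "\\.mol$", full.names = TRUE)
  if (length(files) == 0) stop("No .mol files found in ", dir)
  result <- do.call(rbind, lapply(files, read_mol_fields, tags = tags))
  write.table(result, out_file, sep = "\t", quote = FALSE, row.names = FALSE)
  invisible(result)
}

# Example (not run):
# compile_mol_dir("c:/path/to/mol_files", c("SNID", "Molecular Formula"), "mol_info.tsv")
```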

2 answers:

Answer 0 (score: 0)

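I hope this works; I have only tested it with 2 mol files. I use read.SDFset from the ChemmineR package to read all the mol files, and the tidyverse packages to work with tibbles. Tibbles are essentially data frames with some extra attributes/features.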

library(tidyverse)
library(ChemmineR)

# get the full path of your mol files
mol_files <- list.files(# specify your folder here; on Windows, include the drive letter, e.g. "c:/users/path/to/my/mol_files"
                        path = "/home/rico/r-stuff/temp",
                        pattern = "\\.mol$",  # regular expression: names ending in .mol
                        full.names = TRUE)

# create tibble, with filenames (incl. the full path)
df <- tibble(filenames = mol_files)

# create function to extract all the information 
extract_info <- function(sdfset) {
  # function to extract information from an SDFset (ChemmineR)
  # this only works if there is one molecule in the sdfset

  ID <- sdfset@SDF[[1]]@datablock["SNID"]
  Name <- sdfset@SDF[[1]]@header["Molecule_Name"]
  Molecular_Formula <- sdfset@SDF[[1]]@datablock["Molecular Formula"]

  sdf_info <- tibble(SNID = ID,
                     Name = Name,
                     MolFormula = Molecular_Formula)

  return(sdf_info)
}

# read all files and extract info
df <- df %>% 
  mutate(sdf_data = map(.x = filenames,
                        .f = ~ read.SDFset(sdfstr = .x)),
         info = map(.x = sdf_data,
                    .f = ~ extract_info(sdfset = .x)))

# make a nice tibble with only the info you want
all_info <- df %>% 
  select(info) %>% 
  unnest(info)

# write to file
write_delim(x = all_info,
            file = file.path(getwd(), "temp", "test.tsv"),  # newer readr versions use `file` instead of `path`
            delim = "\t")
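With 10,000 files, a single file that read.SDFset() cannot parse, or one that yields an empty SDFset (so that sdfset@SDF[[1]] does not exist), can raise an error, and map() then stops the whole run. A minimal base-R sketch of one way around this, assuming the per-file extractor returns a one-row data frame: each error is caught with tryCatch() and recorded as a row of NAs plus the error message. safe_map_rows() is a hypothetical name; in the tidyverse version, purrr::possibly() or purrr::safely() could play the same role.

```r
# Hypothetical helper: apply extractor(path) to every file, turning any error
# into a row of NAs plus the error message instead of aborting the run.
safe_map_rows <- function(paths, extractor, cols) {
  rows <- lapply(paths, function(p) {
    tryCatch({
      row <- extractor(p)  # expected: one-row data frame with columns `cols`
      row$file <- p
      row$error <- NA_character_
      row
    }, error = function(e) {
      # same columns, filled with NA, plus the file and the error message
      row <- data.frame(setNames(rep(list(NA_character_), length(cols)), cols),
                        check.names = FALSE, stringsAsFactors = FALSE)
      row$file <- p
      row$error <- conditionMessage(e)
      row
    })
  })
  do.call(rbind, rows)
}

# Usage with the ChemmineR-based extract_info() from this answer (not run):
# info <- safe_map_rows(mol_files,
#                       function(p) as.data.frame(extract_info(read.SDFset(p))),
#                       cols = c("SNID", "Name", "MolFormula"))
```

Rows with a non-NA error column can then be inspected separately, while all other files still end up in the output.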

Answer 1 (score: 0)

OK, I'll send you the code:

library(tidyverse)
library(ChemmineR)

# get the full path of your mol files
# specify your folder here; it is an absolute path, so do not prefix it with getwd()
mol_files <- list.files(path = "/Users/189919604/Desktop/Download SuperNatural II/SN00000001",
                        pattern = "\\.mol$",
                        full.names = TRUE)

# create tibble, with filenames (incl. the full path)
df <- tibble(filenames = mol_files)

# create function to extract all the information 
extract_info <- function(sdfset) {
  # function to extract information from an SDFset (ChemmineR)
  # this only works if there is one molecule in the sdfset

  ID <- sdfset@SDF[[1]]@datablock["SNID"]
  Name <- sdfset@SDF[[1]]@header["Molecule_Name"]
  Molecular_Formula <- sdfset@SDF[[1]]@datablock["Molecular Formula"]

  sdf_info <- tibble(SNID = ID,
                     Name = Name,
                     MolFormula = Molecular_Formula)

  return(sdf_info)
}

# read all files and extract info
df <- df %>% 
  mutate(sdf_data = map(.x = filenames,
                        .f = ~ read.SDFset(sdfstr = .x)),
         info = map(.x = sdf_data,
                    .f = ~ extract_info(sdfset = .x)))

# make a nice tibble with only the info you want
all_info <- df %>% 
  select(info) %>%  # the list column created above is `info`; there is no `molecule` column
  unnest(info)

# write to file
write_delim(x = all_info,
            file = file.path(getwd(), "test.tsv"),  # newer readr versions use `file` instead of `path`
            delim = "\t")
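If list.files() finds nothing, the rest of the pipeline has no files to work on and may error or write an empty file. file.path() simply joins its arguments with "/", so putting getwd() in front of an absolute path such as "/Users/..." produces a folder that does not exist, and a line break typed inside the quoted path becomes part of the string as well. A small base-R check makes this fail loudly; check_mol_dir() is a hypothetical helper and the paths below are placeholders.

```r
# file.path() only joins its arguments with "/"; it does not resolve them,
# so an absolute path placed after getwd() ends up nested inside it.
wd <- "/home/user"  # stand-in for getwd()
bad_path <- file.path(wd, "/Users/me/Desktop/mol_files")
bad_path
# -> "/home/user//Users/me/Desktop/mol_files"

# Hypothetical helper: list.files() returns character(0), without an error,
# for a folder that does not exist, so check explicitly before processing.
check_mol_dir <- function(dir) {
  if (!dir.exists(dir)) stop("Folder not found: ", dir)
  files <- list.files(dir, pattern = "\\.mol$", full.names = TRUE)
  if (length(files) == 0) stop("No .mol files found in: ", dir)
  files
}

# Usage (not run): mol_files <- check_mol_dir("path/to/mol_files")
```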