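I am trying to write a program that opens many .mol files, copies specific information from each of them, and saves it to a spreadsheet (a tab-delimited file, '\t').
I have 10,000 mol files on my computer, named SN00000001, SN00000002, SN00000003 ... SN00010000.
(Download link => http://bioinf-applied.charite.de/supernatural_new/src/download_mol.php?sn_id=SN00000001)
I have two questions:
I have tried the load.molecules function (rcdk) and the loadsdf function (ChemmineR), but I could not open the .mol files in R.
Is it possible to open each .mol file and save specific information (for example "ID", "Name", "Molecular formula") to a single spreadsheet using R?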
Answer 0 (score: 0)
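I hope this works; I only tested it with 2 mol files. I use read.SDFset from the ChemmineR package to read all the mol files, and the tidyverse package to work with tibbles. A tibble is essentially a data frame with some extra properties and functions.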
    library(tidyverse)
    library(ChemmineR)

    # get the full path of your mol files
    mol_files <- list.files(
      # specify your folder here; on Windows also add your drive letter,
      # e.g. "c:/users/path/to/my/mol_files"
      path = "/home/rico/r-stuff/temp",
      # pattern is a regular expression, so match names ending in ".mol"
      pattern = "\\.mol$",
      full.names = TRUE)

    # create tibble, with filenames (incl. the full path)
    df <- tibble(filenames = mol_files)

    # create function to extract all the information
    extract_info <- function(sdfset) {
      # function to extract information from a sdfset (ChemmineR)
      # this only works if there is one molecule in the sdfset
      ID <- sdfset@SDF[[1]]@datablock["SNID"]
      Name <- sdfset@SDF[[1]]@header["Molecule_Name"]
      Molecular_Formula <- sdfset@SDF[[1]]@datablock["Molecular Formula"]
      sdf_info <- tibble(SNID = ID,
                         Name = Name,
                         MolFormula = Molecular_Formula)
      return(sdf_info)
    }

    # read all files and extract info
    df <- df %>%
      mutate(sdf_data = map(.x = filenames,
                            .f = ~ read.SDFset(sdfstr = .x)),
             info = map(.x = sdf_data,
                        .f = ~ extract_info(sdfset = .x)))

    # make a nice tibble with only the info you want
    all_info <- df %>%
      select(info) %>%
      unnest(info)

    # write to file; the output path is passed by position because newer
    # readr versions name this argument `file` rather than `path`
    write_delim(all_info,
                file.path(getwd(), "temp", "test.tsv"),
                delim = "\t")
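If installing or loading ChemmineR is the obstacle, the same fields can be read with base R alone. The following is only a minimal sketch: `read_mol_fields` is a hypothetical helper, the sample file content is made up for illustration, and it assumes that each file holds one molecule, that the molecule name is on the first line, and that each data item is a `> <TAG>` header followed by its value on the next line (the tag names "SNID" and "Molecular Formula" are taken from the code above).

```r
# Hypothetical helper: read the molecule name (line 1) and selected data
# items from one MOL/SDF file using base R only.
read_mol_fields <- function(path, tags = c("SNID", "Molecular Formula")) {
  lines <- readLines(path, warn = FALSE)
  values <- vapply(tags, function(tag) {
    # a data item header starts with ">" and contains "<TAG>"
    hit <- which(startsWith(lines, ">") &
                   grepl(paste0("<", tag, ">"), lines, fixed = TRUE))
    if (length(hit) == 0 || hit[1] == length(lines)) {
      NA_character_                 # tag missing or no value line after it
    } else {
      trimws(lines[hit[1] + 1])     # value sits on the line after the header
    }
  }, character(1))
  c(Name = trimws(lines[1]), values)
}

# Made-up sample file, only to demonstrate the helper
sample_mol <- c(
  "Example molecule",
  "  hypothetical-program",
  "",
  "  0  0  0  0  0  0  0  0  0  0999 V2000",
  "M  END",
  "> <SNID>",
  "SN00000001",
  "",
  "> <Molecular Formula>",
  "C6H6",
  "",
  "$$$$"
)
tmp <- tempfile(fileext = ".mol")
writeLines(sample_mol, tmp)

fields <- read_mol_fields(tmp)

# One row per file, then a tab-delimited table
rows <- lapply(tmp, read_mol_fields)
tab <- as.data.frame(do.call(rbind, rows), stringsAsFactors = FALSE)
out_file <- tempfile(fileext = ".tsv")
write.table(tab, out_file, sep = "\t", quote = FALSE, row.names = FALSE)
```

For the real data, `tmp` would be replaced by the vector of file paths returned by `list.files()`. Multi-line values and files with several molecules are not handled by this sketch.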
Answer 1 (score: 0)
OK, I will send you my code:
    library(tidyverse)
    library(ChemmineR)

    # get the full path of your mol files
    mol_files <- list.files(
      # specify your folder here; this is already an absolute path, so it
      # must not be prefixed with getwd(), and it has to be the folder that
      # contains the .mol files
      path = "/Users/189919604/Desktop/Download SuperNatural II/SN00000001",
      # pattern is a regular expression, so match names ending in ".mol"
      pattern = "\\.mol$",
      full.names = TRUE)

    # create tibble, with filenames (incl. the full path)
    df <- tibble(filenames = mol_files)

    # create function to extract all the information
    extract_info <- function(sdfset) {
      # function to extract information from a sdfset (ChemmineR)
      # this only works if there is one molecule in the sdfset
      ID <- sdfset@SDF[[1]]@datablock["SNID"]
      Name <- sdfset@SDF[[1]]@header["Molecule_Name"]
      Molecular_Formula <- sdfset@SDF[[1]]@datablock["Molecular Formula"]
      sdf_info <- tibble(SNID = ID,
                         Name = Name,
                         MolFormula = Molecular_Formula)
      return(sdf_info)
    }

    # read all files and extract info
    df <- df %>%
      mutate(sdf_data = map(.x = filenames,
                            .f = ~ read.SDFset(sdfstr = .x)),
             info = map(.x = sdf_data,
                        .f = ~ extract_info(sdfset = .x)))

    # make a nice tibble with only the info you want
    # (the column created above is called info, not molecule)
    all_info <- df %>%
      select(info) %>%
      unnest(info)

    # write to file; the output path is passed by position because newer
    # readr versions name this argument `file` rather than `path`
    write_delim(all_info,
                file.path(getwd(), "test.tsv"),
                delim = "\t")
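Before reading anything, it can help to confirm that the folder exists and that `list.files()` actually finds the files, since an empty file list makes every later step fail. The sketch below uses a temporary folder with empty placeholder files as a stand-in for the real directory, and assumes the files carry a `.mol` extension after the SN number. Note that `pattern` is a regular expression, not a shell glob; `glob2rx("*.mol")` converts a glob into an equivalent regular expression.

```r
# Temporary folder standing in for the real mol directory (assumption)
mol_dir <- file.path(tempdir(), "mol_demo")
dir.create(mol_dir, showWarnings = FALSE)
file.create(file.path(mol_dir,
                      c("SN00000001.mol", "SN00000002.mol", "molecules.txt")))

# Fail early if the folder is wrong
stopifnot(dir.exists(mol_dir))

# Only names that end in ".mol"; "molecules.txt" is not picked up
mol_files <- list.files(mol_dir, pattern = "\\.mol$", full.names = TRUE)

# Same selection written as a glob and converted to a regular expression
mol_files_glob <- list.files(mol_dir, pattern = glob2rx("*.mol"),
                             full.names = TRUE)

# Compare against the expected IDs (1:3 here; 1:10000 for the full set)
expected <- sprintf("SN%08d", 1:3)
found <- tools::file_path_sans_ext(basename(mol_files))
missing <- setdiff(expected, found)
```

With the real data, `mol_dir` would be the absolute path to the download folder, and `missing` lists any IDs whose file was not found.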