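I have a folder (Molecules) containing many sdf files that represent different molecules (M00001.sdf, M00002.sdf, etc.). I also have a csv in which each row represents one molecule (M00001, M00002, etc.). I am writing code to pick out the files in the Molecules folder whose names appear as a row in the csv file.
First attempt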
import os

path_to_files = '/path_to_folder/Molecules'  # path to Molecules folder
for files in os.listdir(path_to_files):
    names = os.path.splitext(files)[0]  # get the basename (molecule name)
    with open('molecules.csv') as ligs:  # open the csv file of molecule names
        for hits in ligs:
            if names == hits:
                print(names, hits)
            else:
                print('File is not here')
However, this returns nothing on the command line (almost nothing at all). What is wrong with this code?
Answer 0 (score: 0)
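I am not sure this is the best way (I only know that the following code works on my data), but if your molecules.csv has the standard csv format, i.e. "molecule1,molecule2,molecule3 ...", you can try rearranging your code like this:
import os
import csv

path_to_files = '/path_to_folder/Molecules'  # path to Molecules folder
for files in os.listdir(path_to_files):
    # drop any directory part and the .sdf extension to get the molecule name
    names = os.path.basename(files)
    names = names.replace(".sdf", "")
    with open('molecules.csv', 'r') as ligs:
        content = csv.reader(ligs)
        for elem in content:  # each row of the csv
            for hits in elem:  # each field in that row
                if names == hits:
                    print(names, hits)
                else:
                    print('File is not here')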
For more on the csv module, see "CSV File Reading and Writing" in the Python documentation.
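A likely reason the csv.reader version matches where the first attempt did not: iterating directly over a file yields each line with its trailing newline still attached, so a comparison such as `names == hits` is never true, whereas csv.reader returns fields without the line terminator. Here is a minimal sketch of that difference, using an in-memory string as a stand-in for molecules.csv (the sample names are made up for illustration, and Python 3 is assumed):

```python
import csv
import io

# Hypothetical stand-in for the contents of molecules.csv (one name per line)
csv_text = "M00001\nM00002\n"

# Iterating over a file-like object keeps the trailing newline on each line
raw_lines = list(io.StringIO(csv_text))
print(repr(raw_lines[0]))        # shows the newline still attached to the name
print("M00001" == raw_lines[0])  # the direct comparison therefore fails

# csv.reader splits each line into fields and drops the line terminator
rows = list(csv.reader(io.StringIO(csv_text)))
print("M00001" == rows[0][0])    # the field compares equal to the bare name
```

If that is indeed the cause, calling `hits.strip()` before comparing in the first attempt should also be enough.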
Answer 1 (score: 0)
I solved this with a fairly crude approach:
import os
import shutil

path_to_files = None  # path to Molecules folder (set this before running)
new_path = None  # new folder to save files (set this before running)
os.mkdir(new_path)  # create the folder to store the molecules

# read the molecule names, one per line, dropping the trailing newline
ligands = []
with open('molecules.csv', 'r') as hits:
    for line in hits:
        lig = line.rstrip('\n')
        ligands.append(lig)

for files in os.listdir(path_to_files):
    molecule_name = os.path.splitext(files)[0]
    full_name = molecule_name + '.sdf'
    old_file = os.path.join(path_to_files, full_name)
    new_file = os.path.join(new_path, full_name)
    if molecule_name in ligands:
        shutil.copy(old_file, new_file)  # copy the matching file to the new folder
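The same idea can be sketched a bit more compactly with a set for the name lookup, so each file is checked in constant time instead of scanning a list. This is only a sketch under some assumptions: Python 3, one molecule name per line in the csv, and the helper names `load_names` and `copy_matching` are hypothetical, not part of any library.

```python
import shutil
from pathlib import Path


def load_names(csv_path):
    """Return the set of molecule names listed one per line in csv_path."""
    with open(csv_path) as handle:
        # strip() removes the trailing newline; blank lines are skipped
        return {line.strip() for line in handle if line.strip()}


def copy_matching(src_dir, dst_dir, names):
    """Copy each .sdf file in src_dir whose stem is in names; return the copied stems."""
    dst = Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)  # create the target folder if needed
    copied = []
    for sdf in sorted(Path(src_dir).glob("*.sdf")):
        if sdf.stem in names:
            shutil.copy(sdf, dst / sdf.name)
            copied.append(sdf.stem)
    return copied
```

It would be called as something like `copy_matching('/path_to_folder/Molecules', new_path, load_names('molecules.csv'))`, with `new_path` set to whatever target folder you choose.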