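I am trying to run a for loop over a list of directories in order to extract specific values from certain lines of a file in each directory. The directory structure is as follows: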
# 'sourcedir' --> main directory
#      |
#      |___ alineamiento1 --> directory
#      |        |
#      |        |___ modelo1 --> sub-directory with the file
#      |        |___ modelo2
#      |
#      |___ alineamiento2
#               |
#               |___ modelo1
#               |___ modelo2
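I have already run the script on a main directory containing 15 directories, and it ran perfectly. However, when I try to run it on more directories (~100), the for loop crashes with an IndexError.
This is the desired output: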
>similarity_group_0202
-lnL m0 = -3491.164041
-lnL m1 = -3442.220417
Likelihood ratio = 97.887248, df = 26
El mejor modelo es m1. P-value = 1.09405887112e-10
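As a check on the numbers above, the likelihood ratio is twice the difference between the two log-likelihoods. A minimal sketch of that arithmetic, using the two values printed for this group (the variable names here are only illustrative):

```python
# Log-likelihoods copied from the output shown for similarity_group_0202.
lnL_m0 = -3491.164041
lnL_m1 = -3442.220417

# Likelihood ratio statistic: twice the gain in log-likelihood of m1 over m0.
likelihood_ratio = 2 * (lnL_m1 - lnL_m0)
print("Likelihood ratio = %.6f" % likelihood_ratio)
```

This reproduces the 97.887248 reported in the output.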
This is the IndexError output:
>similarity_group_0217
Traceback (most recent call last):
  File "LRTb.py", line 52, in <module>
    path_m1 = os.path.join(os.getcwd(), modelos[2])
IndexError: list index out of range
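Sometimes the error occurs after the 8th directory, other times after the 10th, and so on. So it does not seem to be a problem with the files in any particular directory.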
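That assumption could be checked directly by listing what each directory actually contains. A minimal diagnostic sketch, assuming every group directory is expected to hold at least three entries (because the script reads `modelos[2]`); `report_listing` and its `expected` parameter are hypothetical names, not part of the script:

```python
import os


def report_listing(sourcedir, expected=3):
    """Return {group: sorted entries} for groups with fewer than `expected` entries.

    `expected=3` mirrors the script's use of modelos[2]; this is an assumption.
    """
    short = {}
    for dirname in sorted(os.listdir(sourcedir)):
        group_path = os.path.join(sourcedir, dirname)
        # Same filter as the script, plus a check that the entry is a directory.
        if not (dirname.startswith('s') and os.path.isdir(group_path)):
            continue
        # os.listdir returns entries in arbitrary order and includes files
        # as well as sub-directories, so sort before inspecting.
        entries = sorted(os.listdir(group_path))
        if len(entries) < expected:
            short[dirname] = entries
    return short
```

Calling `report_listing(sourcedir)` and printing the result would show which groups, if any, have too few entries for index 2 to exist. Because it never calls `os.chdir`, the result does not depend on the current working directory.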
Here is my script:
import os
import sys
import re
from scipy import stats

sourcedir = '<path/to/main/directory>'
sg=os.listdir(sourcedir)
sg[:] = [dirname for dirname in sg if dirname.startswith('s')]
n=len(sg)
grupos_m0=[]
grupos_m1=[]

for dirname in sg[0:n]:
    os.chdir(os.path.join(sourcedir, dirname))
    dir=os.getcwd()
    print(">" + dir.split(os.path.sep)[9])
    modelos = os.listdir(os.getcwd())
    path_m0 = os.path.join(os.getcwd(), modelos[1])
    path_m1 = os.path.join(os.getcwd(), modelos[2])
    os.chdir(path_m0)
    with open('out', 'r') as f:
        for line in f:
            if line.startswith('lnL'):
                Lm0 = float(line.split()[4])
                p0 = line.split()[3]
                p0 = re.findall('\d+', p0 )
                npm0 = int(p0[0])
    print("-lnL m0 = " + str(Lm0))
    os.chdir(path_m1)
    with open('out', 'r') as f:
        for line in f:
            if line.startswith('lnL'):
                Lm1 = float(line.split()[4])
                p1 = line.split()[3]
                p1 = re.findall('\d+', p1 )
                npm1 = int(p1[0])
    print("-lnL m1 = " + str(Lm1))
    LR = 2*(Lm1-(Lm0)) # Likelihood-ratio (2*delta(-lnL))
    df = npm1-npm0
    print('Likelihood ratio = ' + str(LR) + ', ' + 'df = ' + str(df))
    p = stats.chi2.pdf(LR,df)
    if p < 0.05:
        print('El mejor modelo es m1. P-value = ' + str(p) + "\n")
        grupos_m1.append(dir.split(os.path.sep)[9])
    else:
        print('El mejor modelo es m0. P-value > 0.05)' + "\n")
        grupos_m0.append(dir.split(os.path.sep)[9])

print("Alineamientos donde el mejor modelo es m0: " + str(len(grupos_m0)) + "\n" + str(grupos_m0))
print("Alineamientos donde el mejor modelo es m1: " + str(len(grupos_m1)) + "\n" + str(grupos_m1))
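The two parsing loops in the script are nearly identical, so they could be folded into one helper. A minimal sketch, assuming (as the script's indexing implies) that the `out` file has a line starting with `lnL` whose fourth whitespace-separated field contains the parameter count and whose fifth field is the log-likelihood; `read_lnl` is a hypothetical name:

```python
import os
import re


def read_lnl(model_dir, filename='out'):
    """Return (lnL, number_of_parameters) from the last 'lnL' line in the file.

    Assumes the same field positions the script relies on: field 3 holds
    the parameter count and field 4 the log-likelihood.
    """
    result = None
    # Build the full path instead of calling os.chdir, so the helper does
    # not depend on (or change) the current working directory.
    with open(os.path.join(model_dir, filename), 'r') as f:
        for line in f:
            if line.startswith('lnL'):
                fields = line.split()
                lnl = float(fields[4])
                n_params = int(re.findall(r'\d+', fields[3])[0])
                result = (lnl, n_params)
    if result is None:
        raise ValueError("no line starting with 'lnL' in " + model_dir)
    return result
```

With such a helper, the loop body would reduce to two calls, for example `Lm0, npm0 = read_lnl(path_m0)` and `Lm1, npm1 = read_lnl(path_m1)`, and a directory with a missing `lnL` line would raise a clear error instead of silently reusing values from the previous iteration.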
Any help is greatly appreciated. Thanks!