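I'm working on code that calculates various thermodynamic properties for a given set of molecules. To do this, I have to plug 9 coefficients into a set of equations to obtain the desired values. These coefficients differ from molecule to molecule and can be retrieved from the NASA Thermobuild database, which has the following format: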
C2Cl4 Tetrachloroethylene HF298=-5.034 kcal Burcat G3B3
3 T05/08 C 2.00CL 4.00 0.00 0.00 0.00 0 165.8322000 -21064.348
50.000 200.000 7 -2.0 -1.0 0.0 1.0 2.0 3.0 4.0 0.0 19563.551
-5.821898980D+03 4.158580080D+02-7.790140830D+00 1.615966138D-01-6.791370520D-04
1.598431875D-06-1.556882412D-09 0.000000000D+00-6.205198010D+03 5.774956220D+01
200.000 1000.000 7 -2.0 -1.0 0.0 1.0 2.0 3.0 4.0 0.0 19563.551
4.940446670D+04-1.030763621D+03 1.098508036D+01 1.645945662D-02-2.178412229D-05
1.410593520D-08-3.663931630D-12 0.000000000D+00-3.353235260D+02-2.878634227D+01
1000.000 6000.000 7 -2.0 -1.0 0.0 1.0 2.0 3.0 4.0 0.0 19563.551
-3.067008915D+05-1.128336557D+03 1.681089243D+01-3.159107946D-04 6.850908950D-08
-7.749796920D-12 3.556100470D-16 0.000000000D+00-1.944193938D+03-5.966771040D+01
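The specific numbers needed for the calculation are shown in bold.
(Alternatively, in code-block form, so that it is tidier and closer to the actual layout in the database .txt file:)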
C2Cl4 Tetrachloroethylene HF298=-5.034 kcal Burcat G3B3
3 T05/08 C 2.00CL 4.00 0.00 0.00 0.00 0 165.8322000 -21064.348
50.000 200.000 7 -2.0 -1.0 0.0 1.0 2.0 3.0 4.0 0.0 19563.551
-5.821898980D+03 4.158580080D+02-7.790140830D+00 1.615966138D-01-6.791370520D-04
 1.598431875D-06-1.556882412D-09 0.000000000D+00-6.205198010D+03 5.774956220D+01
200.000 1000.000 7 -2.0 -1.0 0.0 1.0 2.0 3.0 4.0 0.0 19563.551
 4.940446670D+04-1.030763621D+03 1.098508036D+01 1.645945662D-02-2.178412229D-05
 1.410593520D-08-3.663931630D-12 0.000000000D+00-3.353235260D+02-2.878634227D+01
1000.000 6000.000 7 -2.0 -1.0 0.0 1.0 2.0 3.0 4.0 0.0 19563.551
-3.067008915D+05-1.128336557D+03 1.681089243D+01-3.159107946D-04 6.850908950D-08
-7.749796920D-12 3.556100470D-16 0.000000000D+00-1.944193938D+03-5.966771040D+01
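There are several hundred molecules in the database, but I only need the coefficients for about 50 of them. I need a function that goes through the database, finds the molecular species I need from a pre-written list, then picks out the coefficients for each molecule and returns them so that I can use them in my calculations (and converts "D+0%N" to "E+0%N"; I'm not sure why this database uses D instead of E for scientific notation).

I'm not at all familiar with SQL, so I have just been focusing on basic Python search functionality. This is what I have so far: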

import pandas as pd
import csv
import math
import numpy as np

species_list=[]
species=pd.read_table('Species list.txt') #list of molecular species I need coefficients for
species_temp=species['Species']
for i in range(len(species_temp)):
    species_list.append(species_temp[i])

with open('NEWNASA.TXT','rt') as database: #loads massive coefficient database
    for species_name in species_list:
        species_name=species_name+" " #to avoid returning ionic forms
        for line in database:
            if species_name in line:
                print(line) #test to see if it's working

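However, a) this approach stops working after it finds the first molecular species, and b) I'm still not sure how to tell the code to pick out the specific coefficients I need for the calculation. I imagine it will involve regular expressions (which I don't have much experience with either) and indexing, but that's the extent of what I know. Any pointers or suggestions would be greatly appreciated!

Thanks!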
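Answer 0 (score: 1)

An open file (database) is a one-shot iterator: you cannot iterate over it more than once. One fix is to swap the for loops; alternatively, if the file is not too large, load all of its lines into a list.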

for line in database:
    for species_name in species_list:
        species_name = species_name + " "
        if species_name in line:
            print(line)

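The second option mentioned above, loading the lines into a list, could look roughly like the following sketch. It keeps the loop order from the question. The helper name find_species_lines and the in-memory sample standing in for NEWNASA.TXT are assumptions made purely for illustration.

```python
import io

def find_species_lines(database, species_list):
    """Return {species_name: [matching lines]} using a list of lines.

    A list, unlike an open file, can be iterated over as many times
    as needed, so the species loop can stay on the outside.
    """
    lines = database.readlines()  # read the whole file into memory once
    matches = {}
    for species_name in species_list:
        key = species_name + " "  # trailing space skips ionic forms such as "C2Cl4+"
        matches[species_name] = [line for line in lines if key in line]
    return matches

# Hypothetical three-line stand-in for the real database file.
sample = io.StringIO(
    "C2Cl4 Tetrachloroethylene HF298=-5.034 kcal\n"
    "C2Cl4+ hypothetical ionic form\n"
    "CH4 Methane\n"
)
result = find_species_lines(sample, ["C2Cl4", "CH4"])
for name, found in result.items():
    print(name, found)
```

With the real file, the same function would presumably be called inside a with open('NEWNASA.TXT', 'rt') as database: block, passing database in place of sample.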
Answer 1 (score: 0)

I'll address the problem of extracting the data you need from a record in the text database.
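Once you have found a record of interest (if species_name in line:), you need to advance to the seventh and eighth lines of that record and extract the coefficients.

The record format indicates that each line is 80 characters long and that each of the numbers you are interested in is 16 characters long. So split the seventh and eighth lines into five equal parts (see "Split a string to even sized chunks") and convert each part to a float.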
Setup:

import io

r = '''C2Cl4 Tetrachloroethylene HF298=-5.034 kcal Burcat G3B3
3 T05/08 C 2.00CL 4.00 0.00 0.00 0.00 0 165.8322000 -21064.348
50.000 200.000 7 -2.0 -1.0 0.0 1.0 2.0 3.0 4.0 0.0 19563.551
-5.821898980D+03 4.158580080D+02-7.790140830D+00 1.615966138D-01-6.791370520D-04
 1.598431875D-06-1.556882412D-09 0.000000000D+00-6.205198010D+03 5.774956220D+01
200.000 1000.000 7 -2.0 -1.0 0.0 1.0 2.0 3.0 4.0 0.0 19563.551
 4.940446670D+04-1.030763621D+03 1.098508036D+01 1.645945662D-02-2.178412229D-05
 1.410593520D-08-3.663931630D-12 0.000000000D+00-3.353235260D+02-2.878634227D+01
1000.000 6000.000 7 -2.0 -1.0 0.0 1.0 2.0 3.0 4.0 0.0 19563.551
-3.067008915D+05-1.128336557D+03 1.681089243D+01-3.159107946D-04 6.850908950D-08
-7.749796920D-12 3.556100470D-16 0.000000000D+00-1.944193938D+03-5.966771040D+01'''
db = io.StringIO(r)
species_name = 'Tetrachloroethylene'

Process:
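
def get_coefficients(line):
    '''Split line into 5 floats.

    line has five 16 character numbers.
    '''
    # coefficients = [line[i:i+16] for i in range(0, len(line), 16)]
    coefficients = [line[i:i+16] for i in range(0, 80, 16)]  # 80 cols/line
    # float() does not accept the D exponent marker, so swap it for E
    coefficients = [q.replace('D', 'E') for q in coefficients]
    coefficients = [float(thing) for thing in coefficients]
    return coefficients

for line in db:
    if species_name in line:  # first line of the record
        # skip to the seventh line of the record
        for _ in range(6):
            line = next(db)
        coefficients_1 = get_coefficients(line)
        print(coefficients_1)
        # skip to the eighth line of the record
        line = next(db)
        coefficients_2 = get_coefficients(line)
        print(coefficients_2)

You need to solve the problem that @FMc brought up. Currently, your code iterates over the names in your list and, for each name, iterates over the whole database file looking for that name. To go on to the next name, you would need to start looking at the beginning of the file again by setting the file pointer to the beginning (database.seek(0)).

That would be very inefficient. As @FMc showed, you need to iterate over each line of the database and check whether it contains one of your species names. To improve on this, species_list should be a set.

species_list = {'Tetrachloroethylene', 'Bar', 'Foo'}

Unfortunately, there is a discrepancy between the first line of the database record format and your example record.

If the first line of each record is some variant of your example and the record format definition, maybe you could try the following: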
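
A minimal sketch of such a single-pass lookup is given below; it is one possible approach under stated assumptions, not a definitive implementation. It assumes that the species name appears as a whitespace-separated token on the first line of a record, that the first field of the second line is the number of temperature intervals (3 in the example record), and that each interval occupies one range line followed by two coefficient lines. Instead of fixed-width slicing it uses a regular expression to pull out the D-notation numbers, which also tolerates lost leading spaces. The names extract_records and NUMBER are made up for this illustration.

```python
import io
import re

# One number written with a D exponent, e.g. -5.821898980D+03.
NUMBER = re.compile(r'[-+]?\d\.\d+D[-+]\d+')

def extract_records(db, species_set):
    """Scan db once; return {species: [list of floats per temperature interval]}."""
    found = {}
    for line in db:
        hits = species_set.intersection(line.split())
        if not hits:
            continue
        name = hits.pop()
        # Assumption: first field of the second record line is the interval count.
        n_intervals = int(next(db).split()[0])
        intervals = []
        for _ in range(n_intervals):
            next(db)                    # skip the temperature-range line
            text = next(db) + next(db)  # the two coefficient lines
            intervals.append(
                [float(m.replace('D', 'E')) for m in NUMBER.findall(text)]
            )
        found[name] = intervals
    return found

record = '''C2Cl4 Tetrachloroethylene HF298=-5.034 kcal Burcat G3B3
3 T05/08 C 2.00CL 4.00 0.00 0.00 0.00 0 165.8322000 -21064.348
50.000 200.000 7 -2.0 -1.0 0.0 1.0 2.0 3.0 4.0 0.0 19563.551
-5.821898980D+03 4.158580080D+02-7.790140830D+00 1.615966138D-01-6.791370520D-04
 1.598431875D-06-1.556882412D-09 0.000000000D+00-6.205198010D+03 5.774956220D+01
200.000 1000.000 7 -2.0 -1.0 0.0 1.0 2.0 3.0 4.0 0.0 19563.551
 4.940446670D+04-1.030763621D+03 1.098508036D+01 1.645945662D-02-2.178412229D-05
 1.410593520D-08-3.663931630D-12 0.000000000D+00-3.353235260D+02-2.878634227D+01
1000.000 6000.000 7 -2.0 -1.0 0.0 1.0 2.0 3.0 4.0 0.0 19563.551
-3.067008915D+05-1.128336557D+03 1.681089243D+01-3.159107946D-04 6.850908950D-08
-7.749796920D-12 3.556100470D-16 0.000000000D+00-1.944193938D+03-5.966771040D+01'''

species_list = {'Tetrachloroethylene', 'Bar', 'Foo'}
coefficients = extract_records(io.StringIO(record), species_list)
for interval in coefficients['Tetrachloroethylene']:
    print(interval)
```

Each interval comes back as ten floats. In the example record the eighth value of every interval is zero, so dropping it would leave nine values per interval, which may correspond to the nine coefficients mentioned in the question.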