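I have more than 200 files; one of them is shown below as an example. They are txt files. I want to read them one by one, extract specific information from each, and export it to an xls file.
For example, how can I get the following information into an xls file?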
TOTAL ENERGY = -444.38126 EV
ELECTRONIC ENERGY = -840.31531 EV
CORE-CORE REPULSION = 395.93406 EV
GRADIENT NORM = 0.91931 = 0.45965 PER ATOM
DIPOLE = 2.66600 DEBYE POINT GROUP: C2v
NO. OF FILLED LEVELS = 6
IONIZATION POTENTIAL = 10.352991 EV
HOMO LUMO ENERGIES (EV) = -10.353 0.402
MOLECULAR WEIGHT = 30.0262
COSMO AREA = 60.70 SQUARE ANGSTROMS
COSMO VOLUME = 42.52 CUBIC ANGSTROMS
I have read several posts saying that one can use
sed -n ".." file.txt
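The problem is that even if I use it, it will take me a very long time, because I would have to read one file at a time in bash and then search for each keyword, such as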
HEAT OF FORMATION
TOTAL ENERGY
ELECTRONIC ENERGY
CORE-CORE REPULSION
GRADIENT NORM
DIPOLE
NO. OF FILLED LEVELS
IONIZATION POTENTIAL
HOMO LUMO ENERGIES (EV)
MOLECULAR WEIGHT
COSMO AREA
COSMO VOLUME
and then paste each matching line, one by one, into the xls file together with its corresponding information.
SUMMARY OF PM7 CALCULATION, Site No: 29451
MOPAC2016 (Version: 18.063M)
Tue Mar 20 15:08:13 2018
No. of days remaining = 349
Empirical Formula: C H2 O = 4 atoms
SYMMETRY
Formaldehyde
GEOMETRY OPTIMISED USING EIGENVECTOR FOLLOWING (EF).
SCF FIELD WAS ACHIEVED
HEAT OF FORMATION = -25.54241 KCAL/MOL = -106.86944 KJ/MOL
TOTAL ENERGY = -444.38126 EV
ELECTRONIC ENERGY = -840.31531 EV
CORE-CORE REPULSION = 395.93406 EV
GRADIENT NORM = 0.91931 = 0.45965 PER ATOM
DIPOLE = 2.66600 DEBYE POINT GROUP: C2v
NO. OF FILLED LEVELS = 6
IONIZATION POTENTIAL = 10.352991 EV
HOMO LUMO ENERGIES (EV) = -10.353 0.402
MOLECULAR WEIGHT = 30.0262
COSMO AREA = 60.70 SQUARE ANGSTROMS
COSMO VOLUME = 42.52 CUBIC ANGSTROMS
MOLECULAR DIMENSIONS (Angstroms)
Atom Atom Distance
H 3 O 1 2.00299
H 4 O 1 1.65067
H 4 C 2 0.00000
SCF CALCULATIONS = 4
WALL-CLOCK TIME = 0.309 SECONDS
COMPUTATION TIME = 0.033 SECONDS
FINAL GEOMETRY OBTAINED
SYMMETRY
Formaldehyde
O 0.00000000 +0 0.0000000 +0 0.0000000 +0 0 0 0
C 1.20614565 +1 0.0000000 +0 0.0000000 +0 1 0 0
H 1.09115836 +1 121.2760970 +1 0.0000000 +0 2 1 0
H 1.09115836 +0 121.2760970 +0 180.0000000 +0 2 1 3
3 1 4
3 2 4
I want to export the data to a csv, with the values from each file listed one below the other, like this:
data1
444.38126 EV
-840.31531 EV
395.93406 EV
0.91931 = 0.45965 PER ATOM
2.66600
C2v
6
10.352991
-10.353 0.402
30.0262
60.70
42.52
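The extraction for a single file could be sketched in Ruby roughly as follows. This is only a sketch, assuming every wanted line has the form LABEL = VALUE as in the sample above; the names KEYWORDS and extract_values are made up for illustration. It returns the raw text after the first "=", so removing units such as EV or splitting off the point group would be a further step.

```ruby
# Labels to pull out of one summary file, in the order they should appear.
KEYWORDS = [
  'TOTAL ENERGY', 'ELECTRONIC ENERGY', 'CORE-CORE REPULSION',
  'GRADIENT NORM', 'DIPOLE', 'NO. OF FILLED LEVELS',
  'IONIZATION POTENTIAL', 'HOMO LUMO ENERGIES (EV)',
  'MOLECULAR WEIGHT', 'COSMO AREA', 'COSMO VOLUME'
].freeze

# Returns a Hash of label => text after the first "=".
# Assumption: each wanted line looks like "LABEL = VALUE".
def extract_values(text)
  values = {}
  text.gsub(/\r\n?/, "\n").each_line do |line|
    label, value = line.split('=', 2)
    next if value.nil? # no "=" on this line

    label = label.strip.squeeze(' ') # tolerate padded or repeated spaces
    next unless KEYWORDS.include?(label)

    values[label] ||= value.strip # keep the first occurrence only
  end
  values
end

sample = <<~TEXT
  TOTAL ENERGY = -444.38126 EV
  GRADIENT NORM = 0.91931 = 0.45965 PER ATOM
  HOMO LUMO ENERGIES (EV) = -10.353 0.402
TEXT

extract_values(sample).each { |label, value| puts "#{label}: #{value}" }
```

Comparing the label as a plain string instead of a regular expression avoids having to escape the "." in NO. OF FILLED LEVELS and the parentheses in HOMO LUMO ENERGIES (EV).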
I know how to read each file line by line. Let's assume the output file is output.txt
line_num=0
text=File.open('output.txt').read
text.gsub!(/\r\n?/, "\n")
text.each_line do |line|
print "#{line_num += 1} #{line}"
end
So it can be read line by line; now I try to extract this information
# Labels whose values should be printed.
keywords = [
  'TOTAL ENERGY', 'ELECTRONIC ENERGY', 'CORE-CORE REPULSION',
  'GRADIENT NORM', 'DIPOLE', 'NO. OF FILLED LEVELS',
  'IONIZATION POTENTIAL', 'HOMO LUMO ENERGIES (EV)',
  'MOLECULAR WEIGHT', 'COSMO AREA', 'COSMO VOLUME'
]
# Regexp.escape protects the "." and "(EV)", which are regex metacharacters.
pattern = /^\s*(?:#{keywords.map { |k| Regexp.escape(k) }.join('|')})\s*=/
text = File.read('output.txt')
text.gsub!(/\r\n?/, "\n")
text.each_line do |line|
  next unless line =~ pattern
  # Print everything after the first "=".
  puts line.split("=", 2)[-1].strip
end
Answer 0 (score: 0)
Does it have to be Ruby? How about reading the files with bash and formatting the result for Excel?
For example:
for filename in *.txt; do
awk '{print FILENAME ":" $0}' "$filename" | grep '[A-Z]\{3,\}.*=' >> r.csv
done
This creates the file r.csv, which you can open in Excel and format with the menu Data -> Text to Columns.
You can use the character "=" as the column separator.
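If Ruby is preferred after all, the same idea can be sketched with the csv library that ships with Ruby. This is an illustrative sketch, not a tested solution for real MOPAC output: the names LABELS, parse_summary and write_table and the output file summary.csv are hypothetical, and the code assumes each wanted line has the form LABEL = VALUE as in the question. It writes one column per input file, with the values below each other.

```ruby
require 'csv'

# Labels to keep, in output order (the eleven lines shown in the question).
LABELS = [
  'TOTAL ENERGY', 'ELECTRONIC ENERGY', 'CORE-CORE REPULSION',
  'GRADIENT NORM', 'DIPOLE', 'NO. OF FILLED LEVELS',
  'IONIZATION POTENTIAL', 'HOMO LUMO ENERGIES (EV)',
  'MOLECULAR WEIGHT', 'COSMO AREA', 'COSMO VOLUME'
].freeze

# Collects label => value for one file's text.
# Assumption: wanted lines look like "LABEL = VALUE".
def parse_summary(text)
  found = {}
  text.gsub(/\r\n?/, "\n").each_line do |line|
    label, value = line.split('=', 2)
    next if value.nil?

    label = label.strip.squeeze(' ')
    found[label] ||= value.strip if LABELS.include?(label)
  end
  found
end

# Writes a header row of file names, then one row per label,
# so each input file becomes one column of the table.
def write_table(paths, csv_path)
  columns = paths.map { |path| parse_summary(File.read(path)) }
  CSV.open(csv_path, 'w') do |csv|
    csv << ['label'] + paths.map { |path| File.basename(path, '.txt') }
    LABELS.each do |label|
      # A label missing from a file leaves an empty cell.
      csv << [label] + columns.map { |column| column[label] }
    end
  end
end

# Runs only when file names are given on the command line.
write_table(ARGV, 'summary.csv') unless ARGV.empty?
```

Saved under a name such as collect.rb (hypothetical), it could be run as ruby collect.rb *.txt, and the resulting summary.csv opens directly in Excel without the Text to Columns step.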