Need to extract tabular data from a text file in Python 3

Time: 2019-01-16 18:30:50

Tags: python-3.x extract tabular

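I have output from a Quantum Chemistry program, and I want to extract tabular data from it to feed into a Python port of a FORTRAN program I wrote 25 years ago.

Some of the output files are long, up to 6000 lines, so processing them in a spreadsheet is not practical.

A typical table has the format: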

                             CARTESIAN COORDINATES

   1    C        0.011987266    -0.003842185     0.006578784
   2    H        1.097152909    -0.003956163     0.013339310
   3    H       -0.349612312     1.019316731     0.001903075
   4    H       -0.344276148    -0.517463019    -0.880495291
   5    H       -0.355315644    -0.513266496     0.891567896

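I am not asking anyone to write the Python code for me, but rather for some guidance through the maze of available code.

3 Answers:

Answer 0: (score: 1)

I would use readlines and split.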

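cc = 'CARTESIAN_COORDINATES.txt'

with open(cc) as data:
    lines = data.readlines()[2:]  # skip the title line and the blank line after it
    for line in lines:
        ls = line.split()
        if len(ls) != 5:  # skip blank or malformed lines
            continue
        # columns: atom index, element symbol, x, y, z
        a, b, c, d, e = int(ls[0]), ls[1], float(ls[2]), float(ls[3]), float(ls[4])
        print(a, b, c, d, e)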
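
The snippet above assumes the file contains nothing but the table. Since the real output files run to thousands of lines, here is a rough sketch of how the same split-based idea could be extended to find the table inside a longer file. The names `parse_row` and `read_coordinate_tables` are made up for illustration, and the sketch assumes the header line reads exactly CARTESIAN COORDINATES and that a table ends at the first non-row line after its rows.

```python
def parse_row(line):
    """Return (index, element, x, y, z), or None if the line is not a table row."""
    fields = line.split()
    if len(fields) != 5:
        return None
    try:
        return (int(fields[0]), fields[1],
                float(fields[2]), float(fields[3]), float(fields[4]))
    except ValueError:
        return None


def read_coordinate_tables(lines, header="CARTESIAN COORDINATES"):
    """Collect every table that follows a header line (hypothetical helper)."""
    tables = []
    current = None  # rows of the table being read, or None when outside a table
    for line in lines:
        if line.strip() == header:
            current = []            # start a new table
            tables.append(current)
        elif current is not None:
            row = parse_row(line)
            if row is not None:
                current.append(row)
            elif current:           # first non-row line after some rows ends the table
                current = None
    return tables


sample = """\
 some unrelated program output
                             CARTESIAN COORDINATES

   1    C        0.011987266    -0.003842185     0.006578784
   2    H        1.097152909    -0.003956163     0.013339310

 more unrelated output
"""
tables = read_coordinate_tables(sample.splitlines())
print(tables[0])
```

For a real file, passing the open file object directly (`with open(path) as f: tables = read_coordinate_tables(f)`) should work as well, since the function only iterates over lines and never needs the whole file in memory.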

Answer 1: (score: 0)

I suggest you look into np.genfromtxt. The following snippet reads the sample data from the question, stored in a file named data.txt.

import numpy as np
data = np.genfromtxt('data.txt', skip_header=2, dtype=[('id', 'i8'),('label','S1'),('x','f8'),('y','f8'),('z','f8')])
print(data)

Output:

 [(1, b'C',  0.01198727, -0.00384219,  0.00657878)
 (2, b'H',  1.09715291, -0.00395616,  0.01333931)
 (3, b'H', -0.34961231,  1.01931673,  0.00190307)
 (4, b'H', -0.34427615, -0.51746302, -0.88049529)
 (5, b'H', -0.35531564, -0.5132665 ,  0.8915679 )]
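
Once the data is in a structured array, the named fields can be pulled out by name. The sketch below is only an illustration of that step: it reads from an in-memory `io.StringIO` instead of data.txt so that it is self-contained, and it swaps the label dtype to `'U2'` (my assumption, not part of the snippet above) so that two-letter element symbols are not truncated and come back as `str` rather than `bytes`.

```python
import io
import numpy as np

# In-memory stand-in for data.txt (first three rows of the question's table).
sample = io.StringIO(
    "                      CARTESIAN COORDINATES\n"
    "\n"
    "   1    C        0.011987266    -0.003842185     0.006578784\n"
    "   2    H        1.097152909    -0.003956163     0.013339310\n"
    "   3    H       -0.349612312     1.019316731     0.001903075\n"
)

data = np.genfromtxt(
    sample,
    skip_header=2,  # title line and the blank line after it
    dtype=[('id', 'i8'), ('label', 'U2'), ('x', 'f8'), ('y', 'f8'), ('z', 'f8')],
)

labels = data['label'].tolist()                              # element symbols as str
coords = np.column_stack((data['x'], data['y'], data['z']))  # one row per atom
print(labels)
print(coords.shape)
```

Keeping the labels in a list and the coordinates in a plain float array is one possible layout; whether it suits the ported FORTRAN code depends on how that code expects its input.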

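Answer 2: (score: 0)

Regular expressions are made for extracting content from data. If your tables are always well-formed, you can extract them with this pattern: https://regex101.com/r/QUT2o3/2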

Applying the regex:

import re

regex = r"(\d+ +\w+ (?: +-?\d+\.\d+){3}.+?(?:\n|\Z){2})+"

test_str = ("                      CARTESIAN COORDINATES\n\n"
    "   1    C        0.011987266    -0.003842185     0.006578784\n"
    "   2    H        1.097152909    -0.003956163     0.013339310\n"
    "   3    H       -0.349612312     1.019316731     0.001903075\n"
    "   4    H       -0.344276148    -0.517463019    -0.880495291\n"
    "   5    H       -0.355315644    -0.513266496     0.891567896\n\n\n\n"
    "                      CARTESIAN COORDINATES\n\n"
    "   1    C        0.011987266    -0.003842185     0.006578784\n"
    "   2    H        1.097152909    -0.003956163     0.013339310\n"
    "   3    H       -0.349612312     1.019316731     0.001903075\n"
    "   4    H       -0.344276148    -0.517463019    -0.880495291\n"
    "   5    H       -0.355315644    -0.513266496     0.891567896\n\n\n"
    "                      CARTESIAN COORDINATES\n\n"
    "   1    C        0.011987266    -0.003842185     0.006578784\n"
    "   2    H        1.097152909    -0.003956163     0.013339310\n"
    "   3    H       -0.349612312     1.019316731     0.001903075\n"
    "   4    H       -0.344276148    -0.517463019    -0.880495291\n"
    "   5    H       -0.355315644    -0.513266496     0.891567896")

Print each matched table:

matches = re.findall(regex, test_str, re.MULTILINE | re.DOTALL)
for m in matches:
    print('\n'.join(x.strip() for x in m.splitlines()))
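
The code above yields each table as a block of text. If typed values are needed, one possible follow-up is a second, per-row pattern with capture groups. This is a sketch under my own assumptions, not part of the linked regex: the name `ROW` is hypothetical, and it assumes element symbols are purely alphabetic and coordinates are always written with a decimal point.

```python
import re

# One row: integer index, element symbol, three signed decimals.
# [ \t] is used instead of \s so a match can never spill across lines.
ROW = re.compile(
    r"^[ \t]*(\d+)[ \t]+([A-Za-z]+)"
    r"[ \t]+(-?\d+\.\d+)[ \t]+(-?\d+\.\d+)[ \t]+(-?\d+\.\d+)[ \t]*$",
    re.MULTILINE,
)

text = (
    "                      CARTESIAN COORDINATES\n\n"
    "   1    C        0.011987266    -0.003842185     0.006578784\n"
    "   2    H        1.097152909    -0.003956163     0.013339310\n"
)

# findall returns one tuple of strings per row; convert each field to its type.
rows = [
    (int(i), element, float(x), float(y), float(z))
    for i, element, x, y, z in ROW.findall(text)
]
print(rows)
```

Note that this flattens all rows into a single list; if the text holds several tables, it could be applied to each matched table block separately.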