Question

Hello, I have more than 200 data files with similar names, such as abc.20.0000.catalog.out, abc.20.1000.catalog.out, abc.20.2000.catalog.out, ..., abc.40.0000.catalog.out.

Each file contains data like this:

   Group catalog for redshift  18.1000
 1) group ID
 2) group mass (Msun/h)
 3- 5) initial position (Mpc/h)
 6- 8) final position (Mpc/h)
 9-11) velocity (km/s)
 12) number of particles  

250103187  0.227591E+08 1.86  1.03  2.51  1.65  1.06  2.53  -47.56  7.50  3.83    328
202456030  0.167918E+08 0.29  4.57  2.02  0.23  4.63  2.14  -13.27  10.67 3.68    242
89479147  0.763262E+06  1.47  4.80  0.89  1.34  4.83  0.99  -28.90  6.20  17.30    11

Each such file contains more than 10^6 lines.

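I want to do the following:
1. Read the data from each file and remove the text at the top.
2. Store the data from all these files in one big list of matrices, where each matrix holds the data from one file.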

Answer 1

Here is a Python / pandas solution:

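import glob

import pandas as pd

L = []
for f in glob.glob('abc*'):
    # Skip the 7 lines of header text; the blank line after them is ignored
    # by default. The columns are whitespace-separated and have no header row.
    df = pd.read_csv(f, skiprows=7, sep=r'\s+', header=None)
    L.append(df.values)  # one 2-D NumPy array per file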
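
One possible extension of the loop above, as a sketch only: it keeps each matrix together with the name of the file it came from, and sorts the file names so the order is reproducible. The function name `load_catalogs` is hypothetical, and the assumption of 7 header lines followed by whitespace-separated numeric columns is based on the sample shown in the question.

```python
import glob

import pandas as pd


def load_catalogs(pattern="abc.*.catalog.out", header_lines=7):
    """Return {file name: 2-D NumPy array} for every file matching pattern.

    Assumes each file starts with `header_lines` lines of text, followed by
    whitespace-separated numeric columns (as in the sample in the question).
    """
    catalogs = {}
    for path in sorted(glob.glob(pattern)):
        df = pd.read_csv(
            path,
            skiprows=header_lines,  # drop the descriptive text at the top
            sep=r"\s+",             # columns are separated by runs of spaces
            header=None,            # the data rows carry no column names
        )
        catalogs[path] = df.to_numpy()
    return catalogs
```

Calling `load_catalogs()` in the directory that holds the files returns a dictionary, so a single catalog can be looked up by its file name instead of by its position in a list.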

Answer 2

Here is part 1:

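One particularly useful piece of code is 'abc.{0}000.catalog.out'.format(someString). The {0} acts as a placeholder for whatever someString contains. So, to read all the files and remove the text at the top, you might do: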

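for i in range(200, 401):
    # 200 -> '20.0', 201 -> '20.1', ..., 400 -> '40.0'
    file_name = 'abc.{0}000.catalog.out'.format(str(i)[:2] + '.' + str(i)[2:])

    with open(file_name) as file:
        wanted_lines = file.readlines()
    # Overwrite the file in place without its first 8 lines (the header text
    # and the blank line after it). Running this twice would delete data rows.
    with open(file_name, 'w') as file:
        file.writelines(wanted_lines[8:])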
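
As a small illustration of how the placeholder builds the file names, and as a more cautious variant of the loop above that leaves the original files untouched, here is a sketch. The helper names `catalog_names` and `strip_header` are hypothetical, and the count of 8 lines to skip is taken from the code above.

```python
def catalog_names(start=200, stop=400):
    """Build 'abc.20.0000.catalog.out' ... 'abc.40.0000.catalog.out'."""
    names = []
    for i in range(start, stop + 1):
        digits = str(i)                        # e.g. '201'
        stamp = digits[:2] + '.' + digits[2:]  # e.g. '20.1'
        names.append('abc.{0}000.catalog.out'.format(stamp))
    return names


def strip_header(src, dst, skip=8):
    """Copy src to dst without its first `skip` lines; src is not modified."""
    with open(src) as fin, open(dst, 'w') as fout:
        for line_number, line in enumerate(fin):
            if line_number >= skip:
                fout.write(line)
```

Here `catalog_names()` yields 201 names, from abc.20.0000.catalog.out to abc.40.0000.catalog.out, and `strip_header(name, name + '.stripped')` would write a header-free copy next to each original (the `.stripped` suffix is just an example). Reading line by line also avoids holding a whole 10^6-line file in memory.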

"I want to store the data from all these files in one big list of matrices, where each matrix holds the data from one file."

You will have to clarify, or give an example of, what you mean by this.
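
If "a list of matrices" means one 2-D numeric array per file, part 2 could be sketched as follows. This is only one reading of the request. The function name `read_matrices` is hypothetical, NumPy is an added dependency, and the 8 skipped lines match the header-plus-blank-line count used in part 1.

```python
import glob

import numpy as np


def read_matrices(pattern='abc.*.catalog.out', skip=8):
    """Return a list with one 2-D float array per matching file."""
    matrices = []
    for path in sorted(glob.glob(pattern)):
        # skiprows drops the text at the top; columns split on whitespace.
        # ndmin=2 keeps a single-row file two-dimensional.
        matrices.append(np.loadtxt(path, skiprows=skip, ndmin=2))
    return matrices
```

If the headers were already removed in place by the loop in part 1, pass `skip=0`. Note that 10^6 rows of 12 float64 columns take about 96 MB, so holding more than 200 such matrices at once needs roughly 19 GB of memory or more.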

Using Python.