Question

I have a dataset from a .txt file in the following format:

30     1
     2477.25     0.00    1                   M
40     2   11
        0.17100     0.08600     0.11500     0.10800     0.05600     0.07500 9.60000 -1009.00000 -1009.00000 -1009.00000
        2.70000
36     1    1
   a.a.Sbargang
30     1
     2477.45     0.00    2                   M
40     2   11
      0.52100     0.27400     0.35900 -1009.00000 -1009.00000 -1009.00000    14.30000 -1009.00000 -1009.00000 -1009.00000
      2.66000
36     1    1
   a.a M-gr.

The format is very messy, and I want to arrange it into rows and columns, so that my output looks like this:

30 1 2477.25 0.00 1 M 40 2 11 0.17100 0.08600 0.11500 0.10800 0.05600  0.07500 9.60000 -1009.00000 -1009.00000 -1009.00000 2.70000 36 1 1 a.a.Sbargang
30 1 2477.45 0.00 2 M 40 2 11 0.52100 0.27400 0.35900 -1009.0 -1009.0 -1009.00 14.3000 -1009.00000 -1009.00000 -1009.00000 2.66000 36 1 1 a.a M-gr.

I'm new to Python and don't know how to write this in Python 3. Thanks in advance.

I tried this:

import re

with open('textdata3.txt') as f:
    inputString = f.read()

# Collapse runs of spaces; newlines are left untouched
inputString = re.sub(r" +", " ", inputString)
itemInString = inputString.split(" ")

row1 = []
for index, item in enumerate(itemInString):
    if index % 1 == 0:
        row1.append(str(item))

print(row1)

I'm not sure whether this is the right approach, but here I end up with everything on a single line.

Output:

['30', '1\n', '2477.25', '0.00', '1', 'M\n40', '2', '11\n', '0.17100', '0.08600', '0.11500', '0.10800', '0.05600', '0.07500', '9.60000', '-1009.00000', '-1009.00000', '-1009.00000\n', '2.70000\n36', '1', '1\n', 'Sst.Lt-gry.F-gr.Sbang.VW-cmt.VP-srt.w/Mic.Calc.Glauc.\n30', '1\n', '2477.45', '0.00', '2', 'M\n40', '2', '11\n', '0.52100', '0.27400', '0.35900', '-1009.00000', '-1009.00000', '-1009.00000', '14.30000', '-1009.00000', '-1009.00000', '-1009.00000\n', '2.66000\n36', '1', '1\n', 'a.a', 'M-gr.']

Answer 1

Assuming the data is consistently split into groups of seven lines, this should work.

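import re

rows = []
with open("input_data.txt", encoding="utf-8") as input_file:
    while True:
        try:
            # Read the next seven lines, which together form one record
            row = [next(input_file) for _ in range(7)]
        except StopIteration:
            # End of file (an incomplete trailing group is discarded)
            break
        # Collapse every whitespace run, newlines included, into one space
        rows.append(re.sub(r"\s+", " ", " ".join(row)).strip())

with open("reformatted_data.txt", "w", encoding="utf-8") as out_file:
    for row in rows:
        out_file.write(row + "\n")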
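
To try the seven-line grouping without touching any files, the same idea can be sketched on a list of lines held in memory. The helper name `join_fixed_groups` and the shortened sample record are illustrative assumptions, not part of the answer above.

```python
import re

def join_fixed_groups(lines, group_size=7):
    """Join every `group_size` consecutive lines into one space-separated row.

    An incomplete group at the end of `lines` is dropped.
    """
    rows = []
    for start in range(0, len(lines) - group_size + 1, group_size):
        chunk = " ".join(lines[start:start + group_size])
        # Collapse all whitespace runs into single spaces
        rows.append(re.sub(r"\s+", " ", chunk).strip())
    return rows

# Shortened sample record (hypothetical, only two values per numeric line)
sample = [
    "30     1",
    "     2477.25     0.00    1                   M",
    "40     2   11",
    "        0.17100     0.08600",
    "        2.70000",
    "36     1    1",
    "   a.a.Sbargang",
]

for row in join_fixed_groups(sample):
    print(row)
```

This only holds while every record really spans seven lines; a single longer or shorter record shifts everything after it.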

Updated version based on the comments below.

import re

rows = []
with open("data.txt", encoding="utf-8") as input_file:
    row = []
    for line in input_file:
        # Collapse whitespace runs and trim the ends
        data = re.sub(r"\s+", " ", line).strip()
        # A "30 1" line marks the start of a new record
        if data == "30 1" and row:
            rows.append(" ".join(row))
            row = []
        if data:
            row.append(data)
    # Flush the last record once the file is exhausted
    if row:
        rows.append(" ".join(row))

with open("reformatted_data.txt", "w", encoding="utf-8") as out_file:
    for row in rows:
        out_file.write(row + "\n")
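
Since the question asks for rows and columns, one possible follow-up is to keep each record as a list of fields instead of a single string. The sketch below assumes, as the answer does, that a "30 1" line starts every record, and additionally assumes that the last line of each record is a free-text description that should stay in one column even when it contains spaces (as in "a.a M-gr."). The function name `records_to_columns` and the shortened sample are hypothetical.

```python
import csv
import io

def records_to_columns(lines, marker="30 1"):
    """Group lines into records starting at `marker`, then split into fields."""
    records, current = [], []
    for line in lines:
        text = " ".join(line.split())  # collapse whitespace, trim ends
        if not text:
            continue
        if text == marker and current:
            records.append(current)
            current = []
        current.append(text)
    if current:
        records.append(current)

    table = []
    for record in records:
        fields = []
        # Numeric lines are split on spaces ...
        for text in record[:-1]:
            fields.extend(text.split(" "))
        # ... while the final description line is kept as a single field
        fields.append(record[-1])
        table.append(fields)
    return table

# Shortened sample with two records (hypothetical)
sample = [
    "30     1",
    "     2477.25     0.00    1                   M",
    "36     1    1",
    "   a.a.Sbargang",
    "30     1",
    "     2477.45     0.00    2                   M",
    "36     1    1",
    "   a.a M-gr.",
]

table = records_to_columns(sample)

# Write the table as CSV to an in-memory buffer; a real file works the same way
buffer = io.StringIO()
csv.writer(buffer).writerows(table)
print(buffer.getvalue())
```

Writing CSV rather than space-separated text keeps the description column unambiguous when the file is read back.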

Answer 2

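I ran into a similar problem when exporting data: I ended up with one huge column and then needed to break that column apart to reproduce the original structure. This code solved my problem: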

import glob
import numpy as np

def arrumando_dados():
    # defining the path of the files
    path_to = glob.glob('.../txts/*.txt')

    # creating an empty dictionary
    idl_results = {}

    # looping over the files
    for i in range(0, len(path_to)):

        # creating the variable with the appropriate name;
        # this only works if the digits sit at the expected
        # position in the file name
        var_name = path_to[i][-6:-4]

        # taking the data with numpy
        data2 = np.loadtxt(path_to[i])

        # splitting the long column into 949 sections
        new_data = np.array(np.array_split(data2, 949))

        # fixing for idl vs python display
        new_data_t = np.transpose(new_data)

        # updating the dictionary
        idl_results.update({var_name: new_data_t})

    return idl_results

I think that, with some adjustments, you could use this code to solve your problem.
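
The core idea in this answer, cutting one long column into equal sections and then transposing, can be illustrated without NumPy on a small list. This is only a sketch of the idea: the helper name `split_and_transpose` is hypothetical, and unlike `np.array_split` it assumes the length divides evenly by the number of sections.

```python
def split_and_transpose(values, sections):
    """Cut `values` into `sections` equal chunks, then transpose the result."""
    if len(values) % sections != 0:
        raise ValueError("length must be divisible by the number of sections")
    size = len(values) // sections
    # Each chunk is one contiguous slice of the flat column
    chunks = [values[i * size:(i + 1) * size] for i in range(sections)]
    # zip(*chunks) pairs up the k-th item of every chunk, i.e. a transpose
    return [list(column) for column in zip(*chunks)]

flat = list(range(12))
for row in split_and_transpose(flat, 3):
    print(row)
```

Note that this approach expects purely numeric input of a known, regular size, so it would need more than small changes to handle the text fields in the question's data.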


Extracting data from a .txt file and getting that data as rows and columns
