I have a dataset from a .txt file in this format:
30 1
2477.25 0.00 1 M
40 2 11
0.17100 0.08600 0.11500 0.10800 0.05600 0.07500 9.60000 -1009.00000 -1009.00000 -1009.00000
2.70000
36 1 1
a.a.Sbargang
30 1
2477.45 0.00 2 M
40 2 11
0.52100 0.27400 0.35900 -1009.00000 -1009.00000 -1009.00000 14.30000 -1009.00000 -1009.00000 -1009.00000
2.66000
36 1 1
a.a M-gr.
The format is very messy, and I want to arrange it into rows and columns, so that my output looks like this:
30 1 2477.25 0.00 1 M 40 2 11 0.17100 0.08600 0.11500 0.10800 0.05600 0.07500 9.60000 -1009.00000 -1009.00000 -1009.00000 2.70000 36 1 1 a.a.Sbargang
30 1 2477.45 0.00 2 M 40 2 11 0.52100 0.27400 0.35900 -1009.0 -1009.0 -1009.00 14.3000 -1009.00000 -1009.00000 -1009.00000 2.66000 36 1 1 a.a M-gr.
I am new to Python and don't know how to write Python 3 code for this task. Thanks in advance.
I tried this:
import re

with open('textdata3.txt') as f:
    inputString = f.read()
inputString = re.sub(r" +", " ", inputString)
itemInString = inputString.split(" ")
row1 = []
for index, item in enumerate(itemInString):
    if index % 1 == 0:
        row1.append(str(item))
print(row1)
I'm not sure whether this is the right approach, but it puts everything on a single line.
Output:
['30', '1\n', '2477.25', '0.00', '1', 'M\n40', '2', '11\n', '0.17100', '0.08600', '0.11500', '0.10800', '0.05600', '0.07500', '9.60000', '-1009.00000', '-1009.00000', '-1009.00000\n', '2.70000\n36', '1', '1\n', 'Sst.Lt-gry.F-gr.Sbang.VW-cmt.VP-srt.w/Mic.Calc.Glauc.\n30', '1\n', '2477.45', '0.00', '2', 'M\n40', '2', '11\n', '0.52100', '0.27400', '0.35900', '-1009.00000', '-1009.00000', '-1009.00000', '14.30000', '-1009.00000', '-1009.00000', '-1009.00000\n', '2.66000\n36', '1', '1\n', 'a.a', 'M-gr.']
Answer 0 (score: 1)
Assuming the data is consistently grouped into blocks of seven lines, this should work.
import re

rows = []
with open("input_data.txt", "r", encoding="utf-8") as input_file:
    while True:
        try:
            # Read the next seven lines; together they form one record.
            row = [next(input_file) for _ in range(7)]
        except StopIteration:
            break
        # Join the seven lines and collapse whitespace runs into single spaces.
        rows.append(re.sub(r"\s+", " ", " ".join(row)).strip())

with open("reformatted_data.txt", "w", encoding="utf-8") as out_file:
    for row in rows:
        out_file.write(row + "\n")
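As a variation on the fixed-size idea above, here is a minimal sketch that works on any iterable of lines and also keeps a final group that has fewer lines than expected. The helper name `chunk_lines` and the demo data are made up for illustration; they are not part of the original answer.

```python
from itertools import islice


def chunk_lines(lines, size=7):
    """Yield one space-joined string per group of `size` lines."""
    iterator = iter(lines)
    while True:
        group = list(islice(iterator, size))
        if not group:
            return
        # split() with no arguments drops newlines and collapses whitespace runs.
        yield " ".join(" ".join(group).split())


# Tiny in-memory demonstration with groups of two lines instead of seven.
demo = ["30 1\n", "2477.25   0.00 1 M\n", "30 1\n", "2477.45 0.00 2 M\n", "40 2 11\n"]
chunks = list(chunk_lines(demo, size=2))
for chunk in chunks:
    print(chunk)
```

With a real file you would pass the open file object instead of `demo`, since a text file iterates line by line.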
Updated version, based on the comments below.
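import re

rows = []
with open("data.txt", "r", encoding="utf-8") as input_file:
    row = []
    for line in input_file:
        # Collapse whitespace runs and trim the line.
        data = re.sub(r"\s+", " ", line).strip()
        # "30 1" marks the start of a new record, so flush the previous one.
        if data == "30 1" and row:
            rows.append(" ".join(row))
            row = []
        row.append(data)
    # Flush the last record.
    if row:
        rows.append(" ".join(row))

# newline="\r\n" makes every "\n" written below come out as "\r\n".
with open("reformatted_data.txt", "w", encoding="utf-8", newline="\r\n") as out_file:
    for row in rows:
        out_file.write(row + "\n")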
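Since the question asks for rows and columns, here is a rough sketch of how the marker-based grouping could be followed by a column split. It assumes, based only on the sample in the question, that every record starts with "30 1" and has 23 whitespace-separated fields before a free-text description that may itself contain spaces. The names `group_records` and `split_columns` are hypothetical helpers, not an existing API.

```python
import re


def group_records(lines, marker="30 1"):
    """Yield one space-joined string per record; a record starts at `marker`."""
    current = []
    for line in lines:
        text = re.sub(r"\s+", " ", line).strip()
        if not text:
            continue  # skip blank lines
        if text == marker and current:
            yield " ".join(current)
            current = []
        current.append(text)
    if current:
        yield " ".join(current)


def split_columns(record, leading_fields=23):
    """Split off the leading fields; the rest stays together as the description."""
    return record.split(" ", leading_fields)


# The two records from the question, as in-memory lines.
sample_lines = [
    "30 1",
    "2477.25 0.00 1 M",
    "40 2 11",
    "0.17100 0.08600 0.11500 0.10800 0.05600 0.07500 9.60000 -1009.00000 -1009.00000 -1009.00000",
    "2.70000",
    "36 1 1",
    "a.a.Sbargang",
    "30 1",
    "2477.45 0.00 2 M",
    "40 2 11",
    "0.52100 0.27400 0.35900 -1009.00000 -1009.00000 -1009.00000 14.30000 -1009.00000 -1009.00000 -1009.00000",
    "2.66000",
    "36 1 1",
    "a.a M-gr.",
]

records = list(group_records(sample_lines))
table = [split_columns(record) for record in records]
for columns in table:
    print(len(columns), columns[-1])
```

Capping the number of splits keeps a description such as "a.a M-gr." in a single column, so every row ends up with the same number of columns.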
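Answer 1 (score: 0)
I had a similar problem when exporting data: I ended up with one huge column and then needed to break that column up to reproduce the original structure. This code solved my problem: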


import glob

import numpy as np


def arrumando_dados():
    # defining the path to the files
    path_to = glob.glob('.../txts/*.txt')

    # creating an empty dictionary
    idl_results = {}

    # looping over the files
    for i in range(0, len(path_to)):

        # creating the variable with the appropriate name;
        # this only works if the slice picks out the number in the file name correctly
        var_name = path_to[i][-6:-4]

        # loading the data with numpy
        data2 = np.loadtxt(path_to[i])

        # splitting the long column into 949 chunks
        new_data = np.array(np.array_split(data2, 949))

        # fixing for idl vs python display
        new_data_t = np.matrix.transpose(new_data)

        # updating the dictionary
        idl_results.update({var_name: new_data_t})

    return idl_results



I think that, with some adjustments, you can use this code to solve your problem.
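To make the splitting step easier to follow, here is a small sketch of what `np.array_split` followed by a transpose does to one long numeric column. The 12 values and the chunk count of 4 are made up for the demonstration. Note that this only applies to purely numeric data: the file in the question contains text fields such as "M" and "a.a.Sbargang", so `np.loadtxt` with its default float type would most likely fail on it without extra handling.

```python
import numpy as np

# Stand-in for a single long numeric column; the values are made up.
column = np.arange(12, dtype=float)

# Split into 4 equal chunks, stack them into a 4x3 array, then transpose to 3x4.
chunks = np.array(np.array_split(column, 4))
table = chunks.T

print(chunks.shape)
print(table.shape)
```

The second argument of `np.array_split` is the number of chunks, not the number of items per chunk, so the 949 in the code above produces 949 pieces.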