Question

I have a text file containing data in the following format:

[-0.00287209 -0.00815337 -0.00322895 -0.00015178]
[-0.0038058  -0.01238539 -0.00082072  0.00040815]
[-0.00922925 -0.00394288  0.00325778  0.00083047]
[-0.01221899  0.01573175  0.00569081  0.00079524]
[0.02409868 0.02623219 0.00364268 0.00026268]
[ 0.04754814  0.00664801 -0.00204411 -0.00044964]
[-0.02286798 -0.02860896 -0.00671971 -0.00086068]
[-0.079635   -0.03532551 -0.00594647 -0.00067338]
[ 1.13691452e-03  4.88425646e-04 -3.44116748e-05 -1.08364051e-05]

I want to reformat it (remove the brackets and replace the whitespace between the numbers with commas) so that it looks like this:

-0.00287209,-0.00815337,-0.00322895,-0.00015178
-0.0038058,-0.01238539,-0.00082072,0.00040815
-0.00922925,-0.00394288,0.00325778,0.00083047
-0.01221899,0.01573175,0.00569081,0.00079524
0.02409868,0.02623219,0.00364268,0.00026268
0.04754814,0.00664801,-0.00204411,-0.00044964
-0.02286798,-0.02860896,-0.00671971,-0.00086068
-0.079635,-0.03532551,-0.00594647,-0.00067338
1.13691452e-03,4.88425646e-04,-3.44116748e-05,-1.08364051e-05

Answer 1

You can use a regular expression like this:

import re

s = """[-0.00287209 -0.00815337 -0.00322895 -0.00015178]
[-0.0038058  -0.01238539 -0.00082072  0.00040815]
[-0.00922925 -0.00394288  0.00325778  0.00083047]
[-0.01221899  0.01573175  0.00569081  0.00079524]
[0.02409868 0.02623219 0.00364268 0.00026268]
[ 0.04754814  0.00664801 -0.00204411 -0.00044964]
[-0.02286798 -0.02860896 -0.00671971 -0.00086068]
[-0.079635   -0.03532551 -0.00594647 -0.00067338]
[ 1.13691452e-03  4.88425646e-04 -3.44116748e-05 -1.08364051e-05]
"""

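# Match one bracketed row of exactly four numbers; re.M lets ^ and $ anchor at every line
fouine = re.compile(r'^\[\s*(-?\d\.?\d+(?:e-\d+)?) \s*(-?\d\.?\d+(?:e-\d+)?) \s*(-?\d\.?\d+(?:e-\d+)?) \s*(-?\d\.?\d+(?:e-\d+)?)]$', re.M)

# Rewrite each matching row as four comma-separated values
print(fouine.sub(r'\1,\2,\3,\4', s))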
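
The pattern above assumes exactly four values per row and only negative exponents. As a rough sketch, assuming rows may hold any number of values, the same idea can be expressed as two substitutions. `to_csv_lines` is a hypothetical helper name, not part of the original answer:

```python
import re

def to_csv_lines(text):
    """Turn '[a  b c]' rows into 'a,b,c' rows, whatever the column count."""
    # Drop the brackets together with any padding spaces or tabs next to them.
    text = re.sub(r'\[[ \t]*|[ \t]*\]', '', text)
    # Collapse each remaining run of spaces or tabs (never newlines) into one comma.
    return re.sub(r'[ \t]+', ',', text)

sample = "[ 0.04754814  0.00664801 -0.00204411 -0.00044964]\n[0.02409868 0.02623219]\n"
print(to_csv_lines(sample))
```

Because the numbers themselves are never matched, this variant does not care how each value is written, but it also does not validate that the fields are numbers.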

Answer 2

Something basic like this works:

import csv

# assuming the input is in input.txt
with open("input.txt") as input_file:
    lines = input_file.readlines() # read in the entire file


fixed_lines = []
for line in lines: # for each line
    line = line.strip() # remove the newline at the end
    line = line.lstrip("[") # remove brackets from the left
    line = line.rstrip("]") # remove brackets from the right
    fixed_lines.append(line.strip().split()) # make sure there are no left over spaces and split by whitespace

# write out using the csv module
with open("output.txt", "w", newline="") as f:  # newline="" avoids blank rows on Windows in Python 3
    csv_writer = csv.writer(f)
    csv_writer.writerows(fixed_lines)

Output:

-0.00287209,-0.00815337,-0.00322895,-0.00015178
-0.0038058,-0.01238539,-0.00082072,0.00040815
-0.00922925,-0.00394288,0.00325778,0.00083047
-0.01221899,0.01573175,0.00569081,0.00079524
0.02409868,0.02623219,0.00364268,0.00026268
0.04754814,0.00664801,-0.00204411,-0.00044964
-0.02286798,-0.02860896,-0.00671971,-0.00086068
-0.079635,-0.03532551,-0.00594647,-0.00067338
1.13691452e-03,4.88425646e-04,-3.44116748e-05,-1.08364051e-05
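
For large files, it is not necessary to read everything with `readlines()` and build `fixed_lines` before writing. A minimal sketch, assuming the same bracketed layout, that converts row by row between any two file-like objects. `convert` is a hypothetical name, and skipping blank lines is an assumption about the desired behavior:

```python
import csv
import io

def convert(src, dst):
    """Read bracketed rows from src and write them to dst as CSV, one row at a time."""
    writer = csv.writer(dst, lineterminator="\n")
    for line in src:
        # Strip the newline and the brackets; split() then discards any padding.
        fields = line.strip().strip("[]").split()
        if fields:  # assumption: blank lines are dropped rather than written as empty rows
            writer.writerow(fields)

# In-memory demonstration; real files opened with open(..., newline="") work the same way.
src = io.StringIO("[-0.079635   -0.03532551 -0.00594647 -0.00067338]\n\n[ 1.13691452e-03  4.88425646e-04]\n")
dst = io.StringIO()
convert(src, dst)
print(dst.getvalue(), end="")
```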

Answer 3

Another way, splitting the content by lines and "columns":

import re
s = """[-0.00287209 -0.00815337 -0.00322895 -0.00015178]
[-0.0038058  -0.01238539 -0.00082072  0.00040815]
[-0.00922925 -0.00394288  0.00325778  0.00083047]
[-0.01221899  0.01573175  0.00569081  0.00079524 ]
[0.02409868 0.02623219 0.00364268 0.00026268]
[ 0.04754814  0.00664801 -0.00204411 -0.00044964]
[-0.02286798 -0.02860896 -0.00671971 -0.00086068]
[-0.079635   -0.03532551 -0.00594647 -0.00067338]
[ 1.13691452e-03  4.88425646e-04 -3.44116748e-05 -1.08364051e-05]
"""

# remove the brackets, along with any spaces padding them
def remove_brackets(l): return l.strip('[] ')
# split the columns on runs of whitespace and join with a comma
def put_commas(l): return ','.join(re.split(r'\s+', l))

raw_lines = s.splitlines()
clean_lines = map(remove_brackets, raw_lines)
clean_lines = map(put_commas, clean_lines)

print('\n'.join(clean_lines))
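
One way to gain confidence in a conversion like this is to parse both forms back into floats and compare them. A minimal sketch, assuming one bracketed row per line; `parse_bracketed` and `parse_csv` are hypothetical helper names:

```python
def parse_bracketed(line):
    """Parse '[ a  b c ]' into a list of floats."""
    return [float(tok) for tok in line.strip().strip('[]').split()]

def parse_csv(line):
    """Parse 'a,b,c' into a list of floats."""
    return [float(tok) for tok in line.strip().split(',')]

original = "[ 1.13691452e-03  4.88425646e-04 -3.44116748e-05 -1.08364051e-05]"
converted = "1.13691452e-03,4.88425646e-04,-3.44116748e-05,-1.08364051e-05"

# Both forms should describe exactly the same numbers.
assert parse_bracketed(original) == parse_csv(converted)
```

A stray leading or trailing comma in the converted row yields an empty field, and `float('')` raises `ValueError`, so padding that was not stripped properly shows up immediately.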
