I have a text file containing data in the following format:
[-0.00287209 -0.00815337 -0.00322895 -0.00015178]
[-0.0038058 -0.01238539 -0.00082072 0.00040815]
[-0.00922925 -0.00394288 0.00325778 0.00083047]
[-0.01221899 0.01573175 0.00569081 0.00079524]
[0.02409868 0.02623219 0.00364268 0.00026268]
[ 0.04754814 0.00664801 -0.00204411 -0.00044964]
[-0.02286798 -0.02860896 -0.00671971 -0.00086068]
[-0.079635 -0.03532551 -0.00594647 -0.00067338]
[ 1.13691452e-03 4.88425646e-04 -3.44116748e-05 -1.08364051e-05]
I want to reformat it (remove the brackets and replace the whitespace between the numbers with commas) so that it looks like this:
-0.00287209,-0.00815337,-0.00322895,-0.00015178
-0.0038058,-0.01238539,-0.00082072,0.00040815
-0.00922925,-0.00394288,0.00325778,0.00083047
-0.01221899,0.01573175,0.00569081,0.00079524
0.02409868,0.02623219,0.00364268,0.00026268
0.04754814,0.00664801,-0.00204411,-0.00044964
-0.02286798,-0.02860896,-0.00671971,-0.00086068
-0.079635,-0.03532551,-0.00594647,-0.00067338
1.13691452e-03,4.88425646e-04,-3.44116748e-05,-1.08364051e-05
Answer 0 (score: 1)
You can use a regular expression like this:
import re
s = """[-0.00287209 -0.00815337 -0.00322895 -0.00015178]
[-0.0038058 -0.01238539 -0.00082072 0.00040815]
[-0.00922925 -0.00394288 0.00325778 0.00083047]
[-0.01221899 0.01573175 0.00569081 0.00079524]
[0.02409868 0.02623219 0.00364268 0.00026268]
[ 0.04754814 0.00664801 -0.00204411 -0.00044964]
[-0.02286798 -0.02860896 -0.00671971 -0.00086068]
[-0.079635 -0.03532551 -0.00594647 -0.00067338]
[ 1.13691452e-03 4.88425646e-04 -3.44116748e-05 -1.08364051e-05]
"""
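# one capture group per column; re.M lets ^ and $ match at every line of s
fouine = re.compile(r'^\[\s*(-?\d\.?\d+(?:e-\d+)?) \s*(-?\d\.?\d+(?:e-\d+)?) \s*(-?\d\.?\d+(?:e-\d+)?) \s*(-?\d\.?\d+(?:e-\d+)?)\]$', re.M)
# rewrite each matched line as the four captured numbers joined by commas
print(fouine.sub(r'\1,\2,\3,\4', s))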
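The pattern above is tied to exactly four columns and to negative exponents. If the file may have a different number of columns, a looser two-step substitution might be enough. This is only a sketch, assuming every line is a single bracketed, whitespace-separated list; `to_csv_line` is a hypothetical helper name, not part of the answer above.

```python
import re

def to_csv_line(line):
    """Turn '[ a b c ]' into 'a,b,c' (hypothetical helper for illustration)."""
    # Drop the surrounding brackets and any padding whitespace inside them.
    inner = line.strip().strip("[]").strip()
    # Collapse each run of whitespace between numbers into a single comma.
    return re.sub(r"\s+", ",", inner)

sample = "[ 1.13691452e-03 4.88425646e-04 -3.44116748e-05 -1.08364051e-05]"
print(to_csv_line(sample))
```

Because it never inspects the numbers themselves, this version does not care how many columns there are or how the exponents are written, but it also will not reject a malformed line.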
Answer 1 (score: 1)
Something basic like this works:
import csv

# assuming the input is in input.txt
with open("input.txt") as input_file:
    lines = input_file.readlines()  # read in the entire file

fixed_lines = []
for line in lines:  # for each line
    line = line.strip()  # remove the newline at the end
    line = line.lstrip("[")  # remove brackets from the left
    line = line.rstrip("]")  # remove brackets from the right
    fixed_lines.append(line.strip().split())  # make sure there are no left over spaces and split by whitespace

# write out using the csv module
with open("output.txt", "w", newline="") as f:  # newline="" is what the csv module expects on Python 3
    csv_writer = csv.writer(f)
    csv_writer.writerows(fixed_lines)
Output:
-0.00287209,-0.00815337,-0.00322895,-0.00015178
-0.0038058,-0.01238539,-0.00082072,0.00040815
-0.00922925,-0.00394288,0.00325778,0.00083047
-0.01221899,0.01573175,0.00569081,0.00079524
0.02409868,0.02623219,0.00364268,0.00026268
0.04754814,0.00664801,-0.00204411,-0.00044964
-0.02286798,-0.02860896,-0.00671971,-0.00086068
-0.079635,-0.03532551,-0.00594647,-0.00067338
1.13691452e-03,4.88425646e-04,-3.44116748e-05,-1.08364051e-05
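For a large file it may be preferable not to hold every line in memory. The same idea can be sketched as a streaming loop that writes each row as soon as it is read. The function name `convert` is hypothetical, and the in-memory `io.StringIO` objects below only stand in for real files so the snippet runs on its own.

```python
import csv
import io

def convert(src, dst):
    """Copy bracketed rows from src to dst as CSV, one line at a time (hypothetical helper)."""
    # lineterminator="\n" is a choice made here; csv.writer defaults to "\r\n".
    writer = csv.writer(dst, lineterminator="\n")
    for line in src:
        # Strip the newline, then the brackets, then split on any whitespace.
        fields = line.strip().strip("[]").split()
        if fields:  # skip blank lines
            writer.writerow(fields)

# In-memory stand-ins for the input and output files.
src = io.StringIO("[ 0.04754814 0.00664801 -0.00204411 -0.00044964]\n"
                  "[-0.079635 -0.03532551 -0.00594647 -0.00067338]\n")
dst = io.StringIO()
convert(src, dst)
print(dst.getvalue(), end="")
```

With real files, the two arguments would be the handles returned by `open(...)`, for example `input.txt` for reading and `output.txt` for writing.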
Answer 2 (score: 1)
Another way, splitting the content by rows and "columns":
import re
s = """[-0.00287209 -0.00815337 -0.00322895 -0.00015178]
[-0.0038058 -0.01238539 -0.00082072 0.00040815]
[-0.00922925 -0.00394288 0.00325778 0.00083047]
[-0.01221899 0.01573175 0.00569081 0.00079524 ]
[0.02409868 0.02623219 0.00364268 0.00026268]
[ 0.04754814 0.00664801 -0.00204411 -0.00044964]
[-0.02286798 -0.02860896 -0.00671971 -0.00086068]
[-0.079635 -0.03532551 -0.00594647 -0.00067338]
[ 1.13691452e-03 4.88425646e-04 -3.44116748e-05 -1.08364051e-05]
"""
# remove the brackets, plus any padding spaces just inside them
# (otherwise lines like "[ 0.04..." or "... ]" produce a stray leading or trailing comma)
def remove_brackets(l): return l.strip('[] ')
# split the columns and join with a comma
def put_commas(l): return ','.join(re.split(r'\s+', l))
raw_lines = s.splitlines()
clean_lines = map(remove_brackets, raw_lines)
clean_lines = map(put_commas, clean_lines)
print('\n'.join(clean_lines))
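Whichever approach is used, it can be worth checking that a converted line still holds the same numbers as the original. The sketch below does that by parsing both sides with `float`; `check_row` is a hypothetical helper, and it assumes one bracketed row per original line. A stray leading or trailing comma shows up as an empty field, which `float` rejects.

```python
def check_row(original, converted):
    """Return True if converted holds the same numbers as original (hypothetical helper)."""
    try:
        # Parse the bracketed, whitespace-separated form.
        before = [float(tok) for tok in original.strip().strip("[]").split()]
        # Parse the comma-separated form; an empty field raises ValueError.
        after = [float(tok) for tok in converted.split(",")]
    except ValueError:
        return False
    return before == after

print(check_row("[ 1.13691452e-03 4.88425646e-04 -3.44116748e-05 -1.08364051e-05]",
                "1.13691452e-03,4.88425646e-04,-3.44116748e-05,-1.08364051e-05"))
```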