I have a large text file that looks like this small example:
fit c3 start=1455035 step=1
2.000000
2.000000
2.000000
2.000000
2.000000
2.000000
fit c2 start=5195348 step=1
1.000000
1.000000
1.000000
1.000000
1.000000
fit c4 start=6587009 step=1
10.000000
10.000000
10.000000
10.000000
10.000000
I would like to get something like this:
fit c3 start=1455035 step=1
12.000000
1.000000
1.000000
1.000000
1.000000
1.000000
fit c2 start=5195348 step=1
5.000000
1.000000
1.000000
1.000000
1.000000
fit c4 start=6587009 step=1
50.000000
1.000000
1.000000
1.000000
1.000000
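Each text line is followed by several lines of numbers. As you can see in the output, I want to replace the first number of each group with the sum of all the numbers below the same text line (within the same group), replace the remaining numbers with 1.000000, and write the result to a new file.
I tried the following code in Python, but it does not return what I want: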
infile = open("file.txt", "r")
for line in infile:
    if line startswith"fit":
        for l in len(line):
            line[l] = line + line[l+1]
Answer 0 (score: 0)
It is neither the most elegant nor the most efficient way, but it may give you an idea of what you need to do:
with open("test.txt", "r") as infile:
    tempList = []  # Auxiliary list for number storage
    sums = []  # Stores the numbers of each fit heading
    fits = []  # Stores the 'fit' headings
    for line in infile:
        print(line)
        if line.startswith("fit"):
            fits.append(line)
            sums.append(tempList)
            tempList = []
        else:
            tempList.append(float(line.strip()))
    print(tempList)
    sums.append(tempList)  # Numbers of the last heading
    sums.remove([])  # Drop the empty list appended before the first heading
    for i in sums:
        i[0] = sum(i)  # First number becomes the sum of the group
        for j in range(1, len(i)):
            i[j] = 1.0  # Assign directly; i[j] /= i[j] would fail on a zero
    print(fits)
    print(sums)
    with open("test2.txt", "w") as outFile:
        for i in range(len(fits)):
            outFile.write(fits[i])
            outFile.write("\n".join(str(j) for j in sums[i]))
            outFile.write("\n")
The output file test2.txt contains the following:
fit c3 start=1455035 step=1
12.0
1.0
1.0
1.0
1.0
1.0
fit c2 start=5195348 step=1
5.0
1.0
1.0
1.0
1.0
fit c4 start=6587009 step=1
50.0
1.0
1.0
1.0
1.0
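If the output should keep the six-decimal format from the question (12.000000 rather than 12.0), a minimal sketch along the following lines may help. It assumes that every heading starts with fit and that every other non-empty line is a number. The names rewrite_groups and process_file, and the file names in the usage note, are my own choices for illustration and do not come from the question.

```python
def rewrite_groups(lines):
    """Return the transformed lines: the first number of each group becomes
    the group sum and the remaining numbers become 1.000000.
    Assumes heading lines start with 'fit' (taken from the sample data)."""
    out = []
    numbers = []  # numbers collected for the group currently being read

    def flush():
        # Emit the pending group, if any, then empty the buffer
        if numbers:
            out.append("{:.6f}".format(sum(numbers)))
            out.extend(["{:.6f}".format(1.0)] * (len(numbers) - 1))
            del numbers[:]

    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith("fit"):
            flush()  # the previous group is complete
            out.append(line)
        else:
            numbers.append(float(line))
    flush()  # do not forget the last group
    return out


def process_file(in_path, out_path):
    """Read in_path, transform it, and write the result to out_path."""
    with open(in_path) as infile:
        result = rewrite_groups(infile)
    with open(out_path, "w") as outfile:
        outfile.write("\n".join(result) + "\n")
```

Calling process_file("file.txt", "new_file.txt") would then write the result to a new file and leave the input untouched. Only one group is held in memory at a time while reading, although the output list still grows with the file.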
Answer 1 (score: 0)
# Dictionary to store the "header" line as key
# And values will be the "sublines" you are changing
groups = {}
# First, collect the numbers that follow each "fit" line
# (assumes the file starts with a "fit" line)
with open('file.txt', 'r') as f:
    for line in f:
        if line.startswith('fit'):
            current = line  # the current "header" ("fit" line)
            groups[current] = []
        else:
            # Need to convert from 'str' to 'float'
            groups[current].append(float(line.strip()))
# Now sum and pad with 1.0
for header in groups:
    # Fill with 1.0 by adding 2 lists
    # First list is length 1 and contains only the sum of the original
    # Second list is one element shorter than the original and is all 1.0
    groups[header] = [sum(groups[header])] + [float(1)] * (len(groups[header]) - 1)
# Then rewrite to file
# Note: this overwrites the input; use a different name to write a new file
with open('file.txt', 'w') as f:
    for header in groups:
        f.write(header)  # May need to add a '\n' if not present in file originally
        for num in groups[header]:
            # Convert 'float' back to 'str' with newline
            f.write('{!s}\n'.format(num))
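One caveat with keying a dictionary on the header line: if two groups share exactly the same header text, the second replaces the first, and before Python 3.7 a plain dict does not guarantee insertion order. A rough sketch that stores the groups as a list of pairs avoids both issues. The names parse_groups and transform are hypothetical, and it still assumes the first non-empty line is a header.

```python
def parse_groups(lines):
    """Collect (header, numbers) pairs in file order.
    Repeated headers are kept as separate groups."""
    groups = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith("fit"):
            groups.append((line, []))
        else:
            # Assumes a header has already been seen
            groups[-1][1].append(float(line))
    return groups


def transform(groups):
    """Replace each number list with [sum, 1.0, 1.0, ...] of the same length."""
    result = []
    for header, nums in groups:
        if nums:
            new_nums = [sum(nums)] + [1.0] * (len(nums) - 1)
        else:
            new_nums = []  # a header with no numbers stays empty
        result.append((header, new_nums))
    return result
```

The write loop from the answer then iterates over the pairs instead of the dictionary keys.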
Answer 2 (score: 0)
One approach is to use groupby and chain from the itertools module, together with list comprehensions:
from itertools import groupby, chain

with open("file.txt", "r") as infile:
    lines = [i.strip() for i in infile]

# Split the lines into alternating runs of header lines and number lines
list_grp = [list(g) for k, g in groupby(lines, lambda x: 'fit' in x)]

# A header run is kept as is; a number run becomes [sum, 1.0, 1.0, ...]
# with the same number of lines as the original run
for i in chain(*[[grp[0]] if 'fit' in grp[0]
                 else [sum(map(float, grp))] + [1.0] * (len(grp) - 1)
                 for grp in list_grp]):
    print(i)
Output:
fit c3 start=1455035 step=1
12.0
1.0
1.0
1.0
1.0
1.0
fit c2 start=5195348 step=1
5.0
1.0
1.0
1.0
1.0
fit c4 start=6587009 step=1
50.0
1.0
1.0
1.0
1.0
Answer 3 (score: 0)
You can also use pandas to achieve this:
Setup
import pandas as pd

file_path = "file.txt"  # assumed path to the input file; adjust as needed

def is_float(x):
    try:
        float(x)
        return True
    except ValueError:
        return False

def to_float(x):
    if is_float(x):
        return float(x)
    else:
        return x

data = pd.read_csv(file_path, header=None, converters={0:to_float}) # line 1
is_title = lambda x: not is_float(x)  # True for the text (title) rows
condition = data[0].map(is_title)
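Main
titles = data.loc[condition]
title_count = len(titles.index) # count of titles
for i in range(title_count):  # range instead of the Python 2-only xrange
    ind = titles.index[i]
    # Sum every number in the group (starting at ind+1), then reset the group to 1.0
    if (i+1) != title_count:
        next_ind = titles.index[i+1]
        total = data.iloc[ind+1:next_ind, 0].values.sum()
        data.iloc[ind+1:next_ind, 0] = 1.0
    else:
        total = data.iloc[ind+1:, 0].values.sum() # line 2
        data.iloc[ind+1:, 0] = 1.0
    data.iat[ind+1, 0] = total  # first number of the group becomes the sum
Output
data.to_csv(file_path, header=False, index=False) # line 3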
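You can convert the csv file to txt by replacing .csv with .txt in the file name.
P.S. This assumes you have one large file containing multiple sections (each section being a title plus lines of numbers); if you have a single section per file, then, apart from is_float and to_float, you can replace the lines marked line 1, line 2, and line 3 above.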
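For comparison, the explicit loop over titles could probably be avoided with a cumulative group id and groupby. The following is only a sketch under the same assumption that title lines start with fit; the function name rewrite_with_pandas and the six-decimal output format are my own choices, not part of the answer above.

```python
import pandas as pd


def rewrite_with_pandas(lines):
    """Return the transformed lines as a list of strings."""
    s = pd.Series([line.strip() for line in lines])
    is_header = s.str.startswith("fit")
    # Every header starts a new group; its numbers share the same id
    group_id = is_header.cumsum()
    # Numeric view of the data, with NaN on the header rows
    values = pd.to_numeric(s.where(~is_header))
    # Group sum broadcast to every row of the group (NaN is skipped)
    sums = values.groupby(group_id).transform("sum")
    # The first number of a group is the row right after a header
    is_first = is_header.shift(1, fill_value=False)
    # Keep the sum on the first number, use 1.0 everywhere else
    new_values = sums.where(is_first, 1.0)
    # Headers stay as text; numbers are formatted with six decimals
    out = s.where(is_header, new_values.map("{:.6f}".format))
    return out.tolist()
```

Reading the file with open(...) and passing the file object to this function, then writing the returned lines joined by newlines to a new file, gives the layout requested in the question.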