First of all, sorry for my poor English. I have a file with a repeating format, like this:
326 Iteration: 0 #Bonds: 10
1 6 7 14 54 70 77 0 0 0 0 0 1 0.693 0.632 0.847 0.750 0.644 0.000 0.000 0.000 0.000 0.000 3.566 0.000 0.028
2 6 3 6 15 55 0 0 0 0 0 0 1 0.925 0.920 0.909 0.892 0.000 0.000 0.000 0.000 0.000 0.000 3.645 0.000 -0.040
3 6 2 8 10 52 0 0 0 0 0 0 1 0.925 0.910 0.920 0.898 0.000 0.000 0.000 0.000 0.000 0.000 3.653 0.000 0.000
...
324 8 323 0 0 0 0 0 0 0 0 0 100 0.871 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.871 3.000 -0.493
325 2 326 0 0 0 0 0 0 0 0 0 101 0.930 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.930 0.000 0.334
326 8 325 0 0 0 0 0 0 0 0 0 101 0.930 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.930 3.000 -0.611
637.916060425841 306.094529423257 1250.10511927236
6.782126993565285E-006
326 (repeating from here) Iteration: 100 #Bonds: 10
1 6 7 14 54 64 70 77 0 0 0 0 1 0.885 0.580 0.819 0.335 0.784 0.709 0.000 0.000 0.000 0.000 4.111 0.000 0.025
2 6 3 6 15 55 0 0 0 0 0 0 1 0.812 0.992 0.869 0.966 0.000 0.000 0.000 0.000 0.000 0.000 3.639 0.000 -0.034
3 6 2 8 10 52 0 0 0 0 0 0 1 0.812 0.966 0.989 0.926 0.000 0.000 0.000 0.000 0.000 0.000 3.692 0.000 0.004
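I want to analyze the data: for every repeating "frame", take columns 3 to 12 of lines 2 to 327, and for each row of that target matrix print the number of non-zero entries and the number of zeros. Columns 1, 2 and 13 should be printed as well. So the expected output file looks like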
326
1
1 6 5 5 1
2 6 4 6 1
...
325 2 1 9 101
326 8 1 9 101
326 (Next frame starts from here)
2
1 6 5 5 1
2 6 4 6 1
...
326
3
1 6 5 5 1
2 6 4 6 1
...
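So the result file has 2 header lines and 326 lines of analyzed data, 328 lines per frame in total. The same format repeats for the next frame. This output format (5 spaces each) is preferred, because the file will be used for other purposes.
The approach I am using is: create 13 arrays for the 13 columns, then store the data for each frame with a double for loop, 328 lines each. But I don't know how to handle the output.
Below is my trial code (unfinished, it only reads the input), but it has a lot of problems. linecache reads the whole line instead of just the first number on the line. Every frame has 326 + 3 = 329 lines, but my code does not seem to work correctly frame by frame. Any help with analyzing this data is welcome. Thank you very much in advance.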
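To make the target row format concrete, here is a minimal sketch of the per-row summary. It is written for Python 3 (the code in this thread is Python 2), the function name `summarize_row` is hypothetical, and it assumes that "5 spaces each" means five spaces between values; if fixed-width fields are meant instead, the `sep.join` line would need to change.

```python
def summarize_row(line, sep=" " * 5):
    """Summarize one atom line as: index, type, non-zero count, zero count, column 13."""
    fields = line.split()
    neighbours = fields[2:12]  # columns 3..12 (1-based)
    zeros = sum(1 for value in neighbours if int(value) == 0)
    nonzeros = len(neighbours) - zeros
    return sep.join([fields[0], fields[1], str(nonzeros), str(zeros), fields[12]])


# First atom line of the sample frame shown above.
sample = ("1 6 7 14 54 70 77 0 0 0 0 0 1 0.693 0.632 0.847 0.750 0.644 "
          "0.000 0.000 0.000 0.000 0.000 3.566 0.000 0.028")
print(summarize_row(sample))
```

Passing `sep="\t"` would give tab-separated output instead.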
# Read the file
filename = raw_input("Enter the file name \n")
file = open(filename, 'r')
# Read the number of atom from header
import linecache
nnn = linecache.getline(filename, 1)
natoms = int(nnn)
singleframe = natoms + 3
# get number of frames
nlines = 0
for i1 in file:
    nlines = nlines +1
file.close()
nframes = nlines / singleframe
print 'no of lines are: ', nlines
print 'no of frames are: ', nframes
print 'no of atoms are:', natoms
# Create 1d string array
nrange = range(nlines)
data_lines = [None]*(nlines)
# Store whole input file into string array
file = open(filename, 'r')
i1=0
for i1 in nrange:
    data_lines[i1] = file.readline()
file.close()
# Create 1d array to store atomic data
at_index = [None]*natoms
at_type = [None]*natoms
n1 = [None]*natoms
n2 = [None]*natoms
n3 = [None]*natoms
n4 = [None]*natoms
n5 = [None]*natoms
n6 = [None]*natoms
n7 = [None]*natoms
n8 = [None]*natoms
n9 = [None]*natoms
n10 = [None]*natoms
molnr = [None]*natoms
nrange1= range(natoms)
nframe = range(nframes)
file = open('output_force','w')
print data_lines[9]
for j1 in nframe:
    start = j1*(natoms + 3) + 3
    for i1 in nrange1:
        line = data_lines[i1+start].split() #Split each line based on spaces
        at_index[i1] = int(line[0])
        at_type[i1] = int(line[1])
        n1[i1]= int(line[2])
        n2[i1]= int(line[3])
        n3[i1]= int(line[4])
        n4[i1]= int(line[5])
        n5[i1]= int(line[6])
        n6[i1]= int(line[7])
        n7[i1]= int(line[8])
        n8[i1]= int(line[9])
        n9[i1]= int(line[10])
        n10[i1]= int(line[11])
        molnr[i1]= int(line[12])
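On the two problems described above, a minimal sketch (Python 3, function names hypothetical) of how the header and the frame offsets might be handled. It assumes the layout suggested by the sample: one header line whose first token is the atom count, then that many atom rows, then two trailing lines. Under that assumption the atom rows start one line after the header, so the offset would be `+ 1` rather than `+ 3`.

```python
def frame_layout(lines):
    """Return (natoms, frame_len, nframes) for the full list of file lines."""
    # Header looks like "326 Iteration: 0 #Bonds: 10"; only the first token is the atom count.
    natoms = int(lines[0].split()[0])
    frame_len = natoms + 3             # 1 header + natoms atom rows + 2 trailing lines
    nframes = len(lines) // frame_len  # integer division: complete frames only
    return natoms, frame_len, nframes


def atom_rows(lines, frame, natoms, frame_len):
    """Return the atom rows of the given zero-based frame."""
    start = frame * frame_len + 1      # +1 skips the single header line
    return lines[start:start + natoms]


# Tiny made-up file with 2 atoms per frame and 2 frames (values are illustrative only).
lines = [
    "2 Iteration: 0 #Bonds: 10", "1 6 2 0 1", "2 6 1 0 1", "1.0 2.0 3.0", "6.7E-006",
    "2 Iteration: 100 #Bonds: 10", "1 6 2 0 1", "2 6 0 0 1", "1.0 2.0 3.0", "6.7E-006",
]
natoms, frame_len, nframes = frame_layout(lines)
print(natoms, frame_len, nframes)
print(atom_rows(lines, 1, natoms, frame_len))
```

Splitting the header and converting only its first token avoids the `int()` failure on the whole line that `linecache.getline` returns.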
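Answer 0 (score: 0)
Since you are working with a CSV-like file, you should take a look at the csv module. I wrote some code that should do the trick.
This code assumes "good data". If your data set may contain errors (for example, fewer than 13 columns or fewer than 326 data rows), some changes should be made.
(Changed to conform to Python 2.6.6)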
import csv

with open('mydata.csv') as in_file:
    with open('outfile.csv', 'wb') as out_file:
        csv_reader = csv.reader(in_file, delimiter=' ', skipinitialspace=True)
        csv_writer = csv.writer(out_file, delimiter='\t')
        # Iterate over all rows in the file
        for i, header in enumerate(csv_reader):
            # Get the header data
            num = header[0]
            csv_writer.writerow([num])
            # Write frame number, starting with 1 (hence the +1 part)
            csv_writer.writerow([i+1])
            # Iterate over all data rows
            for _ in xrange(326):
                # Call next(csv_reader) to get the next row
                # Put inside a try ... except to avoid StopIteration exception
                # if end of file is found before reaching 326 lines
                try:
                    row = next(csv_reader)
                except StopIteration:
                    break
                # Use list comprehension to extract number of zeros
                zeros = sum([1 for x in row[2:12] if x.strip() == '0'])
                not_zeros = 10 - zeros
                # Write the data to output file
                out = [row[0].strip(), row[1].strip(), not_zeros, zeros, row[12].strip()]
                csv_writer.writerow(out)
            # If the loop finished without a break (all 326 rows were read)
            else:
                # Skip the two trailing lines of the frame; the None default
                # avoids a StopIteration if the file ends here
                next(csv_reader, None)
                next(csv_reader, None)
For the first three lines, this produces:
326
1
1 6 5 5 1
2 6 4 6 1
3 6 4 6 1
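If the data cannot be assumed to be "good", one possible variation is sketched below. It is written for Python 3 rather than 2.6, does not use the csv module, reads the atom count from each header instead of hard-coding 326, and raises an error on short or missing rows. The names `iter_frames` and `write_summary` are hypothetical, and the five-space separator is an assumption about what the question means by "5 spaces each".

```python
import io


def iter_frames(handle):
    """Yield (natoms, rows) per frame; each row is a list of whitespace-split fields."""
    while True:
        header = handle.readline()
        if not header.strip():
            return                       # end of file (or a blank line)
        natoms = int(header.split()[0])  # first token of the header line
        rows = []
        for _ in range(natoms):
            fields = handle.readline().split()
            if len(fields) < 13:
                raise ValueError("short or missing atom row in frame")
            rows.append(fields)
        handle.readline()                # skip the two trailing lines of the frame
        handle.readline()
        yield natoms, rows


def write_summary(in_handle, out_handle, sep=" " * 5):
    """Write atom count, frame number, then one summary row per atom."""
    for frame_no, (natoms, rows) in enumerate(iter_frames(in_handle), start=1):
        out_handle.write("%d\n%d\n" % (natoms, frame_no))
        for fields in rows:
            zeros = sum(1 for v in fields[2:12] if int(v) == 0)
            out = [fields[0], fields[1], str(10 - zeros), str(zeros), fields[12]]
            out_handle.write(sep.join(out) + "\n")


# Shortened two-atom frame for illustration; a real file would be opened with open().
demo = io.StringIO(
    "2 Iteration: 0 #Bonds: 10\n"
    "1 6 7 14 54 70 77 0 0 0 0 0 1 0.693\n"
    "2 6 3 6 15 55 0 0 0 0 0 0 1 0.925\n"
    "637.9 306.0 1250.1\n"
    "6.78E-006\n"
)
out = io.StringIO()
write_summary(demo, out)
print(out.getvalue())
```

Because frames are read one at a time, the whole file never has to be held in memory, and a truncated frame fails with an error instead of producing a partial block of output.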