I have a large .dat file with the following format:
"Trajectory" 0
"Type : "
Transmitted
"Collisions"
"X" "Y" "Z" "Energy"
-17.418 11.0038 -2633.51 300
-7.80195 4.90819 -1317.76 300
-2.98663 1.85574 -658.878 300
-0.578976 0.329517 -329.439 300
-0.278019 0.138739 -288.259 300
-0.12754 0.0433497 -267.669 300
''
''
''
''
56.1784 -56.9043 2103.34 297.645224483
58.9321 -57.4033 2155.91 297.617470093
78.4242 -59.0752 2635.51 297.364385221
78.8647 -59.113 2646.35 297.358666592
"-----------------------------------------------------------------"
"Trajectory" 1
"Type : "
Transmitted
"Collisions"
"X" "Y" "Z" "Energy"
19.5684 -1.57545 -2633.51 300
8.78275 -0.663686 -1317.76 300
3.38175 -0.207111 -658.878 300
0.931759 0 -360 300
0.681244 0.0211774 -329.439 300
0.343681 0.0497133 -288.259 300
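and so on for about a hundred more "Trajectories". My goal is to plot all of the trajectories, so I would like to know how to pull the X, Y, Z and Energy data for each trajectory out of this .dat file.
Thanks!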
Answer 0 (score: 0)
I think you can use read_csv, provided there are no NaN values in the real data (sample csv file):
import pandas as pd
df = pd.read_csv('sample.csv', sep='\t', names=['X','Y','Z','Energy'])
#print (df)
# remove all rows where column X holds the header value 'X'
# (.copy() avoids a SettingWithCopyWarning when adding a column below)
df = df[df.X != 'X'].copy()
# add new column groups: where column X contains 'Trajectory', copy that value (e.g. 'Trajectory 0')
df['groups'] = df.loc[df.X.str.contains('Trajectory', na=False), 'X']
# forward fill NaN of column groups
df['groups'] = df['groups'].ffill()
#remove all rows with values NaN
df = df.dropna().reset_index(drop=True)
#convert all values to float
df[['X','Y','Z','Energy']] = df[['X','Y','Z','Energy']].astype(float)
print (df)
X Y Z Energy groups
0 -17.418000 11.003800 -2633.510 300.000000 Trajectory 0
1 -7.801950 4.908190 -1317.760 300.000000 Trajectory 0
2 -2.986630 1.855740 -658.878 300.000000 Trajectory 0
3 -0.578976 0.329517 -329.439 300.000000 Trajectory 0
4 -0.278019 0.138739 -288.259 300.000000 Trajectory 0
5 -0.127540 0.043350 -267.669 300.000000 Trajectory 0
6 56.178400 -56.904300 2103.340 297.645224 Trajectory 0
7 58.932100 -57.403300 2155.910 297.617470 Trajectory 0
8 78.424200 -59.075200 2635.510 297.364385 Trajectory 0
9 78.864700 -59.113000 2646.350 297.358667 Trajectory 0
10 19.568400 -1.575450 -2633.510 300.000000 Trajectory 1
11 8.782750 -0.663686 -1317.760 300.000000 Trajectory 1
12 3.381750 -0.207111 -658.878 300.000000 Trajectory 1
13 0.931759 0.000000 -360.000 300.000000 Trajectory 1
14 0.681244 0.021177 -329.439 300.000000 Trajectory 1
15 0.343681 0.049713 -288.259 300.000000 Trajectory 1
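The code above reads a tab-separated sample file (sep='\t'). For the whitespace-separated layout shown in the question, a minimal sketch along the same lines could split on any whitespace and take the trajectory number from column Y. It assumes pandas is installed; SAMPLE is a shortened, hypothetical copy of the question's data and load_trajectories is a hypothetical helper name:

```python
import io

import pandas as pd

# Hypothetical, shortened copy of the layout shown in the question.
SAMPLE = '''"Trajectory" 0
"Type : "
Transmitted
"Collisions"
"X" "Y" "Z" "Energy"
-17.418 11.0038 -2633.51 300
-7.80195 4.90819 -1317.76 300
''
''
56.1784 -56.9043 2103.34 297.645224483
"-----------------------------------------------------------------"
"Trajectory" 1
"Type : "
Transmitted
"Collisions"
"X" "Y" "Z" "Energy"
19.5684 -1.57545 -2633.51 300
8.78275 -0.663686 -1317.76 300
'''


def load_trajectories(source):
    """Parse the trajectory file into a DataFrame with an integer 'trajectory' column."""
    cols = ['X', 'Y', 'Z', 'Energy']
    # sep=r'\s+' splits on runs of spaces/tabs; rows with fewer than
    # four fields are padded with NaN. Read everything as text first.
    df = pd.read_csv(source, sep=r'\s+', names=cols, header=None, dtype=str)
    # '"Trajectory" 0' is split into X='Trajectory', Y='0'.
    is_header = df['X'].eq('Trajectory')
    # Copy the trajectory number down to every following row.
    df['trajectory'] = df['Y'].where(is_header).ffill()
    # Keep only rows whose four columns all parse as numbers; this drops
    # the header, 'Transmitted', '' and separator lines.
    numeric = df[cols].apply(pd.to_numeric, errors='coerce')
    keep = numeric.notna().all(axis=1)
    out = numeric[keep].copy()
    out['trajectory'] = df.loc[keep, 'trajectory'].astype(int)
    return out.reset_index(drop=True)
```

With a real file, pass the path instead of io.StringIO(SAMPLE); each trajectory is then available through df.groupby('trajectory'). Reading every field as text and coercing afterwards avoids having to know in advance which non-numeric lines the file contains.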
Answer 1 (score: 0)
This function takes a filename and parses the file into a numpy structured array:
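import numpy as np

def extract_trajectories(fn):
    d = []
    with open(fn, 'r') as f:
        trajectory = 0
        data = False
        for l in f:
            if '"Trajectory"' in l:
                # new trajectory header, e.g. '"Trajectory" 1'
                trajectory = int(l.split()[1])
                data = False
            if '"-----------------------------------------------------------------"' in l:
                # separator line: end of the current trajectory's data
                data = False
            # data rows: skip the '' placeholder lines and blank lines
            if data and "''" not in l and l.strip():
                d.append(tuple([trajectory] + [float(x) for x in l.split()]))
            if '"X" "Y" "Z"' in l:
                # column header: numeric rows start on the next line
                data = True
    return np.array(d, dtype=[('Trajectory', 'i4'), ('X', 'f4'), ('Y', 'f4'), ('Z', 'f4'), ('Energy', 'f4')])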
In general, there is no way around writing your own code for a non-standard file layout.
For example, to get all 'X' values of trajectory 1, you simply index the array:
In [6]: d['X'][d['Trajectory']==1]
Out[6]:
array([ 19.56839943, 8.78275013, 3.38175011, 0.931759 ,
0.68124402, 0.34368101], dtype=float32)
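Since the question's goal is to plot every trajectory, one way to get there from the structured array is to split it on the Trajectory field and draw each piece as a 3-D line. This is a sketch, not part of the answer above: split_by_trajectory and plot_trajectories are hypothetical helper names, and plotting assumes a reasonably recent matplotlib in which projection='3d' works without importing mpl_toolkits.mplot3d explicitly:

```python
import numpy as np


def split_by_trajectory(d):
    """Return {trajectory number: sub-array} for a structured array with a 'Trajectory' field."""
    # Boolean indexing keeps the structured dtype, so d_t['X'] etc. still work.
    return {int(t): d[d['Trajectory'] == t] for t in np.unique(d['Trajectory'])}


def plot_trajectories(d):
    """Draw each trajectory as one 3-D line and return the figure (assumes matplotlib)."""
    # Imported here so split_by_trajectory works even without matplotlib.
    import matplotlib.pyplot as plt

    fig = plt.figure()
    ax = fig.add_subplot(projection='3d')
    for traj in split_by_trajectory(d).values():
        ax.plot(traj['X'], traj['Y'], traj['Z'])
    ax.set_xlabel('X')
    ax.set_ylabel('Y')
    ax.set_zlabel('Z')
    return fig
```

For example, fig = plot_trajectories(extract_trajectories('trajectories.dat')) followed by fig.savefig('trajectories.png'), where both file names are placeholders. No legend is drawn, since a hundred labelled lines would mostly hide the plot.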