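I recently decided to give matplotlib.pyplot a try, after having used gnuplot for scientific data plotting for years. I started out by simply reading a data file and plotting two columns, the way gnuplot would with `plot 'datafile' u 1:2`.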
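My requirements for comfort are:

- skip lines beginning with `#`, and skip empty lines.

The code below is how I solved this. However, it is really not as fast as gnuplot, which is a bit odd, because I had read that one of the big advantages of py(plot/thon) over gnuplot is its speed.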
```python
import numpy as np
import matplotlib.pyplot as plt
import sys

datafile = sys.argv[1]
data = []
for line in open(datafile, 'r'):
    if line and line[0] != '#':
        cols = filter(lambda x: x != '', line.split(' '))
        for index, col in enumerate(cols):
            if len(data) <= index:
                data.append([])
            data[index].append(float(col))
plt.plot(data[0], data[1])
plt.show()
```
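How can I make reading the data faster? I had a quick look at the `csv` module, but it doesn't seem very flexible about comments in the file, and it would still have to iterate over every line of the file.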
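On the `csv` point: `csv.reader` accepts any iterable of strings, not only a file object, so comment and blank lines can be filtered out by a generator before the reader ever sees them. A minimal sketch, assuming space-separated numeric columns; `read_columns_csv` is a hypothetical helper name:

```python
import csv
import io


def read_columns_csv(f):
    """Return one list of floats per column (hypothetical helper)."""
    # Strip surrounding whitespace so leading spaces don't produce empty fields.
    stripped = (line.strip() for line in f)
    # Drop blank lines and '#' comment lines before csv ever sees them.
    rows = (line for line in stripped if line and not line.startswith('#'))
    # skipinitialspace=True skips the extra spaces that follow each delimiter.
    reader = csv.reader(rows, delimiter=' ', skipinitialspace=True)
    # zip(*reader) transposes rows into columns.
    return [[float(value) for value in column] for column in zip(*reader)]


sample = io.StringIO("# comment\n1 2\n\n  3   4\n")
print(read_columns_csv(sample))  # [[1.0, 3.0], [2.0, 4.0]]
```

This still walks every line in Python, as the question anticipated, so on its own it is unlikely to give a large speed-up.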
Answer 0 (score: 5)
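Since you have matplotlib installed, you must also have numpy installed. `numpy.genfromtxt` meets all your requirements and should be much faster than parsing the file yourself in a Python loop: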
```python
import numpy as np
import matplotlib.pyplot as plt
import textwrap

fname = '/tmp/tmp.dat'
with open(fname, 'w') as f:
    # the header needs one name per column: 5 data columns, 5 names
    f.write(textwrap.dedent('''\
        id col1 col2 col3 col4
        2010 1 2 3 4
        # Foo
        2011 5 6 7 8
        # Bar
        # Baz
        2012 8 7 6 5
        '''))

data = np.genfromtxt(fname,
                     comments='#',  # skip comment lines
                     dtype=None,    # guess dtype of each column
                     names=True)    # use first line as column names
print(data)
plt.plot(data['id'], data['col2'])
plt.show()
```
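For a headerless file like the one in the question, where only the first two columns are wanted (the equivalent of gnuplot's `u 1:2`), `np.loadtxt` may be a simpler fit: it treats `#` as the comment character by default, skips blank lines, and splits on any run of whitespace. In NumPy 1.23 and later it is also backed by a compiled parser, so it may be worth timing it against `genfromtxt` on your own data. A sketch; `read_xy` is a hypothetical helper name:

```python
import io

import numpy as np


def read_xy(source):
    """Return the first two columns of a whitespace-separated file (hypothetical helper)."""
    # comments='#' is loadtxt's default; blank lines are skipped automatically.
    # usecols=(0, 1) mirrors gnuplot's "u 1:2" (loadtxt counts from 0).
    # unpack=True returns each selected column as its own array.
    x, y = np.loadtxt(source, comments='#', usecols=(0, 1), unpack=True)
    return x, y


x, y = read_xy(io.StringIO("# header\n1 2 9\n\n  3   4 9\n"))
print(x, y)  # [1. 3.] [2. 4.]
```

Against a real file, `read_xy(datafile)` followed by `plt.plot(x, y)` replaces the whole parsing loop.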
Answer 1 (score: 2)
You really need to profile your code to find out where the bottleneck is.

Here are some micro-optimizations:
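A minimal sketch of profiling with the standard library's `cProfile` and `pstats`, assuming the question's loop is wrapped in a function; `parse_columns` and the synthetic input are hypothetical stand-ins for the real script and data file:

```python
import cProfile
import io
import pstats


def parse_columns(f):
    """The question's parsing loop, wrapped in a function (hypothetical name)."""
    data = []
    for line in f:
        if line and line[0] != '#':
            cols = filter(lambda x: x != '', line.split(' '))
            for index, col in enumerate(cols):
                if len(data) <= index:
                    data.append([])
                data[index].append(float(col))
    return data


# Synthetic stand-in for the real data file: 10,000 two-column rows.
text = "1.5 2.5\n" * 10_000

profiler = cProfile.Profile()
profiler.enable()
data = parse_columns(io.StringIO(text))
profiler.disable()

# Show the five entries with the largest cumulative time.
pstats.Stats(profiler).sort_stats('cumulative').print_stats(5)
```

The same kind of report is available without editing the script, via `python -m cProfile -s cumulative yourscript.py datafile` (`yourscript.py` being a placeholder for your own file).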
```python
import numpy as np
import matplotlib.pyplot as plt
import sys

datafile = sys.argv[1]
data = []
# use with to auto-close the file
for line in open(datafile, 'r'):
    # line will never be falsy because it always has at least a newline
    # maybe you mean line.rstrip()?
    # you can also try line.startswith('#') instead of line[0] != '#'
    if line and line[0] != '#':
        # not sure of the point of this
        # just line.split() will allow any number of spaces
        # if you do need it, use a list comprehension
        # cols = [col for col in line.split(' ') if col]
        # filter on a user-defined function is slow
        cols = filter(lambda x: x != '', line.split(' '))
        for index, col in enumerate(cols):
            # just make data a collections.defaultdict
            # initialized as data = defaultdict(list)
            # and you can skip this 'if' statement entirely
            if len(data) <= index:
                data.append([])
            data[index].append(float(col))
plt.plot(data[0], data[1])
plt.show()
```
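Applying the suggestions from those comments (a `with` block for the file, `line.rstrip()`, `startswith`, plain `split()`, and a `defaultdict(list)`) gives roughly the following. This is a sketch; `read_columns` is a hypothetical name:

```python
import io
from collections import defaultdict


def read_columns(f):
    """Map column index -> list of floats (hypothetical helper)."""
    data = defaultdict(list)  # missing column indices start as empty lists
    for line in f:
        # rstrip() makes blank lines ('\n') falsy, so they are skipped
        if line.rstrip() and not line.startswith('#'):
            # split() with no argument splits on any run of whitespace
            for index, col in enumerate(line.split()):
                data[index].append(float(col))
    return data


data = read_columns(io.StringIO("# comment\n1 2\n\n3  4\n"))
print(data[0], data[1])  # [1.0, 3.0] [2.0, 4.0]
```

Against a real file, open it in a `with` block, pass the file object to `read_columns`, and call `plt.plot(data[0], data[1])` as before.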
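You could do something like this instead:

```python
with open(datafile) as f:
    lines = (line.split() for line in f
             if line.rstrip() and not line.startswith('#'))
    # convert each row to floats, then zip(*...) transposes rows into columns;
    # list() consumes the generator while the file is still open
    data = list(zip(*([float(col) for col in line] for line in lines)))
```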
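This gives you a `list` of `tuple`s instead of a `dict` of `int` → `list` or a `list` of `list`s, but otherwise it behaves the same. It could be done as a one-liner, but I split it up to make it easier to read.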
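To see where the list of tuples comes from: `zip(*rows)` passes each row to `zip` as a separate argument, and `zip` then groups the first items together, the second items together, and so on, turning rows into columns. A tiny illustration with made-up rows:

```python
# Made-up parsed rows: one list of floats per data line.
rows = [[2010.0, 1.0], [2011.0, 5.0], [2012.0, 8.0]]

# zip(*rows) is the same call as zip([2010.0, 1.0], [2011.0, 5.0], [2012.0, 8.0]).
columns = list(zip(*rows))
print(columns)  # [(2010.0, 2011.0, 2012.0), (1.0, 5.0, 8.0)]
```

One caveat: `zip` stops at its shortest argument, so if some rows have fewer columns than others, the extra values in the longer rows are silently dropped.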