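I'm trying to draw a boxplot of columns from several CSV files (without the header row, of course), but I'm running into some confusion around tuples, lists and arrays. Here's what I have so far: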
#!/usr/bin/env python
import csv
from numpy import *
import pylab as p
import matplotlib
#open one file, until boxplot-ing works
f = csv.reader (open('2-node.csv'))
#get all the columns in the file
timeStamp,elapsed,label,responseCode,responseMessage,threadName,dataType,success,bytes,Latency = zip(*f)
#Make list out of elapsed to pop the 1st element -- the header
elapsed_list = list(elapsed)
elapsed_list.pop(0)
#Turn list back to a tuple
elapsed = tuple(elapsed_list)
#Turn list into a numpy array
elapsed_array = array(elapsed_list)
#Elapsed Column statically entered into an array
data_array = ([4631, 3641, 1902, 1937, 1745, 8937] )
print data_array #prints in this format: [xx, xx, xx, xx], .__class__ is list ... ?
print elapsed #prints in this format: ('xx','xx','xx','xx'), .__class__ is tuple
print elapsed_list #prints in this format: ['xx', 'xx', 'xx', 'xx', 'xx'], .__class__ is list
print elapsed_array #prints in this format: ['xx' 'xx' 'xx' 'xx' 'xx'] -- notice no commas, .__class__ is numpy.ndarray
p.boxplot (data_array) #works
p.boxplot (elapsed) # does not work, error below
p.boxplot (elapsed_list) #does not work
p.boxplot (elapsed_array) #does not work
p.show()
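For boxplot, the first argument is documented as "an array or a sequence of vectors", so I thought elapsed_array would work...? But data_array, a "list", works, while elapsed_list, also a "list", doesn't...? Is there a better way to do this?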
I'm new to Python, and I'd like to understand which differences between tuples, lists and numpy arrays are keeping this boxplot from working.
An example error message:
Traceback (most recent call last):
  File "../pullcol.py", line 32, in <module>
    p.boxplot (elapsed_list)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/matplotlib/pyplot.py", line 1962, in boxplot
    ret = ax.boxplot(x, notch, sym, vert, whis, positions, widths, patch_artist, bootstrap)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/matplotlib/axes.py", line 5383, in boxplot
    q1, med, q3 = mlab.prctile(d,[25,50,75])
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/matplotlib/mlab.py", line 946, in prctile
    return _interpolate(values[ai],values[bi],frac)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/matplotlib/mlab.py", line 920, in _interpolate
    return a + (b - a)*fraction
TypeError: unsupported operand type(s) for -: 'numpy.ndarray' and 'numpy.ndarray'
Answer 0 (score: 4)
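elapsed contains strings, because csv.reader returns every field as a string (the same goes for elapsed_list and elapsed_array, which are built from it). Matplotlib needs integers or floats to plot anything, which is why the hand-typed data_array works. Try converting each value of elapsed to an integer. You can do this with: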
elapsed = tuple([int(i) for i in elapsed])
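Applied to the whole task, the same idea might look like the minimal sketch below. It is written for Python 3 (the question uses Python 2). It assumes, as in the question, that each file has exactly one header row and that `elapsed` is the second column (index 1). `read_column` and `plot_files` are hypothetical helper names, not part of any library. The header is skipped with `next(reader)` instead of being popped from a list, and every field goes through `int` before it reaches Matplotlib:

```python
import csv
import io


def read_column(fileobj, index=1, convert=int):
    """Return one CSV column as numbers, skipping the header row.

    Hypothetical helper: index=1 assumes `elapsed` is the second column,
    matching the unpacking order in the question.
    """
    reader = csv.reader(fileobj)
    next(reader)  # discard the header row
    # csv.reader yields every field as a str, so convert explicitly;
    # `if row` skips blank lines, which csv.reader returns as []
    return [convert(row[index]) for row in reader if row]


def plot_files(paths, index=1):
    """Draw one box per CSV file (requires matplotlib)."""
    import matplotlib.pyplot as plt  # imported here so read_column works without it

    data = []
    for path in paths:
        with open(path, newline="") as f:
            data.append(read_column(f, index))
    plt.boxplot(data)  # a list of numeric lists gives one box per file
    plt.show()


if __name__ == "__main__":
    sample = io.StringIO("timeStamp,elapsed,label\n1,4631,a\n2,3641,b\n")
    print(read_column(sample))  # [4631, 3641]
```

For example, `plot_files(['2-node.csv'])` draws a single box, and passing more paths adds one box per file.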
Or, as FredL commented below:
elapsed_list = array(elapsed_list, dtype=float)
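The traceback can be reproduced in a few lines. This sketch assumes a recent NumPy, where the failing subtraction raises a `TypeError` subclass whose message is worded differently from the one in the question. The sample values come from the question's `data_array`, stored as strings the way csv.reader would return them:

```python
import numpy as np

# What elapsed_array looks like: numbers stored as strings
elapsed_list = ["4631", "3641", "1902"]

as_strings = np.array(elapsed_list)              # string dtype, e.g. '<U4'
as_floats = np.array(elapsed_list, dtype=float)  # each string parsed as a float

# boxplot's percentile interpolation computes (b - a); strings can't be subtracted
try:
    as_strings - as_strings
    subtract_failed = False
except TypeError:
    subtract_failed = True

print(as_strings.dtype.kind, subtract_failed)  # U True
print(as_floats - as_floats.min())             # works once the dtype is float
```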
Answer 1 (score: 1)
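I'm not familiar with numpy or matplotlib, but judging only from the description and from what works, it seems to be looking for a nested sequence of sequences. That's why data_array works: it's a tuple containing a list, whereas all the other inputs are only one level deep.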
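As for the differences: a list is a mutable sequence of objects, a tuple is an immutable sequence of objects, and an array is a mutable sequence of bytes, integers or characters (basically 1-, 2-, 4- or 8-byte values).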
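Here's a link to the Python documentation on 5.6. Sequence Types. From there you can jump to more detailed information about lists, tuples, arrays or any other sequence type in Python.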
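The three behaviours described above can be seen in a short stdlib sketch. The "array" here is Python's built-in `array` module, which is a different type from the `numpy.ndarray` in the question. The `'i'` (signed int) type code is chosen only for illustration:

```python
from array import array

values_list = [4631, 3641]
values_tuple = (4631, 3641)
values_array = array("i", [4631, 3641])  # typed, compact storage of C ints

values_list[0] = 1902   # lists are mutable
values_array[0] = 1902  # arrays are mutable too

try:
    values_tuple[0] = 1902  # tuples are immutable
except TypeError:
    tuple_is_immutable = True
else:
    tuple_is_immutable = False

try:
    values_array.append("1937")  # only values matching the type code are accepted
except TypeError:
    array_rejects_str = True
else:
    array_rejects_str = False

print(values_list, tuple_is_immutable, array_rejects_str)  # [1902, 3641] True True
```

For the boxplot error itself, what matters is the element type. A list, tuple or numpy array built from csv.reader output holds strings either way.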