Question

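I am trying to parse some data in order to generate a histogram.

The data is spread across multiple columns, but the only columns relevant to me are the two below.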

X

AB    42

CD    77

AB    33

AB    42

AB    33

CD    54

AB    33

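For the AB rows only, I want to plot a histogram of the values in column 2. The histogram should be sorted by value and plot these counts: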

33 - 3

42 - 2

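(Even though 42 appears first in the data, I want 33 plotted first.)

I have many columns, but the script needs to grep for the string 'AB' and only look at those rows. Can anyone help?

Update: The data is in a CSV file with several columns.

Edit: I now have the data in a CSV file in this format.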

Addresses,Data

FromAP,42

FromAP,33

ToAP,77

FromAP,54

FromAP,42

FromAP,33

ToAP,42

FromAP,42

FromAP,33

If I use the code from @dranxo,

import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv('data.csv', sep=',')

df_useful = df[df['Addresses'] == 'FromAP']

df_useful.hist()
plt.show()

I get the following error:

Laptop@ubuntu:~/temp$ ./a.py
/usr/lib/pymodules/python2.7/matplotlib/axes.py:8261: UserWarning: 2D hist input should be nsamples x nvariables;
 this looks transposed (shape is 0 x 1)
  'this looks transposed (shape is %d x %d)' % x.shape[::-1])
Traceback (most recent call last):
  File "./a.py", line 11, in <module>
    df_useful.hist()
   File "/usr/lib/python2.7/dist-packages/pandas/tools/plotting.py", line 2075, in hist_frame
    ax.hist(data[col].dropna().values, **kwds)
  File "/usr/lib/pymodules/python2.7/matplotlib/axes.py", line 8312, in hist
    xmin = min(xmin, xi.min())
  File "/usr/lib/python2.7/dist-packages/numpy/core/_methods.py", line 21, in _amin
    out=out, keepdims=keepdims)
ValueError: zero-size array to reduction operation minimum which has no identity

I do have the pandas, numpy, and matplotlib packages installed. Thanks.

Answer 1

The following code sample will work. Note that reading the CSV may differ depending on its exact format. See this question on reading CSV files.

import csv

with open("/tmp/test.csv", "r") as f:
    # Filter for "AB" rows as we read the file; skip blank lines
    filtered_result = [tuple(line) for line in csv.reader(f) if line and line[0] == "AB"]

# Sort by the second column as a number (a plain string sort would put "100" before "33")
final_result = sorted(filtered_result, key=lambda x: int(x[1]))

# Print it for inspection
for key, value in final_result:
    print("key: %s, value: %s" % (key, value))

Output:

key: AB, value: 33
key: AB, value: 33
key: AB, value: 33
key: AB, value: 42
key: AB, value: 42

Contents of /tmp/test.csv:

AB,42
CD,77
AB,33
AB,42
AB,33
CD,54
AB,33

I filled /tmp/test.csv with 100,000 lines of random data, and this is how long my script takes:

$ time python test.py 

real    0m0.073s
user    0m0.073s
sys     0m0.000s

Edit: Updated for better performance and to show the example CSV.

Edit: Updated again to be faster.
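The script above prints the sorted rows, while the question asks for one count per distinct value. A minimal sketch of that extra step, assuming the same two-column layout: once the values are sorted, `itertools.groupby` can collapse runs of equal values into counts. The function name `sorted_counts` and the in-memory sample are illustrative, not part of the original answer.

```python
import csv
import io
from itertools import groupby

def sorted_counts(lines, label="AB"):
    """Return (value, count) pairs for rows whose first column equals label."""
    # Keep column 2 of the matching rows, converted to int so the sort is numeric
    values = sorted(int(row[1]) for row in csv.reader(lines)
                    if row and row[0] == label)
    # groupby only merges neighbouring equal items, which is why we sort first
    return [(value, len(list(group))) for value, group in groupby(values)]

# Same rows as the sample /tmp/test.csv, held in memory for the demonstration
sample = io.StringIO("AB,42\nCD,77\nAB,33\nAB,42\nAB,33\nCD,54\nAB,33\n")
for value, count in sorted_counts(sample):
    print("%d - %d" % (value, count))
```

With the sample rows this prints `33 - 3` followed by `42 - 2`. To read a real file, an open file object can be passed in place of the `StringIO` buffer.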

Answer 2

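There are two separate problems here:

Parsing the CSV - Python has a built-in library for CSV (the csv module).
Charting the results - does your Python program need to generate the histogram itself? Or would it be acceptable to load the parsed CSV into some spreadsheet software and do it there?

If you must have your Python program generate the histogram, then here is a list of graphing libraries to get you started.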
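The two steps described above can be kept apart in code. The following is only a sketch under assumptions: the data has the label in column 1 and an integer in column 2, and matplotlib is the chosen graphing library (it is the one used elsewhere in this thread). The names `count_values` and `plot_counts` are made up for illustration.

```python
import csv
import io
from collections import Counter

def count_values(lines, label="AB"):
    """Step 1: parse the CSV and count column 2 for rows matching label."""
    return Counter(int(row[1]) for row in csv.reader(lines)
                   if row and row[0] == label)

def plot_counts(counts):
    """Step 2: draw one bar per distinct value, in ascending order."""
    # Imported here so that step 1 still runs on a machine without matplotlib
    import matplotlib.pyplot as plt
    values = sorted(counts)
    positions = list(range(len(values)))
    plt.bar(positions, [counts[v] for v in values])
    plt.xticks(positions, values)
    plt.xlabel("value")
    plt.ylabel("count")
    plt.show()

sample = io.StringIO("AB,42\nCD,77\nAB,33\nAB,42\nAB,33\nCD,54\nAB,33\n")
counts = count_values(sample)
print(sorted(counts.items()))
# plot_counts(counts)  # uncomment to display the chart
```

Because the counting step returns plain numbers, its result could just as well be written back out as CSV and charted in a spreadsheet, which is the alternative this answer raises.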

Answer 3

I am assuming the data is in file.csv, with AB in the first column and 42 in the second.

import csv

dic = {}
with open('file.csv', 'r') as f:  # the with block closes the file when done
    for row in csv.reader(f):
        # Count only the rows whose first column is 'AB'; skip blank lines
        if row and row[0] == 'AB':
            value = int(row[1])
            dic[value] = dic.get(value, 0) + 1

# Print the counts sorted by value
for key in sorted(dic):
    print('%s-%s' % (key, dic[key]))

Answer 4

Have you looked at pandas?

Here are a few lines showing how to parse the data and plot it:

import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv('data.ssv', sep=' ')

df_useful = df[df['letters'] == 'AB']

df_useful.hist()
plt.show()

Note: I saved your data to a file named 'data.ssv' before calling pd.read_csv. Here is that file:

letters numbers

AB 42

CD 77

AB 33

AB 42

AB 33

CD 54

AB 33

Edit: To check whether the problem is unrelated to your data, you can run this code:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

df = pd.DataFrame(np.round(np.random.randn(10, 2)),
                 columns=['a', 'b'])

df.hist()
plt.show()
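If that check runs cleanly, the data is the likely culprit. The traceback in the question reports a shape of 0 x 1 and a zero-size array, which means hist() received no values at all, most likely because the filter matched no rows. One possible cause (an assumption, since the real file is not shown) is stray whitespace around the address values. The sketch below uses a made-up sample with a trailing space after each address and a hypothetical helper, `select_rows`, that strips whitespace before comparing.

```python
import io
import pandas as pd

def select_rows(df, column, wanted):
    """Return rows where df[column] equals wanted, ignoring surrounding whitespace."""
    # Strip stray spaces from the header names and from the cell values
    df = df.rename(columns=lambda name: name.strip())
    cleaned = df[column].astype(str).str.strip()
    return df[cleaned == wanted]

# Hypothetical file contents: note the space after each address
raw = io.StringIO("Addresses,Data\nFromAP ,42\nFromAP ,33\nToAP ,77\nFromAP ,33\n")
df = pd.read_csv(raw)
print(df.columns.tolist())                  # the exact header names pandas parsed
print(df[df["Addresses"] == "FromAP"].empty)  # True: the exact comparison matches nothing

useful = select_rows(df, "Addresses", "FromAP")
if useful.empty:
    print("No rows matched; hist() would fail on empty data")
else:
    # Counts per value, ordered by value rather than by frequency
    print(useful["Data"].value_counts().sort_index())
```

Printing `df.columns.tolist()` and a few raw values with `repr()` is a quick way to see whether this is what is happening. For spaces that follow the delimiter, `pd.read_csv` also accepts `skipinitialspace=True`.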

Python histogram from unsorted data
