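How do I plot a histogram in Python from data in a Notepad (text) file?

Asked: 2017-10-02 13:44:33

Tags: python matplotlib

I need to plot a histogram in Python from data stored in a Notepad text file. The file contains 10000 lines, and each line holds 10 hypothesis numbers, each between 0 and 255: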

....
....
[205 246  19  68 118  44  45  72 210 162]
[205 246  19  68 118  44  45  72 210 162]
[205 246  19  68 118  44  45  72 210 162]
[205 246  19  68 118  44  72  45 210 162]
[205 246  19  68 118  44  45  72 210 162]
[246 205  19  68 118  44  72  45 210 162]
[205 246  19  68 118  44  72  45 210 162]
[205 246  19  68 118  44  72  45 210 162]
[205 246  19  68 118  44  72  45 210 162]
[205 246  19  68 118  44  72  45 210 162]
[205 246  19  68 118  44  72  45 210 162]
[205 246  19  68 118  44  72  45 210 162]
[205 246  19  68 118  44  72  45 210 162]
[205 246  19  68 118  44  72  45 210 162]
[205 246  19 118  44  68  72  45 210 162]
[205 246  19 118  68  44  72  45 210 162]
[205 246  19 118  68  44  72  45 210 162]
[205 246  19 118  68  44  72  45 210 162]
[205 246  19 118  68  44  72  45 210 162]

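My goal is to take the last line and then count how many times each of its numbers occurs across the whole file.

For example, my last line is [205 246 19 118 68 44 72 45 210 162]. I need to plot a histogram based on how many times each of these numbers is repeated in the whole file, that is, I need to extract its rank: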

import matplotlib.pyplot as plt
import numpy as np

# Open in text mode so that each line is a str rather than bytes
with open('path_File', "r") as fileHandle:
    lineList = fileHandle.readlines()
print(lineList)
print("The last line is:")
print(lineList[-1])

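This code extracts the last line, but I cannot work out how to count how often each number occurs in the whole file. How can I plot a histogram from that?

2 Answers:

Answer 0: (score: 0)

Here lineList is a list in which each element is one line of the file. If all lines are formatted the same way (which seems to be the case), you can loop over all of them and use a counter.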

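import matplotlib.pyplot as plt
import numpy as np

# Text mode ("r") yields str lines, so string methods work in Python 3
with open('path_File', "r") as fileHandle:
    lineList = fileHandle.readlines()
print(lineList)
print("The last line is:")
print(lineList[-1])

# Count how many of the earlier lines are identical to the last line.
# lineList[:-1] takes every line except the last one; strip() makes the
# comparison ignore a missing trailing newline on the final line.
count = 0
for line in lineList[:-1]:
    if line.strip() == lineList[-1].strip():
        count += 1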

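If instead you want to check, for each number in the last line, how many times it is repeated, you need to split the lines. You can use the string split function. Be careful: each line is wrapped in brackets, so the first and last characters have to be removed: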

# Strip the newline, drop the surrounding "[" and "]", then split on
# any run of whitespace. split() with no argument is used instead of
# split(" ") because the numbers are padded with extra spaces.
last_line = lineList[-1].strip().strip("[]").split()
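
To see why the details of the split matter, here is a minimal sketch. The helper name parse_line is hypothetical and not part of the answer; the sample string is copied from the question's data. Slicing with [1:-1] on a line that still ends in a newline removes the newline instead of the closing bracket, and split(" ") produces empty strings wherever the numbers are padded with two spaces.

```python
def parse_line(line):
    """Turn one '[205 246  19 ...]' line into a list of ints.

    parse_line is a hypothetical helper used only for this illustration.
    """
    # strip() removes the trailing newline, strip("[]") removes the
    # brackets, and split() with no argument collapses runs of spaces.
    return [int(token) for token in line.strip().strip("[]").split()]


raw = "[205 246  19  68 118  44  45  72 210 162]\n"

# Naive approach: the result contains '' entries and a trailing '162]'
naive = raw[1:-1].split(" ")
print(naive)

# Robust approach
print(parse_line(raw))  # [205, 246, 19, 68, 118, 44, 45, 72, 210, 162]
```

Converting to int at this stage also means that later comparisons are numeric rather than string based.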

Then do the same inside a loop:

# One counter per position of last_line: counters[i] is the number of
# earlier lines whose i-th value equals the i-th value of last_line
counters = [0] * len(last_line)
for line in lineList[:-1]:
    values = line.strip().strip("[]").split()
    for i, (value, target) in enumerate(zip(values, last_line)):
        if value == target:
            counters[i] += 1
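
The loop above compares values position by position. If the goal is instead the one stated in the question (how often each number of the last line occurs anywhere in the file), a sketch along the following lines may help. It assumes every non-empty line has the bracketed format shown in the question; the function name count_last_line_numbers and the three sample lines are made up for illustration.

```python
from collections import Counter


def count_last_line_numbers(lines):
    """Map each number of the last line to its total occurrences in `lines`.

    Hypothetical helper: assumes every non-empty line looks like
    '[205 246  19 ...]'.
    """
    rows = [
        [int(token) for token in line.strip().strip("[]").split()]
        for line in lines
        if line.strip()
    ]
    # Count every number in every row, then keep only those in the last row
    totals = Counter(number for row in rows for number in row)
    return {number: totals[number] for number in rows[-1]}


# Illustrative sample, not the real 10000-line file
sample = [
    "[205 246  19  68 118  44  45  72 210 162]\n",
    "[246 205  19  68 118  44  72  45 210   7]\n",
    "[205 246  19 118  68  44  72  45 210 162]\n",
]
result = count_last_line_numbers(sample)
print(result)
# {205: 3, 246: 3, 19: 3, 118: 3, 68: 3, 44: 3, 72: 3, 45: 3, 210: 3, 162: 2}
```

With a real file, the list returned by readlines() in text mode could be passed in place of sample.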

Then, to plot the histogram, have a look at: https://matplotlib.org/devdocs/gallery/pyplots/pyplot_text.html#sphx-glr-gallery-pyplots-pyplot-text-py

https://matplotlib.org/devdocs/api/_as_gen/matplotlib.pyplot.hist.html#matplotlib.pyplot.hist
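
As a rough sketch of the plotting step, assuming the counts are already available as a dict that maps each number to its number of occurrences: since the counts are precomputed, a bar chart (one bar per number) is used here rather than plt.hist, which bins raw values itself. The function name plot_counts, the example values, and the output file name counts.png are all illustrative assumptions.

```python
import matplotlib.pyplot as plt


def plot_counts(counts):
    """Draw one bar per number; `counts` maps number -> occurrences.

    Hypothetical helper for illustration; returns the Axes it drew on.
    """
    labels = [str(number) for number in counts]
    positions = list(range(len(labels)))
    fig, ax = plt.subplots()
    ax.bar(positions, list(counts.values()), align="center")
    ax.set_xticks(positions)
    ax.set_xticklabels(labels)
    ax.set_xlabel("Number from the last line")
    ax.set_ylabel("Occurrences in the file")
    return ax


# Made-up counts, for illustration only
example_counts = {205: 3, 246: 3, 19: 3, 162: 2}
ax = plot_counts(example_counts)
# Save to a file; in an interactive session plt.show() could be used instead
ax.figure.savefig("counts.png")
```

Using the numbers as tick labels instead of x positions keeps the bars evenly spaced, even though the values themselves range from 0 to 255.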

Answer 1: (score: 0)

Here is an example using the pandas library:

import io  # Python 2 used the separate StringIO module instead
import pandas as pd
import matplotlib.pyplot as plt

string = """[205 246  19  68 118  44  45  72 210 162]
[205 246  19  68 118  44  45  72 210 162]
[205 246  19  68 118  44  45  72 210 162]
[205 246  19  68 118  44  72  45 210 162]
[205 246  19  68 118  44  45  72 210 162]
[246 205  19  68 118  44  72  45 210 162]
[205 246  19  68 118  44  72  45 210 162]
[205 246  19  68 118  44  72  45 210 162]
[205 246  19  68 118  44  72  45 210 162]
[205 246  19  68 118  44  72  45 210 162]
[205 246  19  68 118  44  72  45 210 162]
[205 246  19  68 118  44  72  45 210 162]
[205 246  19  68 118  44  72  45 210 162]
[205 246  19  68 118  44  72  45 210 162]
[205 246  19 118  44  68  72  45 210 162]
[205 246  19 118  68  44  72  45 210 162]
[205 246  19 118  68  44  72  45 210 162]
[205 246  19 118  68  44  72  45 210 162]
[205 246  19 118  68  44  72  45 210 162]"""

# Remove the surrounding [] and split each line on whitespace;
# this is a generator, so the lines are processed lazily
clean = (i.strip()[1:-1].split() for i in io.StringIO(string))

# To read from a real file, uncomment the following lines and adjust the path
#with open("path/to/file.txt") as f:
#    clean = (i.strip()[1:-1].split() for i in f.readlines())

# Create the dataframe
df = pd.DataFrame(clean)

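# Count every value across the whole DataFrame and store the totals in a dict
# (pd.Series.value_counts is used because the top-level pd.value_counts
# is deprecated in recent pandas releases)
dict_count = df.apply(pd.Series.value_counts).sum(axis=1).to_dict()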

# Dict with last row count (based on dict_count)
dict_values = {i:dict_count[i] for i in df.tail(1).values[0].tolist()}

# Plot a bar?
# https://stackoverflow.com/questions/16010869/python-plot-a-bar-using-matplotlib-using-a-dictionary
# list() turns the dict views into sequences that matplotlib accepts in Python 3
plt.bar(range(len(dict_values)), list(dict_values.values()), align='center')
plt.xticks(range(len(dict_values)), list(dict_values.keys()))

plt.show()
