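I need to plot a histogram in Python from data stored in a Notepad text file. The file contains 10000 lines, and each line holds 10 hypothesis numbers between 0 and 255: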
....
....
[205 246 19 68 118 44 45 72 210 162]
[205 246 19 68 118 44 45 72 210 162]
[205 246 19 68 118 44 45 72 210 162]
[205 246 19 68 118 44 72 45 210 162]
[205 246 19 68 118 44 45 72 210 162]
[246 205 19 68 118 44 72 45 210 162]
[205 246 19 68 118 44 72 45 210 162]
[205 246 19 68 118 44 72 45 210 162]
[205 246 19 68 118 44 72 45 210 162]
[205 246 19 68 118 44 72 45 210 162]
[205 246 19 68 118 44 72 45 210 162]
[205 246 19 68 118 44 72 45 210 162]
[205 246 19 68 118 44 72 45 210 162]
[205 246 19 68 118 44 72 45 210 162]
[205 246 19 118 44 68 72 45 210 162]
[205 246 19 118 68 44 72 45 210 162]
[205 246 19 118 68 44 72 45 210 162]
[205 246 19 118 68 44 72 45 210 162]
[205 246 19 118 68 44 72 45 210 162]
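So my goal is to take the last line and then check how many times each of its numbers is repeated across the whole file.
For example, this is my last line: [205 246 19 118 68 44 72 45 210 162]. I need to plot a histogram based on the number of repetitions of each number across the whole file.
I need to extract its rank: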
import matplotlib.pyplot as plt
import numpy as np

# Open in text mode so that every line is a str rather than bytes
with open('path_File', "r") as fileHandle:
    lineList = fileHandle.readlines()
print(lineList)
print("The last line is:")
print(lineList[-1])
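With this code I can extract the last line, but I can't count how often each number is repeated across the whole file. How can I plot a histogram based on that?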
Answer 0: (score: 0)
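Here you have a list in which each element is one line of the file. If all the lines are formatted the same way (which seems to be the case), you can loop over the lines and use a counter.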
import matplotlib.pyplot as plt
import numpy as np

# Read in text mode, strip the trailing newline and skip blank lines
with open('path_File', "r") as fileHandle:
    lineList = [line.strip() for line in fileHandle if line.strip()]
print(lineList)
print("The last line is:")
print(lineList[-1])

count = 0
# The [:-1] says that you take all the values but the last one
for line in lineList[:-1]:
    if line == lineList[-1]:
        count += 1
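The loop above counts whole lines that are identical to the last one. As a minimal sketch of the same idea with collections.Counter from the standard library (sample_lines is made-up data standing in for the stripped lines of the file, not the asker's real input):

```python
from collections import Counter

# Hypothetical sample standing in for the stripped lines of the file.
sample_lines = [
    "[205 246 19 68 118]",
    "[205 246 19 118 68]",
    "[205 246 19 118 68]",
]

# Count how often each distinct line occurs in the whole list.
line_counts = Counter(sample_lines)

# Occurrences of the last line among the earlier lines only,
# which matches the loop over lineList[:-1].
matches_before_last = line_counts[sample_lines[-1]] - 1
print(matches_before_last)
```

The Counter also holds the tally for every other distinct line, which can help to see how often the ordering changes.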
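If you want to check, for each number in the last line, how many times it is repeated, you need to split the lines. You can use the split function on the string. Be careful, because each line is wrapped in brackets: simply remove the first and last characters: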
last_line = lineList[-1].strip()[1:-1].split()
# This means: split the last item of lineList on whitespace.
# strip() drops any trailing newline first, so that [1:-1]
# removes exactly the two brackets.
Then do the same thing inside a loop:
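# Initialize a list of counters, one for each element in last_line
counters = [0] * len(last_line)
for line in lineList[:-1]:
    line = line.strip()[1:-1].split()
    # zip stops at the shorter row, so a short line cannot raise IndexError
    for i, (value, last_value) in enumerate(zip(line, last_line)):
        # A match is counted only when the same number sits at the same position
        if value == last_value:
            counters[i] += 1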
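The loop above only counts a match when a number sits at the same position as in the last line. If the goal is instead the total number of occurrences of each number anywhere in the file, one possible sketch uses collections.Counter. The helper names parse_line and count_last_line_numbers and the sample data are assumptions made for illustration:

```python
from collections import Counter


def parse_line(line):
    """Turn a line such as '[205 246 19]' into [205, 246, 19]."""
    return [int(token) for token in line.strip().strip("[]").split()]


def count_last_line_numbers(lines):
    """Return (number, total occurrences in all lines) for each number of the last line."""
    rows = [parse_line(line) for line in lines if line.strip()]
    # Tally every number in every row, regardless of its position.
    totals = Counter(number for row in rows for number in row)
    return [(number, totals[number]) for number in rows[-1]]


# Hypothetical sample; with a real file, pass the result of readlines() instead.
sample = ["[205 246 19]\n", "[246 205 19]\n", "[205 19 68]\n"]
print(count_last_line_numbers(sample))
```

Here the last line itself is included in the totals; subtract 1 from each count if it should be excluded.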
Then, if you want to plot a histogram, take a look at: https://matplotlib.org/devdocs/gallery/pyplots/pyplot_text.html#sphx-glr-gallery-pyplots-pyplot-text-py
https://matplotlib.org/devdocs/api/_as_gen/matplotlib.pyplot.hist.html#matplotlib.pyplot.hist
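Note that plt.hist bins raw values itself. When the count per number has already been computed, a bar chart is one way to draw it. The following is only a sketch under that assumption: bar_data, plot_counts and example_pairs are hypothetical names, and the counts are invented for illustration.

```python
def bar_data(pairs):
    """Split (number, count) pairs into x positions, bar heights and tick labels."""
    positions = list(range(len(pairs)))
    heights = [count for _, count in pairs]
    labels = [str(number) for number, _ in pairs]
    return positions, heights, labels


def plot_counts(pairs):
    """Draw one bar per number of the last line and return the figure."""
    # Imported here so that bar_data can be used without matplotlib installed.
    import matplotlib.pyplot as plt

    positions, heights, labels = bar_data(pairs)
    fig, ax = plt.subplots()
    ax.bar(positions, heights, align="center")
    ax.set_xticks(positions)
    ax.set_xticklabels(labels)
    ax.set_xlabel("Number in the last line")
    ax.set_ylabel("Occurrences in the file")
    return fig


# Invented counts, for illustration only.
example_pairs = [(205, 3), (19, 3), (68, 1)]
print(bar_data(example_pairs))
```

Calling plot_counts(example_pairs) and then plt.show() (or fig.savefig with a file name) would display or save the chart.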
Answer 1: (score: 0)
Here is an example using the pandas library:
import io  # Python 2: import StringIO
import pandas as pd
import matplotlib.pyplot as plt
string = """[205 246 19 68 118 44 45 72 210 162]
[205 246 19 68 118 44 45 72 210 162]
[205 246 19 68 118 44 45 72 210 162]
[205 246 19 68 118 44 72 45 210 162]
[205 246 19 68 118 44 45 72 210 162]
[246 205 19 68 118 44 72 45 210 162]
[205 246 19 68 118 44 72 45 210 162]
[205 246 19 68 118 44 72 45 210 162]
[205 246 19 68 118 44 72 45 210 162]
[205 246 19 68 118 44 72 45 210 162]
[205 246 19 68 118 44 72 45 210 162]
[205 246 19 68 118 44 72 45 210 162]
[205 246 19 68 118 44 72 45 210 162]
[205 246 19 68 118 44 72 45 210 162]
[205 246 19 118 44 68 72 45 210 162]
[205 246 19 118 68 44 72 45 210 162]
[205 246 19 118 68 44 72 45 210 162]
[205 246 19 118 68 44 72 45 210 162]
[205 246 19 118 68 44 72 45 210 162]"""
# Here we clean each line from [] and spaces " ", creating a list of rows
clean = [i.strip()[1:-1].split() for i in io.StringIO(string)]  # Python 2: StringIO.StringIO(string)
# But this code here is what you want to uncomment and modify
#with open("path/to/file.txt") as f:
#    clean = [i.strip()[1:-1].split() for i in f if i.strip()]
# Create the dataframe
df = pd.DataFrame(clean)
# Count every item across all rows and columns and put the totals in a dict
dict_count = df.stack().value_counts().to_dict()
# Dict with the counts of the last row's items (based on dict_count)
dict_values = {i: dict_count[i] for i in df.tail(1).values[0].tolist()}
# Plot a bar?
# https://stackoverflow.com/questions/16010869/python-plot-a-bar-using-matplotlib-using-a-dictionary
plt.bar(range(len(dict_values)), list(dict_values.values()), align='center')
plt.xticks(range(len(dict_values)), list(dict_values.keys()))
plt.show()
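The question also mentions extracting a rank. What "rank" means there is not spelled out, so the following is only a sketch under one assumption: the rank of a number is its position when all numbers are sorted by how often they occur, with 1 for the most frequent and ties sharing a rank. frequency_ranks and example_counts are hypothetical names; the input is a dictionary shaped like dict_count above.

```python
def frequency_ranks(counts):
    """Map each key to its rank by count: 1 = most frequent, ties share a rank."""
    ordered = sorted(counts.items(), key=lambda item: item[1], reverse=True)
    ranks = {}
    previous_count = None
    current_rank = 0
    for position, (key, count) in enumerate(ordered, start=1):
        # Only move to a new rank when the count actually changes.
        if count != previous_count:
            current_rank = position
            previous_count = count
        ranks[key] = current_rank
    return ranks


# Invented counts, for illustration only.
example_counts = {"205": 19, "246": 19, "68": 19, "7": 2}
print(frequency_ranks(example_counts))
```

With the sample data from this answer every number occurs once per row, so all of them would share rank 1; the ranking only becomes informative on the full file.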