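I need to draw a plot (I am not sure whether it has a specific name: some call it a "(lexical) dispersion plot" (NLTK), others a "barcode" plot (matplotlib)). I have a text that I split into words, and I want the plot to draw a thin line at every position where a given word occurs. I want to do this with matplotlib in Python 3. (The post "Lexical dispersion plot is seaborn" is very similar to my question, but it uses stripplot from seaborn, whereas I want to use matplotlib.)
I wrote some code, but it takes an incredibly long time to draw the plot. My question is how to improve this code, or how to do it properly. Here is an MWE: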
import re
import matplotlib.pyplot as plt
text = open("file.txt", "r", encoding="utf-8").read()
words = re.split(r"\W", text.lower())  # raw string avoids an invalid-escape warning
WORD = "rabbit"
x = list(range(len(words)))
y = [1 if w == WORD else 0 for w in words]
fig, ax = plt.subplots()
ax.bar(x, y, width=0, edgecolor="red")  # one bar per word in the text: slow
ax.set_xticks([])
ax.set_yticks([])
plt.show()
Answer 0 (score: 0)
Based on the comment by @ImportanceOfBeingErnest, I am posting an MWE that runs much faster than the code posted in the question.
import matplotlib.pyplot as plt
import re
#text = open("file.txt", "r", encoding="utf-8").read()
text="""There was nothing so very remarkable in that;
nor did Alice think it so very much out of the way
to hear the Rabbit say to itself, Oh dear! Oh dear!
I shall be too late! …; but when the Rabbit actually
took a watch out of its waistcoat-pocket …"""
words = re.split(r"\W", text.lower())  # split into words
words = [w for w in words if w != ""]  # remove empty elements
WORD = "rabbit"  # define the word to search for
x = []
for i in range(len(words)):  # for every word in the text
    if words[i] == WORD:  # check whether it is the word we are searching for
        x.append(i)  # if so, append its position to x
fig, ax = plt.subplots()
ax.vlines(x, 0, 1, colors="red")  # <-- ANSWER
ax.set_xlim([0, len(words)])  # set the lower and upper limits of the x-axis
ax.set_xlabel('narrative time')
ax.set_xticks([0], minor=True)  # to turn the ticks off instead: ax.set_xticks([])
ax.set_ylabel(WORD)  # turn off by dropping this line
ax.set_yticks([])
fig.set_figheight(1)  # figure height, see also fig.set_figwidth()
plt.show()
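The same idea can be extended to several words at once, which is what a lexical dispersion plot usually shows. The following is only a sketch under a few assumptions: the helper `word_positions`, the sample sentence, and the output file name are made up for illustration, and it uses `ax.eventplot()` (another matplotlib call that draws short lines at given positions) instead of `ax.vlines()`.

```python
import re

import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line for interactive use
import matplotlib.pyplot as plt


def word_positions(text, targets):
    """Hypothetical helper: return ({word: [indices]}, number_of_words)."""
    # Split on runs of non-word characters and discard empty strings
    words = [w for w in re.split(r"\W+", text.lower()) if w]
    positions = {t: [i for i, w in enumerate(words) if w == t] for t in targets}
    return positions, len(words)


# Made-up sample text, only for illustration
text = "The Rabbit ran. Alice saw the Rabbit and Alice followed the Rabbit."
targets = ["rabbit", "alice"]
positions, n_words = word_positions(text, targets)

fig, ax = plt.subplots(figsize=(6, 1.5))
# One row of thin vertical lines per target word
ax.eventplot(
    [positions[t] for t in targets],
    lineoffsets=list(range(len(targets))),
    linelengths=0.8,
    colors=["red", "blue"],
)
ax.set_yticks(list(range(len(targets))))
ax.set_yticklabels(targets)
ax.set_xlim(0, n_words)
ax.set_xlabel("narrative time")
fig.savefig("dispersion.png")  # assumed output name; use plt.show() interactively
```

Each word gets its own row, so the y tick labels replace the single `ax.set_ylabel(WORD)` used above.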
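One possible reason why ax.vlines() is faster than producing the same output with plt.bar() is that bars have more properties to take into account when they are drawn (see the comment by @ImportanceOfBeingErnest).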
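A rough way to see the difference is to count the artists each approach creates. This is a sketch, not a benchmark: the text length and word positions are invented, and the interpretation (fewer artists means less drawing work) is my assumption about why the speed differs.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line for interactive use
import matplotlib.pyplot as plt

n_words = 1000                      # hypothetical text length
hits = list(range(0, n_words, 50))  # hypothetical positions of the searched word
hit_set = set(hits)

fig, (ax_bar, ax_lines) = plt.subplots(2, 1)

# Approach from the question: one bar for every word in the text
x = list(range(n_words))
y = [1 if i in hit_set else 0 for i in x]
ax_bar.bar(x, y, width=0, edgecolor="red")

# Approach from the answer: lines only where the word occurs
ax_lines.vlines(hits, 0, 1, colors="red")

n_bar_artists = len(ax_bar.patches)         # Rectangle patches created by bar()
n_line_artists = len(ax_lines.collections)  # collections created by vlines()
print(n_bar_artists, n_line_artists)
```

ax.bar() adds one Rectangle patch per element of x, including the zero-height ones, while ax.vlines() puts all lines into a single LineCollection.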