Question

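I need to draw a plot (I am not sure whether it has a specific name: some call it a (lexical) "dispersion" plot (NLTK), others a "barcode" plot (matplotlib)). I have a text that I split into words, and I want the plot to draw a thin line every time a given word occurs. I want to do this with matplotlib in Python 3. (The post "Lexical dispersion plot in seaborn" is very similar to my question, but it uses seaborn's stripplot, and I want to use matplotlib.)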

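I wrote some code, but it takes an incredibly long time to draw the plot. My question is how to improve this code, or how to do this properly. Here is an MWE: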

import re
import matplotlib.pyplot as plt

with open("file.txt", "r", encoding="utf-8") as f:
    text = f.read()
words = re.split(r"\W", text.lower())  # split into words
WORD = "rabbit"
x = list(range(len(words)))  # one x position per word
y = [1 if w == WORD else 0 for w in words]  # 1 where WORD occurs, else 0
fig, ax = plt.subplots()
ax.bar(x, y, width=0, edgecolor="red")  # draws one bar per word: this is the slow part
ax.set_xticks([])
ax.set_yticks([])
plt.show()

Answer 1

Based on @ImportanceOfBeingErnest's comment, I am posting an MWE that runs much faster than the code posted in the question.

import matplotlib.pyplot as plt
import re
#text = open("file.txt", "r", encoding="utf-8").read()
text="""There was nothing so very remarkable in that;
        nor did Alice think it so very much out of the way
        to hear the Rabbit say to itself, Oh dear! Oh dear!
        I shall be too late! …; but when the Rabbit actually
        took a watch out of its waistcoat-pocket …"""
words = re.split(r"\W", text.lower()) # split into words
words = [w for w in words if w != ""] # remove empty elements
WORD = "rabbit" # define word to search for

# positions (word indices) at which WORD occurs in the text
x = [i for i, w in enumerate(words) if w == WORD]

fig, ax = plt.subplots()
ax.vlines(x, 0, 1, colors="red") # <-- ANSWER: one thin vertical line per occurrence
ax.set_xlim([0, len(words)]) # set the lower and upper limits of the graph
ax.set_xlabel('narrative time')
ax.set_xticks([0], minor=True) # to hide the x ticks entirely, use ax.set_xticks([])
ax.set_ylabel(WORD) # drop this line to hide the y label
ax.set_yticks([])
fig.set_figheight(1) # figure height, see also fig.set_figwidth()
plt.show()
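matplotlib also provides Axes.eventplot, which is meant for exactly this kind of "barcode" of event positions and can draw one row per word, similar to NLTK's multi-word dispersion plot. The sketch below is an alternative to the vlines answer, not part of it; the helper names word_positions and dispersion_plot, the sample sentence, the figure size and the Agg backend are all hypothetical choices.

```python
import re

import matplotlib
matplotlib.use("Agg")  # assumption: headless backend; remove this line for an interactive window
import matplotlib.pyplot as plt


def word_positions(words, target):
    """Hypothetical helper: indices at which `target` occurs in `words`."""
    return [i for i, w in enumerate(words) if w == target]


def dispersion_plot(words, targets):
    """Hypothetical helper: one row of thin tick marks per target word."""
    positions = [word_positions(words, t) for t in targets]
    fig, ax = plt.subplots(figsize=(8, 0.4 + 0.4 * len(targets)))
    ax.eventplot(positions,
                 lineoffsets=list(range(len(targets))),  # row i for targets[i]
                 linelengths=0.8,                         # leave a gap between rows
                 colors="red")
    ax.set_yticks(list(range(len(targets))))
    ax.set_yticklabels(targets)
    ax.set_xlim(0, len(words))
    ax.set_xlabel("narrative time")
    return fig, ax, positions


text = """There was nothing so very remarkable in that;
nor did Alice think it so very much out of the way
to hear the Rabbit say to itself, Oh dear! Oh dear!"""
words = [w for w in re.split(r"\W", text.lower()) if w]  # drop empty strings
fig, ax, positions = dispersion_plot(words, ["rabbit", "dear", "alice"])
# fig.savefig("dispersion.png")  or, without the Agg line above, plt.show()
```

Like vlines, eventplot only draws the occurrences; it adds one EventCollection per row rather than one artist per word.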

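One possible reason why this is faster than plt.bar() with the same output is that plt.bar() creates a separate Rectangle for every word in the text (including all the zero-height ones), and each bar has more properties to take into account when it is drawn, whereas ax.vlines() draws only the occurrences, as a single LineCollection (see @ImportanceOfBeingErnest's comment).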
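The difference in the number of objects matplotlib has to draw can be checked directly. This is a minimal sketch using a made-up sentence and the headless Agg backend (both assumptions); it counts artists and is not a timing benchmark.

```python
import re

import matplotlib
matplotlib.use("Agg")  # headless backend: artists are only counted, nothing is shown
import matplotlib.pyplot as plt

text = "the rabbit ran and the rabbit hid while alice watched the rabbit"
words = [w for w in re.split(r"\W", text.lower()) if w]
WORD = "rabbit"

# Question's approach: one bar per word, height 1 for a hit and 0 otherwise
fig_bar, ax_bar = plt.subplots()
ax_bar.bar(range(len(words)), [1 if w == WORD else 0 for w in words],
           width=0, edgecolor="red")

# Answer's approach: one vertical segment per occurrence, all in one collection
fig_vl, ax_vl = plt.subplots()
hits = [i for i, w in enumerate(words) if w == WORD]
ax_vl.vlines(hits, 0, 1, colors="red")

n_bar_patches = len(ax_bar.patches)                       # one Rectangle per word
n_vl_collections = len(ax_vl.collections)                 # a single LineCollection
n_vl_segments = len(ax_vl.collections[0].get_segments())  # one segment per hit
print(n_bar_patches, n_vl_collections, n_vl_segments)     # prints: 12 1 3
```

For a real text the bar count grows with the total number of words, while the vlines version still produces one collection whose size depends only on how often WORD occurs.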

[Figure: Lexical dispersion plot with matplotlib]
