Lexical dispersion plot with matplotlib

Date: 2017-08-19 20:29:07

Tags: python matplotlib plot categorical-data

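I need to draw a plot, and I am not sure whether it has a specific name: some call it a (lexical) "dispersion" plot (NLTK), others a "barcode" (matplotlib). I have a text that I split into words, and I want the plot to draw a thin line at every position where a given word occurs. I want to do this with matplotlib in python3. (The post "Lexical dispersion plot is seaborn" is very similar to my question, but it uses stripplot from seaborn, whereas I want to use matplotlib.)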

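I wrote some code, but it takes an incredibly long time to draw the plot. My question is how to improve this code, or how to do it properly. Here is an MWE: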

import re
import matplotlib.pyplot as plt
text = open("file.txt", "r", encoding="utf-8").read()
words = re.split(r"\W", text.lower())
WORD = "rabbit"
x = [i for i in range(0,len(words))]
y = [1 if w == WORD else 0 for w in words]
fig, ax = plt.subplots()
ax.bar(x, y, width=0, edgecolor="red")
ax.set_xticks([])
ax.set_yticks([])

[Figure: example dispersion or barcode plot]

1 answer:

Answer 0 (score: 0)

Based on the comment by @ImportanceOfBeingErnest, I am posting an MWE that runs much faster than the code posted in the question.

import matplotlib.pyplot as plt
import re
#text = open("file.txt", "r", encoding="utf-8").read()
text="""There was nothing so very remarkable in that;
        nor did Alice think it so very much out of the way
        to hear the Rabbit say to itself, Oh dear! Oh dear!
        I shall be too late! …; but when the Rabbit actually
        took a watch out of its waistcoat-pocket …"""
words = re.split(r"\W", text.lower()) # split into words (raw string avoids an invalid-escape warning)
words = [w for w in words if w != ""] # remove empty elements
WORD = "rabbit" # define word to search for

# collect the position of every word in the text that equals WORD
x = [i for i, w in enumerate(words) if w == WORD]

fig, ax = plt.subplots()
ax.vlines(x, 0, 1, colors="red") # <-- ANSWER
ax.set_xlim([0, len(words)]) # set the lower and upper limits of graph
ax.set_xlabel('narrative time')
ax.set_xticks([0], minor=True) # to turn the ticks off entirely, use ax.set_xticks([])
ax.set_ylabel(WORD) # turn off by dropping this line
ax.set_yticks([])
fig.set_figheight(1) # figure height, see also fig.set_figwidth()

[Figure: token dispersion, example plot]
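The single-word MWE above can be extended to several words, similar to what a lexical dispersion plot usually shows. The following is only a sketch under assumptions: the helper names `tokenize`, `word_positions` and `plot_dispersion` are hypothetical and not part of the original answer, the sample sentence is made up, and the `Agg` backend is selected only so the code runs without a display.

```python
import re

import matplotlib
matplotlib.use("Agg")  # assumption: headless backend; remove this line to view the plot interactively
import matplotlib.pyplot as plt


def tokenize(text):
    """Lowercase the text and split it on runs of non-word characters."""
    return [w for w in re.split(r"\W+", text.lower()) if w]


def word_positions(words, target):
    """Return the indices at which `target` occurs in `words`."""
    return [i for i, w in enumerate(words) if w == target]


def plot_dispersion(words, targets):
    """Draw one barcode row per target word (hypothetical helper)."""
    fig, ax = plt.subplots()
    for row, target in enumerate(targets):
        # Row number `row` spans y in [row, row + 1]; one vlines call per word.
        ax.vlines(word_positions(words, target), row, row + 1, colors="red")
    ax.set_xlim(0, len(words))
    ax.set_ylim(0, len(targets))
    # Label each row with its word, centred vertically in the row.
    ax.set_yticks([row + 0.5 for row in range(len(targets))])
    ax.set_yticklabels(targets)
    ax.set_xlabel("narrative time")
    fig.set_figheight(0.5 + 0.5 * len(targets))
    return fig, ax


# Made-up sample text, used only for illustration.
words = tokenize("Oh dear! said the Rabbit. Alice saw the Rabbit run.")
fig, ax = plot_dispersion(words, ["rabbit", "alice"])
```

To look at the result, call `fig.savefig(...)` with a file name of your choice, or drop the `matplotlib.use("Agg")` line and call `plt.show()`.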

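One possible reason why ax.vlines() is faster than producing the same output with plt.bar() may be that bars have more properties to take into account when they are drawn (see the comment by @ImportanceOfBeingErnest).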
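One way to make this argument more concrete is to count the artists each approach creates. The sketch below does not measure drawing time. It only shows that the bar approach from the question adds one Rectangle per word, while vlines adds a single LineCollection holding only the matching positions. The word list is made up for illustration, and the `Agg` backend is an assumption so the code runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend; remove to view interactively
import matplotlib.pyplot as plt

# Made-up token list: 1000 words, 400 of which are the target word.
words = ["the", "rabbit", "saw", "a", "rabbit"] * 200
WORD = "rabbit"

# Bar approach from the question: one bar per word, whether it matches or not.
x = list(range(len(words)))
y = [1 if w == WORD else 0 for w in words]
fig_bar, ax_bar = plt.subplots()
ax_bar.bar(x, y, width=0, edgecolor="red")

# vlines approach from the answer: only the matching positions are passed in.
positions = [i for i, w in enumerate(words) if w == WORD]
fig_lines, ax_lines = plt.subplots()
lines = ax_lines.vlines(positions, 0, 1, colors="red")

n_bar_artists = len(ax_bar.patches)         # one Rectangle patch per word
n_line_artists = len(ax_lines.collections)  # a single LineCollection
n_segments = len(lines.get_segments())      # one segment per occurrence of WORD

print("bar patches:", n_bar_artists)
print("vlines collections:", n_line_artists, "with", n_segments, "segments")
```

Fewer artists is only one plausible contributor to the speed difference; the actual timing depends on the matplotlib version and backend.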