Question

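I am trying to plot the most frequent words, but I have run into a problem: the language is Arabic, and the text does not come out in the right format.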

fig, ax = plt.subplots(figsize=(12, 10))
sns.barplot(x="word", y="freq", data=word_counter_df, palette="PuBuGn_d", ax=ax)
plt.show();

I tried decoding with ast, but that does not work with the plot either:

import ast
fig, ax = plt.subplots(figsize=(12, 10))
sns.barplot(x="word", y="freq", data=word_counter_df.apply(ast.literal_eval).str.decode("utf-8"), palette="PuBuGn_d", ax=ax)
plt.show();

word_counter_df looks like this:

<class 'pandas.core.frame.DataFrame'>
      word  freq
0   الله    6829
1   علي     5636
2   ان      3732
3   اللهم   2575
4   انا     2436
5   صباح    2115
6   اللي    1792
7   الي     1709
8   والله   1645
9   الهلال  1520
10  الا     1394
11  الخير   1276
12  انت     1209
13  يارب    1089
14  يوم     1082
15  رتويت   1019
16  كان     1004
17  اذا     994 
18  لله     982 
19  اي      939

This returns an empty plot with the following error:

ValueError: ('malformed node or string: 0 الله\n1 علي\n2 ان\n3 اللهم\n4 انا\n5 صباح\n6 اللي\n7
  الي\n8 والله\n9 الهلال\n10 الا\n11 الخير\n12
  انت\n13 يارب\n14 يوم\n15 رتويت\n16 كان\n17
  اذا\n18 لله\n19 اي\nName: word, dtype: object', 'occurred at index word')

Answer 1

You can use pandas' built-in plot.bar function:

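import matplotlib.pyplot as plt

# Bar chart straight from the DataFrame: words on the x axis, counts on the y axis
word_counter_df.plot.bar(x="word", y="freq")
plt.show()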
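
As for the ValueError in the question, a likely explanation is that DataFrame.apply hands each whole column to ast.literal_eval, which only accepts a string (or AST node) containing a Python literal. In Python 3 the words are already str objects, so there is nothing to decode. A minimal standard-library sketch of that behaviour; the helper name try_literal_eval and the sample values are made up for illustration:

```python
import ast


def try_literal_eval(value):
    """Return (ok, result_or_error_message) for ast.literal_eval(value)."""
    try:
        return True, ast.literal_eval(value)
    except (ValueError, SyntaxError) as exc:
        return False, f"{type(exc).__name__}: {exc}"


# A real Python literal written as text is parsed without trouble.
ok_literal, parsed = try_literal_eval("['a', 'b']")

# A bare word is not a literal (it parses as a name), so it is rejected.
ok_word, word_error = try_literal_eval("الله")

# A non-string container, standing in for a whole DataFrame column, is rejected too.
ok_column, column_error = try_literal_eval(["الله", "علي"])

print(ok_literal, parsed)
print(ok_word, word_error)
print(ok_column, column_error)
```

So the ast step can simply be dropped; the column can be passed to the plotting call as it is.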

Update for joined Arabic letters:

import arabic_reshaper
from bidi.algorithm import get_display

# Join the letters into their connected forms, then reorder them for display
word_counter_df['disp'] = word_counter_df.word.apply(arabic_reshaper.reshape).apply(get_display)
word_counter_df.plot.bar(x="disp", y="freq")
plt.show()

The same works here with seaborn (version 0.9.0).
