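Get and plot a histogram of the number of non-equal characters between two strings

Time: 2017-06-21 15:44:08

Tags: python pandas dictionary histogram counter

For example,

I have the following sample data (for illustration only):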

Real_value     Predicted_values
hello            halo
communication    commanecetpo
what             waht
is               is
up               down
neural           narel
network          natwark
computer         computer
vision           vison
convolutional    conventioanl
hebbian          hebien
learing          larnig
transfer         trasfert

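The first column holds the real value and the second column holds the predicted value. I want to compare the two columns row by row to find where the two strings differ.

I did the following: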

dictionnary = []  # list of (real_char, predicted_char) tuples
for i in range(len(df)):
    if df.manual_raw_value[i] != df.raw_value[i]:
        text = df.manual_raw_value[i]
        text2 = df.raw_value[i]
        # compare only up to the length of the shorter string
        z = min(len(text), len(text2))
        for t in range(z):
            if text[t] != text2[t]:
                d = (text[t], text2[t])
                dictionnary.append(d)
print(dictionnary)


 [  ('a', 'n'),
 ('n', 'g'),
 ('g', 'e'),
 ('e', '.'),
 ('.', 'f'),
 ('f', 'r'),
 ("'", 'E'),
 ('E', 'S'),
 ('S', 'C'),
 ('C', 'O'),
 ('O', 'M'),
 ('M', 'P'),
 ('P', 'T'),
 ('T', 'E'),
 ('C', 'Q'),
 ('6', 'G'),
 ('9', 'o'),
 ('1', 'i'),
 ("'", 'E'),
 ('E', 'a'),
 ('a', 'u'),
 ('.', ','),
 ...]
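The loop above compares the two strings position by position, up to the length of the shorter one. As a minimal sketch of the same idea, `zip` can pair the characters directly. The helper name `mismatched_pairs` and the sample `rows` are illustrative assumptions, not part of the original code.

```python
def mismatched_pairs(real, predicted):
    """Return (real_char, predicted_char) for each position where the two differ.

    zip stops at the shorter string, so trailing extra characters are ignored,
    which matches the min(x, y) bound used in the loop above.
    """
    return [(a, b) for a, b in zip(real, predicted) if a != b]


# Hypothetical sample rows taken from the table in the question.
rows = [("hello", "halo"), ("what", "waht"), ("is", "is")]

pairs = []
for real, predicted in rows:
    pairs.extend(mismatched_pairs(real, predicted))

print(pairs)
```

With the DataFrame from the question, `rows` could presumably be replaced by `zip(df.manual_raw_value, df.raw_value)`. Note that a purely positional comparison reports many mismatches after a single inserted or deleted character, because everything after it is shifted.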

The keys in the dictionary represent the real values. Now I want to count the occurrences, like this:

[('a' : 'e'), ('a','e'), ('b','d')]

becomes

[('a' : 'e') : 2,  ('b','d') : 1] 

I tried:

  collections.Counter(dictionnary)
   [ ('/', '1'): 2,
         ('/', 'M'): 2,
         ('/', 'W'): 2,
         ('/', 'h'): 8,
         ('/', 'm'): 2,
         ('/', 't'): 6,
         ('0', '-'): 2,
         ('0', '1'): 2,
         ('0', '3'): 2,
         ('0', '4'): 6,
         ('0', '5'): 2,
         ('0', '6'): 2,
         ('0', '7'): 4,
         ('0', '9'): 2,
         ('0', 'C'): 2,
         ('0', 'D'): 4,
         ('0', 'O'): 16,
         ('0', 'Q'): 4,
         ('0', 'U'): 2,
         ('0', 'm'): 4,
         ('0', 'o'): 2,
         ('0', '\xc3'): 2,
         ('1', ' '): 2,
         ('1', '/'): 2,
         ('1', '0'): 4,
         ('1', '2'): 2,
         ('1', '3'): 2,
         ('1', '4'): 2,
         ('1', '6'): 2,
         ('1', 'H'): 2,
         ('1', 'I'): 24,
         ('1', 'S'): 2,
         ('1', 'i'): 6,
         ('1', 'l'): 6,
         ('2', '3'): 2,
         ('2', '8'): 2,
         ('2', 'N'): 2,
         ('2', 'S'): 2, ..]

To plot the histogram, I tried the following:

import numpy as np
import matplotlib.pyplot as plt

pos = np.arange(len(dictionnary.keys()))
width = 1.0    

ax = plt.axes()
ax.set_xticks(pos + (width / 2))
ax.set_xticklabels(dictionnary.keys())

plt.bar(dictionary.keys(), ******, width, color='g')
plt.show()

However, dictionnary.keys() raises the following error:

Traceback (most recent call last):
  File "/home/ahmed/anaconda3/envs/cv/lib/python2.7/site-packages/IPython/core/interactiveshell.py", line 2881, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-94-5d466717162c>", line 1, in <module>
    dictionnary_new.keys()
AttributeError: 'list' object has no attribute 'keys'

Edit 1:

dictionnary_new = collections.Counter(dictionnary) # it works
import numpy as np
import matplotlib.pyplot as plt

pos = np.arange(len(dictionnary_new.keys()))
width = 1.0

ax = plt.axes()
ax.set_xticks(pos + (width / 2))
ax.set_xticklabels(dictionnary_new.keys())

plt.bar(dictionnary_new.keys(), dictionnary_new.values(), width, color='g')
plt.show()


I got the following error:


Traceback (most recent call last):
  File "/home/ahmed/anaconda3/envs/cv/lib/python2.7/site-packages/IPython/core/interactiveshell.py", line 2881, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-117-4155944ddaf3>", line 11, in <module>
    plt.bar(dictionnary_new.keys(), dictionnary_new.values(), width, color='g')
  File "/home/ahmed/anaconda3/envs/cv/lib/python2.7/site-packages/matplotlib/pyplot.py", line 2705, in bar
    **kwargs)
  File "/home/ahmed/anaconda3/envs/cv/lib/python2.7/site-packages/matplotlib/__init__.py", line 1892, in inner
    return func(ax, *args, **kwargs)
  File "/home/ahmed/anaconda3/envs/cv/lib/python2.7/site-packages/matplotlib/axes/_axes.py", line 2105, in bar
    left = [left[i] - width[i] / 2. for i in xrange(len(left))]
TypeError: unsupported operand type(s) for -: 'tuple' and 'float'

Thank you very much.

2 answers:

Answer 0 (score: 0)

First, I think there is a typo in your example of pairs:
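The TypeError in the last traceback comes from passing the tuple keys as bar positions: matplotlib does arithmetic on the positions, which fails for tuples. One possible workaround, sketched below, is to draw the bars at integer positions and use the pairs only as tick labels. The helper names `bar_data` and `plot_pair_counts`, the `top` parameter, and the `"a->e"` label format are assumptions made for illustration.

```python
from collections import Counter


def bar_data(pairs, top=None):
    """Return (positions, labels, heights) for a bar chart of pair counts."""
    # most_common(None) returns every pair, sorted by descending count.
    counted = Counter(pairs).most_common(top)
    positions = list(range(len(counted)))
    labels = ["{}->{}".format(real, pred) for (real, pred), _ in counted]
    heights = [n for _, n in counted]
    return positions, labels, heights


def plot_pair_counts(pairs, top=None):
    """Draw the bars at integer positions and label them with the pairs."""
    # Imported here so bar_data stays usable without matplotlib installed.
    import matplotlib.pyplot as plt

    positions, labels, heights = bar_data(pairs, top)
    fig, ax = plt.subplots()
    ax.bar(positions, heights, color="g")
    ax.set_xticks(positions)
    ax.set_xticklabels(labels, rotation=90)
    ax.set_ylabel("occurrences")
    plt.show()
```

Calling `plot_pair_counts(dictionnary, top=30)` on the list of tuples from the question would then show only the 30 most frequent confusions, which keeps the x-axis readable when there are hundreds of distinct pairs.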

>>> lst = [{'a': 'e'}, {'a': 'e'}, {'b': 'd'}]
>>> collections.Counter([tuple(i.items()) for i in lst])
Counter({(('a', 'e'),): 2, (('b', 'd'),): 1})

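That said, I don't think this is the right way to solve the problem. In your code, when you append items to the dictionnary variable, don't use dictionaries, use tuples! Replace: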

d= {text[t] : text2[t]}
dictionnary.append(d)

with:

d= (text[t], text2[t])
dictionnary.append(d)

Then you can use:

collections.Counter(dictionnary)
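Tuples are hashable, so `Counter` can use them directly as keys, which dictionaries cannot be. A small sketch with made-up pairs (the `pairs` list is illustrative, not data from the question):

```python
from collections import Counter

# Illustrative list of (real_char, predicted_char) tuples.
pairs = [("a", "e"), ("a", "e"), ("b", "d")]

counts = Counter(pairs)

# most_common() returns (pair, count) tuples sorted by descending count.
for pair, n in counts.most_common():
    print(pair, n)
```

The result behaves like a dictionary, so `counts.keys()` and `counts.values()` are available, unlike on the original list.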

Answer 1 (score: 0)

Would something like this work for you?

import distance  # third-party package that provides levenshtein()
import matplotlib.pyplot as plt

df['string diff'] = df.apply(lambda x: distance.levenshtein(x['Real Value'], x['Predicted Values']), axis=1)
plt.hist(df['string diff'])
plt.show()
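The Levenshtein distance is the minimum number of single-character insertions, deletions, and substitutions needed to turn one string into the other, so it gives one number per row rather than a list of character pairs. If the third-party package is not available, a pure-Python version can be sketched as follows. The function below is a standard dynamic-programming formulation written for illustration, and `rows` is sample data taken from the question's table.

```python
def levenshtein(a, b):
    """Edit distance: minimum insertions, deletions and substitutions to turn a into b."""
    # previous[j] holds the distance between the processed prefix of a and b[:j].
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution or match
        previous = current
    return previous[-1]


# Sample (real, predicted) rows from the question's table.
rows = [("hello", "halo"), ("what", "waht"), ("is", "is"), ("up", "down")]
diffs = [levenshtein(real, pred) for real, pred in rows]
print(diffs)
```

Passing `diffs` to `plt.hist` would then give the histogram of per-row differences. Unlike the positional comparison in the question, this measure is not thrown off by a single missing or extra character.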