I have the following example (for illustration only):
Real_value       Predicted_values
hello            halo
communication    commanecetpo
what             waht
is               is
up               down
neural           narel
network          natwark
computer         computer
vision           vison
convolutional    conventioanl
hebbian          hebien
learing          larnig
transfer         trasfert
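The first column holds the real values and the second column the predicted values. I want to compare the two columns row by row to detect where the two strings differ.
Here is what I did: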
dictionnary = []  # list of (real_char, predicted_char) mismatch pairs
for i in range(len(df)):
    if df.manual_raw_value[i] != df.raw_value[i]:
        text = df.manual_raw_value[i]
        text2 = df.raw_value[i]
        x = len(text)
        y = len(text2)
        z = min(x, y)  # only compare up to the length of the shorter string
        for t in range(z):
            if text[t] != text2[t]:
                d = (text[t], text2[t])
                dictionnary.append(d)
print(dictionnary)
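For reference, the same positional comparison can be written more compactly with zip(), which, like range(min(x, y)), stops at the end of the shorter string. This is a minimal Python 3 sketch that runs on a few hard-coded rows from the example table instead of the real DataFrame; positional_mismatches is a hypothetical helper name. Note what happens with neural/narel: once the predicted string drops a character, every later character is paired with the wrong partner.

```python
# Minimal sketch: positional character comparison, equivalent to the loop above.
# The rows below are a hand-picked subset of the example table (not the real df).
rows = [
    ("hello", "halo"),
    ("what", "waht"),
    ("is", "is"),
    ("neural", "narel"),
]


def positional_mismatches(real, predicted):
    """Return (real_char, predicted_char) pairs at indices where the strings differ.

    zip() stops at the shorter string, just like range(min(x, y)) in the loop.
    """
    return [(r, p) for r, p in zip(real, predicted) if r != p]


pairs = []
for real, predicted in rows:
    pairs.extend(positional_mismatches(real, predicted))

print(pairs)
```

On these four rows it collects 8 pairs; four of them come from neural/narel alone, mostly because of the shift rather than genuine character confusions.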
[ ('a', 'n'),
('n', 'g'),
('g', 'e'),
('e', '.'),
('.', 'f'),
('f', 'r'),
("'", 'E'),
('E', 'S'),
('S', 'C'),
('C', 'O'),
('O', 'M'),
('M', 'P'),
('P', 'T'),
('T', 'E'),
('C', 'Q'),
('6', 'G'),
('9', 'o'),
('1', 'i'),
("'", 'E'),
('E', 'a'),
('a', 'u'),
('.', ','),
...]
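Runs of chained pairs like ('a', 'n'), ('n', 'g'), ('g', 'e') in the output above look like the effect of the positional comparison: a single inserted or deleted character misaligns the rest of the string. One alternative, swapped in here and not part of the original question, is to align the two strings first with the standard library's difflib.SequenceMatcher and only pair characters inside the blocks that differ. This is a minimal Python 3 sketch; the helper name aligned_mismatches and the convention of padding insertions and deletions with '' are assumptions.

```python
import difflib


def aligned_mismatches(real, predicted):
    """Pair up differing characters after aligning the strings with difflib.

    Characters inside each non-equal block are paired left to right; when one
    side of a block is shorter (or empty, for pure insertions/deletions), it is
    padded with '' so every character is still recorded. This padding is an
    assumed convention, not something difflib prescribes.
    """
    matcher = difflib.SequenceMatcher(None, real, predicted, autojunk=False)
    pairs = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue
        a, b = real[i1:i2], predicted[j1:j2]
        n = max(len(a), len(b))
        a_chars = list(a) + [""] * (n - len(a))
        b_chars = list(b) + [""] * (n - len(b))
        pairs.extend(zip(a_chars, b_chars))
    return pairs


for real, predicted in [("hello", "halo"), ("vision", "vison")]:
    print(real, predicted, aligned_mismatches(real, predicted))
```

For hello/halo this yields [('e', 'a'), ('l', '')] instead of the positional [('e', 'a'), ('l', 'o')], and for vision/vison only [('i', '')]. SequenceMatcher does not always pick the alignment a human would (a swap such as what/waht comes out as one insertion plus one deletion), so treat the result as an approximation.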
In each pair, the first element (the key) is the real value. Now I want to count the occurrences, so that:
[('a' : 'e'), ('a','e'), ('b','d')]
becomes
[('a' : 'e') : 2, ('b','d') : 1]
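Once the pairs are stored as tuples, collections.Counter produces exactly this kind of mapping, because tuples are hashable and can serve as keys. A minimal sketch using the three pairs from the example, written as tuples:

```python
from collections import Counter

# Mismatch pairs stored as (real_char, predicted_char) tuples.
pairs = [("a", "e"), ("a", "e"), ("b", "d")]

counts = Counter(pairs)
print(counts)                 # Counter({('a', 'e'): 2, ('b', 'd'): 1})
print(counts.most_common(1))  # [(('a', 'e'), 2)]
```

most_common() is convenient later on, since it returns the pairs already sorted by frequency.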
I tried:
collections.Counter(dictionnary)
[ ('/', '1'): 2,
('/', 'M'): 2,
('/', 'W'): 2,
('/', 'h'): 8,
('/', 'm'): 2,
('/', 't'): 6,
('0', '-'): 2,
('0', '1'): 2,
('0', '3'): 2,
('0', '4'): 6,
('0', '5'): 2,
('0', '6'): 2,
('0', '7'): 4,
('0', '9'): 2,
('0', 'C'): 2,
('0', 'D'): 4,
('0', 'O'): 16,
('0', 'Q'): 4,
('0', 'U'): 2,
('0', 'm'): 4,
('0', 'o'): 2,
('0', '\xc3'): 2,
('1', ' '): 2,
('1', '/'): 2,
('1', '0'): 4,
('1', '2'): 2,
('1', '3'): 2,
('1', '4'): 2,
('1', '6'): 2,
('1', 'H'): 2,
('1', 'I'): 24,
('1', 'S'): 2,
('1', 'i'): 6,
('1', 'l'): 6,
('2', '3'): 2,
('2', '8'): 2,
('2', 'N'): 2,
('2', 'S'): 2, ..]
To plot a histogram, I tried the following:
import numpy as np
import matplotlib.pyplot as plt
pos = np.arange(len(dictionnary.keys()))
width = 1.0
ax = plt.axes()
ax.set_xticks(pos + (width / 2))
ax.set_xticklabels(dictionnary.keys())
plt.bar(dictionary.keys(), ******, width, color='g')
plt.show()
But dictionnary.keys() returns the following error:
Traceback (most recent call last):
File "/home/ahmed/anaconda3/envs/cv/lib/python2.7/site-packages/IPython/core/interactiveshell.py", line 2881, in run_code
exec(code_obj, self.user_global_ns, self.user_ns)
File "<ipython-input-94-5d466717162c>", line 1, in <module>
dictionnary_new.keys()
AttributeError: 'list' object has no attribute 'keys'
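This AttributeError arises because keys() is called on the list of pairs rather than on the Counter built from it: a list has no keys() method, whereas Counter is a dict subclass and provides keys(), values() and items(). A small Python 3 sketch with hypothetical pairs:

```python
from collections import Counter

# Hypothetical mismatch pairs, shaped like the question's list of tuples.
pairs = [("0", "O"), ("1", "I"), ("0", "O"), ("1", "l")]

print(hasattr(pairs, "keys"))   # False: a plain list, hence the AttributeError

counts = Counter(pairs)
print(hasattr(counts, "keys"))  # True: Counter is a dict subclass
print(sorted(counts.keys()))    # [('0', 'O'), ('1', 'I'), ('1', 'l')]
```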
EDIT 1:
dictionnary_new = collections.Counter(dictionnary) # it works
import numpy as np
import matplotlib.pyplot as plt
pos = np.arange(len(dictionnary_new.keys()))
width = 1.0
ax = plt.axes()
ax.set_xticks(pos + (width / 2))
ax.set_xticklabels(dictionnary_new.keys())
plt.bar(dictionnary_new.keys(), dictionnary_new.values(), width, color='g')
plt.show()
I got the following error:
Traceback (most recent call last):
File "/home/ahmed/anaconda3/envs/cv/lib/python2.7/site-packages/IPython/core/interactiveshell.py", line 2881, in run_code
exec(code_obj, self.user_global_ns, self.user_ns)
File "<ipython-input-117-4155944ddaf3>", line 11, in <module>
plt.bar(dictionnary_new.keys(), dictionnary_new.values(), width, color='g')
File "/home/ahmed/anaconda3/envs/cv/lib/python2.7/site-packages/matplotlib/pyplot.py", line 2705, in bar
**kwargs)
File "/home/ahmed/anaconda3/envs/cv/lib/python2.7/site-packages/matplotlib/__init__.py", line 1892, in inner
return func(ax, *args, **kwargs)
File "/home/ahmed/anaconda3/envs/cv/lib/python2.7/site-packages/matplotlib/axes/_axes.py", line 2105, in bar
left = [left[i] - width[i] / 2. for i in xrange(len(left))]
TypeError: unsupported operand type(s) for -: 'tuple' and 'float'
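This TypeError comes from the x argument: as the last traceback frame shows, this version of bar() computes left[i] - width[i] / 2. for every x value, so x must be numeric, but the Counter keys are tuples. One way around it, sketched below assuming Python 3 and a recent matplotlib, is to draw the bars at numeric positions (np.arange) and use the tuple keys only as tick labels. With bars centered on their positions (the default since matplotlib 2.0), the ticks go at pos itself rather than pos + width / 2. The helper name plot_pair_counts, the top_n cut-off and the "real->predicted" label format are illustrative choices.

```python
from collections import Counter

import matplotlib
matplotlib.use("Agg")  # non-interactive backend for headless runs; drop it in a notebook
import matplotlib.pyplot as plt
import numpy as np


def plot_pair_counts(counts, top_n=20):
    """Bar chart of the top_n most frequent (real, predicted) pairs in a Counter."""
    items = counts.most_common(top_n)
    # Tuple keys become readable string labels such as "0->O".
    labels = ["{}->{}".format(real, pred) for (real, pred), _ in items]
    heights = [n for _, n in items]
    pos = np.arange(len(items))  # numeric x positions, one per bar

    fig, ax = plt.subplots(figsize=(10, 4))
    ax.bar(pos, heights, width=0.8, color="g")
    ax.set_xticks(pos)
    ax.set_xticklabels(labels, rotation=90)
    ax.set_ylabel("count")
    fig.tight_layout()
    return fig, ax
```

For the full data, something like plot_pair_counts(Counter(dictionnary)) followed by plt.show() (with the Agg line removed) should display the most frequent confusions first.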
Thanks a lot.
Answer 0 (score: 0)
First, I think there is a typo in your example of pairs:
>>> lst = [{'a': 'e'}, {'a': 'e'}, {'b': 'd'}]
>>> collections.Counter([tuple(i.items()) for i in lst])
Counter({(('a', 'e'),): 2, (('b', 'd'),): 1})
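That said, I don't think this is the right way to approach the problem. In your code, when you append to the dictionnary variable, don't append a dict, append a tuple! Replace: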
d= {text[t] : text2[t]}
dictionnary.append(d)
with:
d= (text[t], text2[t])
dictionnary.append(d)
Then you can use:
collections.Counter(dictionnary)
Answer 1 (score: 0)
Would something like this work for you?
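import distance  # third-party "Distance" package from PyPI
import matplotlib.pyplot as plt
# Per-row edit distance between real and predicted strings; adjust the column names to your DataFrame
df['string diff'] = df.apply(lambda x: distance.levenshtein(x['Real Value'], x['Predicted Values']), axis=1)
plt.hist(df['string diff'])
plt.show()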
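The answer relies on the third-party distance package. If that package is not available, the Levenshtein (edit) distance can be computed in plain Python with the classic dynamic-programming recurrence. The sketch below is a stand-in, not that package's implementation, and the function name levenshtein is an assumption. It keeps only the previous row of the table, so memory grows with the length of the shorter string.

```python
def levenshtein(a, b):
    """Edit distance where insertions, deletions and substitutions each cost 1."""
    if len(a) < len(b):
        a, b = b, a  # distance is symmetric; keep the row over the shorter string
    previous = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # deletion
                current[j - 1] + 1,            # insertion
                previous[j - 1] + (ca != cb),  # substitution (free if equal)
            ))
        previous = current
    return previous[-1]


print(levenshtein("what", "waht"))  # 2
print(levenshtein("up", "down"))    # 4
```

It could replace distance.levenshtein in the df.apply call above without other changes.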