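I have the following code, which tries to print the edge list of a graph. The edges appear to be iterated over in a loop, but my intention is to test whether all edges are present before passing them on to a function for further processing.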
def mapper_network(self, _, info):
    info[0] = info[0].encode('utf-8')
    for i in range(len(info[1])):
        info[1][i] = str(info[1][i])
    l_lst = len(info[1])
    packed = [(info[0], l) for l in info[1]] #each pair of nodes (edge)
    weight = [1 /float(l_lst)] #each edge weight
    G = nx.Graph()
    for i in range(len(packed)):
        edge_from = packed[i][0]
        edge_to = packed[i][1]
        #edge_to = unicodedata.normalize("NFKD", edge_to).encode('utf-8', 'ignore')
        edge_to = edge_to.encode("utf-8")
        weight = weight
        G.add_edge(edge_from, edge_to, weight=weight)
    #print G.size() #yes, this works :)
    G_edgelist = []
    G_edgelist = G_edgelist.append(nx.generate_edgelist(G).next())
    print G_edgelist
With this code, I get the following error:
Traceback (most recent call last):
  File "MRQ7_trevor_2.py", line 160, in <module>
    MRMostUsedWord2.run()
  File "/tmp/MRQ7_trevor_2.vagrant.20160814.201259.655269/job_local_dir/1/mapper/27/mrjob.tar.gz/mrjob/job.py", line 433, in run
    mr_job.execute()
  File "/tmp/MRQ7_trevor_2.vagrant.20160814.201259.655269/job_local_dir/1/mapper/27/mrjob.tar.gz/mrjob/job.py", line 442, in execute
    self.run_mapper(self.options.step_num)
  File "/tmp/MRQ7_trevor_2.vagrant.20160814.201259.655269/job_local_dir/1/mapper/27/mrjob.tar.gz/mrjob/job.py", line 507, in run_mapper
    for out_key, out_value in mapper(key, value) or ():
  File "MRQ7_trevor_2.py", line 91, in mapper_network
    G_edgelist = G_edgelist.append(nx.generate_edgelist(G).next())
  File "/home/vagrant/anaconda/lib/python2.7/site-packages/networkx/readwrite/edgelist.py", line 114, in generate_edgelist
    yield delimiter.join(map(make_str,e))
  File "/home/vagrant/anaconda/lib/python2.7/site-packages/networkx/utils/misc.py", line 82, in make_str
    return unicode(str(x), 'unicode-escape')
UnicodeDecodeError: 'unicodeescape' codec can't decode byte 0x5c in position 0: \ at end of string
With the following modification
edge_to = unicodedata.normalize("NFKD", edge_to).encode('utf-8', 'ignore')
I get
edge_to = unicodedata.normalize("NFKD", edge_to).encode('utf-8', 'ignore')
TypeError: must be unicode, not str
How can I get rid of the unicode error? It is causing me a lot of trouble, and I would really appreciate your help. Thanks!!
Answer 0 (score: 0)
I strongly recommend reading this article on unicode. It gives a good explanation of unicode versus str in Python 2.
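As for your specific problem: when you call unicodedata.normalize("NFKD", edge_to), edge_to must be a unicode string. It is not unicode, however, because you converted it to str on this line: info[1][i] = str(info[1][i]). Here is a quick test: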
import unicodedata
edge_to = u'edge' # this is unicode
edge_to = unicodedata.normalize("NFKD", edge_to).encode('utf-8', 'ignore')
print edge_to # prints 'edge' as expected
edge_to = 'edge' # this is not unicode (a Python 2 byte string)
edge_to = unicodedata.normalize("NFKD", edge_to).encode('utf-8', 'ignore') # raises TypeError: must be unicode, not str
print edge_to # never reached
You can fix the problem by converting edge_to to unicode before normalizing it.
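As a rough illustration of that conversion, here is a minimal sketch. The helper names to_text and normalize_label are made up for this example, and it assumes the byte strings are UTF-8 encoded, which the question does not confirm. It is written so that it behaves the same on Python 2 and Python 3.

```python
# -*- coding: utf-8 -*-
import unicodedata


def to_text(value, encoding="utf-8"):
    """Hypothetical helper: return `value` as a text (unicode) string.

    Assumes byte strings are encoded with `encoding` (UTF-8 here).
    On Python 2, `bytes` is an alias of `str`, so this also catches
    plain Python 2 strings.
    """
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


def normalize_label(value):
    """Decode first, then normalize, then encode back to UTF-8 bytes."""
    text = to_text(value)
    return unicodedata.normalize("NFKD", text).encode("utf-8", "ignore")


# A byte string no longer triggers "must be unicode, not str".
edge_to = normalize_label(b"edge")

# A unicode string passes through to_text unchanged.
accented = normalize_label(u"caf\u00e9")
```

Note that NFKD splits an accented character into its base letter plus a combining mark, so the encoded result for the second call is longer than the input.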
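As an aside, the encoding and decoding throughout this code block is somewhat confusing. Think carefully about where you want unicode strings and where you want bytes. You probably do not need this much encoding, decoding, and normalization.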
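One common way to cut down on the conversions is to decode once when the data comes in, work with unicode everywhere inside, and encode once when writing out. The sketch below shows that pattern under several assumptions: the names to_text, build_edges, and encode_edge are hypothetical, the input bytes are assumed to be UTF-8, the weight is a plain float rather than the one-element list used in the question, and the graph library is left out so the example runs on its own.

```python
# -*- coding: utf-8 -*-
TEXT_TYPE = type(u"")  # `unicode` on Python 2, `str` on Python 3


def to_text(value, encoding="utf-8"):
    """Hypothetical helper: coerce bytes, text, or other objects to text."""
    if isinstance(value, bytes):
        return value.decode(encoding)  # assumes UTF-8 input
    if isinstance(value, TEXT_TYPE):
        return value
    return TEXT_TYPE(value)  # e.g. integer node ids


def build_edges(info):
    """Decode at the boundary: return (source, target, weight) in text form.

    `info` mirrors the mapper input in the question: info[0] is the
    source node and info[1] is the list of its neighbours.
    """
    source = to_text(info[0])
    targets = [to_text(t) for t in info[1]]
    if not targets:
        return []
    weight = 1.0 / len(targets)  # assumption: a scalar weight per edge
    return [(source, target, weight) for target in targets]


def encode_edge(edge):
    """Encode at the boundary: one UTF-8 line per edge."""
    return (u"%s %s %s" % edge).encode("utf-8")


edges = build_edges([b"caf\xc3\xa9", [u"b", 3]])
lines = [encode_edge(edge) for edge in edges]
```

With this layout, node names stay as unicode while the edges are built, and the only encode call sits at the point where the edges leave the program.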