Encoding error when writing to a GML file

Posted: 2013-10-02 13:37:13

Tags: python-2.7 encoding networkx

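In an earlier post I had trouble reading and writing strings in languages other than English; the problem turned out to be my system encoding. ton1c pointed out that writing the strings to a .txt file works fine, and it does! Now I'm trying to store those strings in a GML file, and I'm running into encoding problems again. Here is the code and the result: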

import urllib2
import BeautifulSoup
import networkx as nx

url = 'http://www.bbc.co.uk/zhongwen/simp/'

page = urllib2.urlopen(url).read().decode("utf-8")
dom =  BeautifulSoup.BeautifulSoup(page)

data = dom.findAll('meta', {'name' : 'keywords'})
data = data.encode("utf-8")
datalist = data.split(',')

G = nx.Graph()
G.add_node( "name", Strings = datalist );
nx.write_gml(G, 'Gname')

This returns:

File "C:\...\name.py", line 23, in <module> nx.write_gml(G, 'Gname')
File "<string>", line 2, in write_gml
File "C:\Python27\lib\site-packages\networkx\utils\decorators.py", line 263, in _open_file
   result = func(*new_args, **kwargs)
File "C:\Python27\lib\site-packages\networkx\readwrite\gml.py", line 392, in write_gml
   path.write(line.encode('latin-1'))
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe4 in position 13: ordinal not in range(128)
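Some background on the traceback: in Python 2, calling `.encode('latin-1')` on a byte string (`str`) first decodes it implicitly with the ASCII codec, and that implicit step is what fails on byte 0xe4. The items in `datalist` are UTF-8 byte strings, and 0xe4 is the first UTF-8 byte of "中". Passing unicode instead would only move the problem, because Chinese characters have no latin-1 encoding at all. The Python 3 sketch below shows both facts in isolation; `first_unencodable` is a hypothetical helper, not part of networkx.

```python
def first_unencodable(text, encoding):
    """Return the index of the first character `encoding` cannot represent, or None."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError as exc:
        return exc.start
    return None


text = "BBC中文网"
utf8 = text.encode("utf-8")

# "中" (U+4E2D) becomes the three bytes e4 b8 ad in UTF-8;
# 0xe4 is the byte the Python 2 implicit ASCII decode chokes on.
print(hex(utf8[3]))                        # 0xe4

# Even as proper text, "中" cannot be written in latin-1.
print(first_unencodable(text, "latin-1"))  # 3
```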

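Any suggestions? I should also mention that the networkx documentation notes that GML specifications indicate the file should only use 7-bit ASCII text encoding, iso8859-1 (latin-1): http://networkx.lanl.gov/reference/generated/networkx.readwrite.gml.write_gml.html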
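One way to respect a 7-bit ASCII restriction, as I understand the GML convention, is to write every non-ASCII character as an HTML-style character entity such as `&#20013;` and turn it back into text after reading. The Python 3 sketch below does that with the standard library only; `to_gml_ascii` and `from_gml_ascii` are hypothetical helper names, and whether your networkx version already escapes or unescapes strings for you is something to check in its documentation. The `"xmlcharrefreplace"` error handler also exists in Python 2.7 (on `unicode` objects), but `html.unescape` is Python 3 only.

```python
import html


def to_gml_ascii(text):
    """Return a pure-ASCII version of `text` using HTML character entities."""
    # Escape '&' first so existing ampersands cannot be mistaken for entities,
    # and '"' so the value cannot terminate a quoted GML string early.
    text = text.replace("&", "&amp;").replace('"', "&quot;")
    # Replace every non-ASCII character with a decimal reference like &#20013;.
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")


def from_gml_ascii(text):
    """Inverse of to_gml_ascii: resolve &amp;, &quot; and &#NNNN; references."""
    return html.unescape(text)


print(to_gml_ascii("BBC中"))  # BBC&#20013;
```

Each keyword string could be passed through `to_gml_ascii` before `G.add_node(...)` and through `from_gml_ascii` after reading the graph back.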

PS: Please keep any suggestions compatible with Python 2.7.
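A separate issue in the question's code: `findAll` returns a list of matching tags, not a string, so `data.encode(...)` would fail on the list itself. The keywords live in the `content` attribute of the first matching tag, which is what the answer reads with `data[0]['content']`. As a stdlib-only Python 3 sketch of the same extraction (`MetaKeywordsParser` and `meta_keywords` are hypothetical names, and the sample HTML is made up rather than taken from the BBC page):

```python
from html.parser import HTMLParser


class MetaKeywordsParser(HTMLParser):
    """Collect the comma-separated values of <meta name="keywords" content="...">."""

    def __init__(self):
        super().__init__()
        self.keywords = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; tag and attribute names are lowercased.
        attr = dict(attrs)
        if tag == "meta" and attr.get("name") == "keywords":
            content = attr.get("content") or ""
            self.keywords.extend(k.strip() for k in content.split(",") if k.strip())


def meta_keywords(html_text):
    """Return the list of keywords found in all keyword <meta> tags."""
    parser = MetaKeywordsParser()
    parser.feed(html_text)
    parser.close()
    return parser.keywords


sample = '<html><head><meta name="keywords" content="BBC中文网,主页, world news"></head></html>'
print(meta_keywords(sample))  # ['BBC中文网', '主页', 'world news']
```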

1 Answer

Answer 0 (score: 1)

All you need to do is the following:

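import urllib2
import BeautifulSoup
import networkx as nx

url = 'http://www.bbc.co.uk/zhongwen/simp/'

# Decode the raw (UTF-8) bytes as latin-1: every byte maps to exactly one
# code point, so this never fails and can be reversed losslessly.
page = urllib2.urlopen(url).read().decode("latin-1")
dom = BeautifulSoup.BeautifulSoup(page)

# findAll returns a list of tags; the keywords are in the first tag's
# "content" attribute. Encoding as latin-1 restores the original UTF-8 bytes.
data = dom.findAll('meta', {'name' : 'keywords'})
data = data[0]['content'].encode("latin-1")
#datalist = data.split(',')

with open("tags.txt", "w") as text_file:
    text_file.write("%s"%data)

# Pass a unicode object so write_gml's line.encode('latin-1') does not
# trigger an implicit ASCII decode; it writes the UTF-8 bytes back out.
G = nx.Graph()
G.add_node("name", Strings=data.decode("latin-1"))
nx.write_gml(G, "test.gml")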

Resulting test.gml:

graph [
  node [
    id 0
    label "name"
    Strings "BBC中文网,主页,国际新闻,中国新闻,台湾新闻,香港新闻,英国新闻,信息,财经,科技,卫生 互动,多媒体,视频,音频,图辑,bbcchinese.com, homepage, world news, China news, uk news, hong kong, taiwan, sci-tech, business, interactive, forum"
  ]
]
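Why the latin-1 trick in this answer works, as far as I can tell: latin-1 maps each of the 256 byte values to exactly one code point, so decoding arbitrary bytes as latin-1 never fails, and encoding back restores the same bytes. Because networkx 1.x's `write_gml` encodes every line with `line.encode('latin-1')` (line 392 in the question's traceback), a unicode string holding the latin-1-decoded UTF-8 bytes is written out as the original UTF-8 bytes. The Python 3 sketch below shows that round trip in isolation; `latin1_roundtrip` is a hypothetical helper. The trade-off is that test.gml contains raw UTF-8 bytes rather than the 7-bit ASCII the GML specification asks for, so a reader that decodes the file as latin-1 will show mojibake instead of Chinese text.

```python
def latin1_roundtrip(raw):
    """Decode bytes as latin-1 and encode them again; return (text, bytes)."""
    text = raw.decode("latin-1")  # one code point per byte, cannot fail
    return text, text.encode("latin-1")


raw = "BBC中文网".encode("utf-8")  # bytes as they arrive from the server
mojibake, back = latin1_roundtrip(raw)

print(len(mojibake))         # 12: one character per UTF-8 byte, not 6 real characters
print(back == raw)           # True: the bytes survive unchanged
print(back.decode("utf-8"))  # BBC中文网
```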