UnicodeDecodeError when reading a GML file with networkx

Time: 2017-08-26 11:33:58

Tags: python python-2.7 unicode networkx gml

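I am trying to read a GML file using

nx.read_gml('test.gml')

I looked at the networkx read_gml() documentation. It says the GML specification requires the file to be ASCII-encoded, so when I write the GML I use the following: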

import sys
import networkx as nx

reload(sys)
sys.setdefaultencoding('ascii')
nx.write_gml(g, fname + '.gml')

The contents of test.gml are as follows.

graph [
  name "Country-based Relationships Graph"
  node [
    id 0
    label "1-14014874"
    ned 0
    ntype 1
    name "3A/63 KIRRIBILLI AVENUE; KIRRIBILLI; AUSTRALIA"
  ]
  node [
    id 1
    label "2-12097019"
    name "ANTHONY WADDELL LATIMER"
    ned 0
    ntype 2
  ]
  node [
    id 2
    label "2-12201665"
    name "QUEENSLAND M M PTY LTD"
    ned 0
    ntype 2
  ]
  node [
    id 3
    label "1-14007784"
    ned 0
    ntype 1
    name "2/15 MOSMAN STREET MOSMAN 2088"
  ]
  node [
    id 4
    label "1-14007787"
    ned 0
    ntype 1
    name "2/19 SEABREEZE PLACE THIRROUL NSW 2515"
  ]
  node [
    id 5
    label "4-10124385"
    name "SOUTH AMERICAN FERRO METALS LIMITED"
    ned 0
    ntype 4
  ]
  node [
    id 6
    label "2-12100977"
    name "MARTIN AYLMER GREEN"
    ned 0
    ntype 2
  ]
  node [
    id 7
    label "1-14023939"
    ned 0
    ntype 1
    name "9/4 BILLYARD AVENUE; ELIZABETH BAY; AUSTRALIA"
  ]
  node [
    id 8
    label "1-14017022"
    ned 0
    ntype 1
    name ""47/228 MOORE PARKE ROAD, PADDINGTON""
  ]
  node [
    id 9
    label "2-12095303"
    name "GOLDFIND HOLDINGS PTY LTD"
    ned 0
    ntype 2
  ]
  node [
    id 10
    label "1-14019821"
    ned 0
    ntype 1
    name ""5 RISORTA AVENUE, ST IVES 2075 AUSTRALIA""
  ]
  node [
    id 11
    label "1-14001076"
    ned 0
    ntype 1
    name ""10B CONWAY AVENUE, ROSE BAY NSW 2028""
  ]
  node [
    id 12
    label "2-12195748"
    name "DAVID GRAHAM GRAY"
    ned 0
    ntype 2
  ]
  node [
    id 13
    label "2-12220072"
    name "GEORGE KOTEFSKI"
    ned 0
    ntype 2
  ]
  node [
    id 14
    label "2-12121150"
    name "SINO EUROPE INVESTMENTS LIMITED"
    ned 0
    ntype 2
  ]
  node [
    id 15
    label "2-12129794"
    name "STEPHEN JOHN TURNER"
    ned 1
    ntype 2
  ]
  node [
    id 16
    label "1-14003998"
    ned 0
    ntype 1
    name "149 Hudson Parade; Clareville; NSW 2107; Australia"
  ]
  node [
    id 17
    label "4-10110003"
    ned 0
    ntype 4
    name "BUNSWICK INVESTMENTS LIMITED"
  ]
  node [
    id 18
    label "1-14014446"
    ned 0
    ntype 1
    name ""373 EDINBURGH RD, CASTLECRAG NSW""
  ]
  node [
    id 19
    label "1-14024552"
    ned 0
    ntype 1
    name ""9 BARRACK STREET, SYDNEY NSW 2000""
  ]
  node [
    id 20
    label "1-14082647"
    name "UNIT 4/281 O'SULLIVAN ROAD BELLEVUE HILL NSW 2023"
    ned 0
    ntype 1
  ]
  node [
    id 21
    label "1-14002851"
    ned 0
    ntype 1
    name "12 CHERUB CLOSE; BALLUJARA; WA 6066 AUSTRALIA"
  ]
  node [
    id 22
    label "2-12220071"
    name "THE KOGOS FAMILY TRUST"
    ned 0
    ntype 2
  ]
  node [
    id 23
    label "2-12171442"
    name "MICHAEL JOHN DOYLE"
    ned 0
    ntype 2
  ]
  node [
    id 24
    label "2-12171441"
    name "GEORGINA TSOUTSOURAS"
    ned 0
    ntype 2
  ]
  node [
    id 25
    label "2-12171440"
    name "TERESA ODETTE VIEIRA GARCES"
    ned 0
    ntype 2
  ]
  node [
    id 26
    label "2-12171465"
    name "South American Ferro Metals Limited (Formerly “Riviera Resources Limited”"
    ned 0
    ntype 2
  ]
  node [
    id 27
    label "2-12171464"
    name "MP CAPITAL PTY LTD"
    ned 0
    ntype 2
  ]
  node [
    id 28
    label "2-12098018"
    name "Burton Securities Limited"
    ned 0
    ntype 2
  ]
  node [
    id 29
    label "2-12171461"
    name "PAUL BOURS"
    ned 0
    ntype 2
  ]
  node [
    id 30
    label "2-12129946"
    name "STEPHEN JOHN TURNER"
    ned 1
    ntype 2
  ]
  node [
    id 31
    label "2-12171463"
    name "BARRY ROBERT MCINNES"
    ned 0
    ntype 2
  ]
  node [
    id 32
    label "2-12171443"
    name "LORI MARGARET RAYNER"
    ned 0
    ntype 2
  ]
  node [
    id 33
    label "2-12171502"
    name "Afro Pacific Capital Pty Limited"
    ned 0
    ntype 2
  ]
  node [
    id 34
    label "2-12129947"
    name "STEPHEN JOHN TURNER"
    ned 1
    ntype 2
  ]
  node [
    id 35
    label "4-10116684"
    name "SUPER ALLOYS AFRICA LIMITED"
    ned 0
    ntype 4
  ]
  node [
    id 36
    label "4-10204113"
    ned 0
    ntype 4
    name "African Chrome Limited"
  ]
  node [
    id 37
    label "1-14002991"
    ned 0
    ntype 1
    name ""12 LAVONI STREET, BALMORAL BEACH 2088""
  ]
  node [
    id 38
    label "1-14063550"
    name "PO BOX 6215; SOUTH YARRA; VICTORIA; 3141 AUSTRALIA"
    ned 0
    ntype 1
  ]
  node [
    id 39
    label "1-14063476"
    ned 0
    ntype 1
    name "PO BOX 6009; QUEANBEYAN; N.S.W. 2620 AUSTRALIA"
  ]
  node [
    id 40
    label "2-12070744"
    name "GARRY JACK COHEN"
    ned 0
    ntype 2
  ]
  node [
    id 41
    label "2-13010180"
    name "ALAN DAVID DOYLE"
    ned 1
    ntype 2
  ]
  node [
    id 42
    label "1-14021460"
    ned 0
    ntype 1
    name "6 Wray Street; Batemans Bay; NSW 2536; Australia"
  ]
  node [
    id 43
    label "2-12117437"
    name "IAN DONALD PRATT"
    ned 0
    ntype 2
  ]
  node [
    id 44
    label "2-12110721"
    name "ALAN DAVID DOYLE"
    ned 0
    ntype 2
  ]
  node [
    id 45
    label "2-12108471"
    name "MEGAN BLACK"
    ned 0
    ntype 2
  ]
  node [
    id 46
    label "1-14023938"
    ned 0
    ntype 1
    name "9/4 BILLYARD AVENUE; ELIZABETH BAY; AUASTRALIA"
  ]
  node [
    id 47
    label "4-10204047"
    ned 0
    ntype 4
    name "ASIA ROCK HOLDINGS LTD."
  ]
  node [
    id 48
    label "2-12117663"
    name "JOHN LYSTER ABEL"
    ned 0
    ntype 2
  ]
  node [
    id 49
    label "2-12107354"
    name "W J K INVESTMENTS PTY LTD"
    ned 0
    ntype 2
  ]
  node [
    id 50
    label "2-12121288"
    name "Barry Robert Mcinnes Superannuation Fund"
    ned 0
    ntype 2
  ]
  node [
    id 51
    label "2-12115776"
    name "KELLY LEANNE BINDON"
    ned 0
    ntype 2
  ]
  node [
    id 52
    label "4-10128856"
    ned 0
    ntype 4
    name "ASD SERVICES LIMITED"
  ]
  node [
    id 53
    label "2-12196030"
    name "Patermat Pty Limited"
    ned 0
    ntype 2
  ]
  node [
    id 54
    label "1-14017203"
    ned 0
    ntype 1
    name "486 THE RIDGE ROAD; SURF BEACH; NSW 2536"
  ]
  node [
    id 55
    label "1-14013008"
    ned 0
    ntype 1
    name "30 JAPONICA AVE WEST EPPING 2121"
  ]
  node [
    id 56
    label "2-12219691"
    name "ALAN DAVID DOYLE"
    ned 1
    ntype 2
  ]
  node [
    id 57
    label "2-12171462"
    name "SUNTRONIC PTY LTD"
    ned 0
    ntype 2
  ]
  node [
    id 58
    label "2-12128104"
    name "BANYAN PROPERTIES INC."
    ned 0
    ntype 2
  ]
  node [
    id 59
    label "1-14022758"
    ned 0
    ntype 1
    name "8/4 BILLYARD AVENUE; ELIZABETH BAY 2011; AUSTRALIA"
  ]
  node [
    id 60
    label "1-14048421"
    name "LEVEL 11; 151 MACQUARIE STREET; SYDNEY"
    ned 0
    ntype 1
  ]
  node [
    id 61
    label "1-14048420"
    ned 0
    ntype 1
    name "Level 11; 1511 Macquarie Street; Sydney NSW 2000; Australia"
  ]
  node [
    id 62
    label "1-14048422"
    ned 0
    ntype 1
    name "Level 11; 151 Macquarie Street; Sydney; NSW; 2000; Australia"
  ]
  node [
    id 63
    label "2-12198327"
    name "MALACHITE ENTERPRISES LIMITED"
    ned 0
    ntype 2
  ]
  node [
    id 64
    label "2-12198326"
    name "VICTORIAN SECURITIES NO 2 PTY LTD."
    ned 0
    ntype 2
  ]
  node [
    id 65
    label "2-12095211"
    name "TREVOR JONES"
    ned 0
    ntype 2
  ]
  node [
    id 66
    label "1-14001856"
    ned 0
    ntype 1
    name "11A BURTON STREET MOSMAN 2070"
  ]
  node [
    id 67
    label "1-14008805"
    ned 0
    ntype 1
    name "21 BRETT STREET KINGS LANGLEY NSW 2147"
  ]
  node [
    id 68
    label "4-10131080"
    ned 0
    ntype 4
    name "BRAZILIAN IRON LIMITED"
  ]
  node [
    id 69
    label "1-14064238"
    ned 0
    ntype 1
    name "PO Box N284 Grosvenor Place; Sydney NSW 1220; Australia"
  ]
  node [
    id 70
    label "2-12113688"
    name "GABRIELLE MARY JARVIS"
    ned 0
    ntype 2
  ]
  node [
    id 71
    label "2-12103291"
    name "DAVID ANTHONY BURROUGHS"
    ned 0
    ntype 2
  ]
  node [
    id 72
    label "1-14006789"
    ned 0
    ntype 1
    name ""19 COLLINS AVENUE, ROSE BAY NSW 2028""
  ]
  node [
    id 73
    label "1-14082636"
    name "UNIT 3A; 63-65 KIRRIBILLI AVENUE; KIRRIBILLI; NSW 2061; AUSTRALIA"
    ned 0
    ntype 1
  ]
  node [
    id 74
    label "1-14079412"
    ned 0
    ntype 1
    name "SUITE 2; 1233 HIGH STREET; ARMELALE VIC 3143 AUSTRALIA"
  ]
  node [
    id 75
    label "4-10109120"
    name "PACK-TECH INTERNATIONAL LICENSING PTY LIMITED"
    ned 0
    ntype 4
  ]
  edge [
    source 0
    target 44
    weight 1
  ]
  edge [
    source 1
    target 37
    weight 1
  ]
  edge [
    source 1
    target 5
    weight 2
  ]
  edge [
    source 2
    target 5
    weight 1
  ]
  edge [
    source 2
    target 74
    weight 1
  ]
  edge [
    source 3
    target 49
    weight 1
  ]
  edge [
    source 4
    target 71
    weight 1
  ]
  edge [
    source 5
    target 32
    weight 2
  ]
  edge [
    source 5
    target 6
    weight 2
  ]
  edge [
    source 5
    target 24
    weight 2
  ]
  edge [
    source 5
    target 12
    weight 2
  ]
  edge [
    source 5
    target 25
    weight 2
  ]
  edge [
    source 5
    target 14
    weight 1
  ]
  edge [
    source 5
    target 22
    weight 2
  ]
  edge [
    source 5
    target 23
    weight 2
  ]
  edge [
    source 5
    target 9
    weight 2
  ]
  edge [
    source 5
    target 13
    weight 1
  ]
  edge [
    source 5
    target 26
    weight 2
  ]
  edge [
    source 5
    target 27
    weight 1
  ]
  edge [
    source 5
    target 28
    weight 1
  ]
  edge [
    source 5
    target 29
    weight 1
  ]
  edge [
    source 5
    target 30
    weight 1
  ]
  edge [
    source 5
    target 31
    weight 2
  ]
  edge [
    source 5
    target 45
    weight 1
  ]
  edge [
    source 5
    target 33
    weight 1
  ]
  edge [
    source 5
    target 40
    weight 1
  ]
  edge [
    source 5
    target 58
    weight 1
  ]
  edge [
    source 5
    target 43
    weight 1
  ]
  edge [
    source 5
    target 44
    weight 2
  ]
  edge [
    source 5
    target 57
    weight 1
  ]
  edge [
    source 5
    target 48
    weight 2
  ]
  edge [
    source 5
    target 49
    weight 2
  ]
  edge [
    source 5
    target 50
    weight 1
  ]
  edge [
    source 5
    target 51
    weight 2
  ]
  edge [
    source 5
    target 53
    weight 1
  ]
  edge [
    source 5
    target 63
    weight 1
  ]
  edge [
    source 5
    target 64
    weight 1
  ]
  edge [
    source 5
    target 65
    weight 1
  ]
  edge [
    source 5
    target 70
    weight 2
  ]
  edge [
    source 5
    target 71
    weight 1
  ]
  edge [
    source 6
    target 72
    weight 1
  ]
  edge [
    source 7
    target 34
    weight 1
  ]
  edge [
    source 25
    target 60
    weight 1
  ]
  edge [
    source 8
    target 43
    weight 1
  ]
  edge [
    source 24
    target 60
    weight 1
  ]
  edge [
    source 10
    target 14
    weight 1
  ]
  edge [
    source 12
    target 39
    weight 1
  ]
  edge [
    source 13
    target 20
    weight 1
  ]
  edge [
    source 15
    target 17
    weight 1
  ]
  edge [
    source 15
    target 35
    weight 1
  ]
  edge [
    source 15
    target 59
    weight 1
  ]
  edge [
    source 15
    target 34
    weight 1
  ]
  edge [
    source 15
    target 30
    weight 1
  ]
  edge [
    source 16
    target 28
    weight 1
  ]
  edge [
    source 18
    target 51
    weight 1
  ]
  edge [
    source 19
    target 58
    weight 1
  ]
  edge [
    source 20
    target 22
    weight 1
  ]
  edge [
    source 21
    target 9
    weight 1
  ]
  edge [
    source 23
    target 60
    weight 1
  ]
  edge [
    source 11
    target 40
    weight 1
  ]
  edge [
    source 26
    target 62
    weight 1
  ]
  edge [
    source 27
    target 60
    weight 1
  ]
  edge [
    source 29
    target 60
    weight 1
  ]
  edge [
    source 30
    target 46
    weight 1
  ]
  edge [
    source 30
    target 34
    weight 1
  ]
  edge [
    source 31
    target 60
    weight 1
  ]
  edge [
    source 32
    target 60
    weight 1
  ]
  edge [
    source 33
    target 61
    weight 1
  ]
  edge [
    source 34
    target 75
    weight 1
  ]
  edge [
    source 35
    target 56
    weight 1
  ]
  edge [
    source 36
    target 41
    weight 1
  ]
  edge [
    source 38
    target 63
    weight 1
  ]
  edge [
    source 38
    target 64
    weight 1
  ]
  edge [
    source 41
    target 73
    weight 1
  ]
  edge [
    source 41
    target 56
    weight 1
  ]
  edge [
    source 41
    target 68
    weight 1
  ]
  edge [
    source 42
    target 50
    weight 1
  ]
  edge [
    source 45
    target 67
    weight 1
  ]
  edge [
    source 47
    target 56
    weight 1
  ]
  edge [
    source 48
    target 54
    weight 1
  ]
  edge [
    source 52
    target 56
    weight 1
  ]
  edge [
    source 53
    target 69
    weight 1
  ]
  edge [
    source 55
    target 70
    weight 1
  ]
  edge [
    source 56
    target 73
    weight 1
  ]
  edge [
    source 56
    target 75
    weight 1
  ]
  edge [
    source 57
    target 60
    weight 1
  ]
  edge [
    source 65
    target 66
    weight 1
  ]
]

However, when I try to read the GML, networkx throws the following exception:

UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 0: ordinal not in range(128)

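How can I

  1. Read this GML correctly? Is it possible to find out which line of the GML causes this error?
  2. Write the GML so that it does not cause the above error?
  3. I am using networkx version 2.0b1 running on Windows 7.

    I get the following error: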

    ---------------------------------------------------------------------------
    UnicodeDecodeError                        Traceback (most recent call last)
    <ipython-input-9-12160890b63f> in <module>()
    ----> 1 G = nx.read_gml('test.gml')
    
    <decorator-gen-501> in read_gml(path, label, destringizer)
    
    c:\python2_7_13\lib\site-packages\networkx-2.0b1-py2.7.egg\networkx\utils\decorators.pyc in _open_file(func, *args, **kwargs)
        219         # Finally, we call the original function, making sure to close the fobj.
        220         try:
    --> 221             result = func(*new_args, **kwargs)
        222         finally:
        223             if close_fobj:
    
    c:\python2_7_13\lib\site-packages\networkx-2.0b1-py2.7.egg\networkx\readwrite\gml.pyc in read_gml(path, label, destringizer)
        216             yield line
        217 
    --> 218     G = parse_gml_lines(filter_lines(path), label, destringizer)
        219     return G
        220 
    
    c:\python2_7_13\lib\site-packages\networkx-2.0b1-py2.7.egg\networkx\readwrite\gml.pyc in parse_gml_lines(lines, label, destringizer)
        396 
        397     tokens = tokenize()
    --> 398     graph = parse_graph()
        399 
        400     directed = graph.pop('directed', False)
    
    c:\python2_7_13\lib\site-packages\networkx-2.0b1-py2.7.egg\networkx\readwrite\gml.pyc in parse_graph()
        385 
        386     def parse_graph():
    --> 387         curr_token, dct = parse_kv(next(tokens))
        388         if curr_token[0] is not None:  # EOF
        389             unexpected(curr_token, 'EOF')
    
    c:\python2_7_13\lib\site-packages\networkx-2.0b1-py2.7.egg\networkx\readwrite\gml.pyc in parse_kv(curr_token)
        370                 curr_token = next(tokens)
        371             elif category == 4:  # dict start
    --> 372                 curr_token, value = parse_dict(curr_token)
        373             else:
        374                 unexpected(curr_token, "an int, float, string or '['")
    
    c:\python2_7_13\lib\site-packages\networkx-2.0b1-py2.7.egg\networkx\readwrite\gml.pyc in parse_dict(curr_token)
        380     def parse_dict(curr_token):
        381         curr_token = consume(curr_token, 4, "'['")    # dict start
    --> 382         curr_token, dct = parse_kv(curr_token)
        383         curr_token = consume(curr_token, 5, "']'")  # dict end
        384         return curr_token, dct
    
    c:\python2_7_13\lib\site-packages\networkx-2.0b1-py2.7.egg\networkx\readwrite\gml.pyc in parse_kv(curr_token)
        370                 curr_token = next(tokens)
        371             elif category == 4:  # dict start
    --> 372                 curr_token, value = parse_dict(curr_token)
        373             else:
        374                 unexpected(curr_token, "an int, float, string or '['")
    
    c:\python2_7_13\lib\site-packages\networkx-2.0b1-py2.7.egg\networkx\readwrite\gml.pyc in parse_dict(curr_token)
        380     def parse_dict(curr_token):
        381         curr_token = consume(curr_token, 4, "'['")    # dict start
    --> 382         curr_token, dct = parse_kv(curr_token)
        383         curr_token = consume(curr_token, 5, "']'")  # dict end
        384         return curr_token, dct
    
    c:\python2_7_13\lib\site-packages\networkx-2.0b1-py2.7.egg\networkx\readwrite\gml.pyc in parse_kv(curr_token)
        362                 curr_token = next(tokens)
        363             elif category == 3:  # strings
    --> 364                 value = unescape(curr_token[1][1:-1])
        365                 if destringizer:
        366                     try:
    
    c:\python2_7_13\lib\site-packages\networkx-2.0b1-py2.7.egg\networkx\readwrite\gml.pyc in unescape(text)
        119             return text  # leave unchanged
        120 
    --> 121     return re.sub("&(?:[0-9A-Za-z]+|#(?:[0-9]+|x[0-9A-Fa-f]+));", fixup, text)
        122 
        123 
    
    c:\python2_7_13\lib\re.pyc in sub(pattern, repl, string, count, flags)
        153     a callable, it's passed the match object and must return
        154     a replacement string to be used."""
    --> 155     return _compile(pattern, flags).sub(repl, string, count)
        156 
        157 def subn(pattern, repl, string, count=0, flags=0):
    
    UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 0: ordinal not in range(128)
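
On the first question: the byte 0xe2 in the error message is consistent with a UTF-8 encoded curly quote (U+201C and U+201D are E2 80 9C and E2 80 9D in UTF-8), and the name of node 26 in the listing above contains such quotes. A minimal sketch for locating lines that hold non-ASCII bytes, using only the standard library, might look like the following. `find_non_ascii_lines` and the throwaway demo file are illustrative names, not part of networkx.

```python
from __future__ import print_function

import io
import os
import tempfile


def find_non_ascii_lines(path):
    """Return (line_number, byte_offset, raw_line) for every line holding a byte >= 0x80."""
    hits = []
    # Binary mode: nothing is decoded, so this cannot raise UnicodeDecodeError.
    with io.open(path, "rb") as fobj:
        for lineno, raw in enumerate(fobj, start=1):
            for col, byte in enumerate(bytearray(raw)):
                if byte >= 0x80:
                    hits.append((lineno, col, raw.rstrip(b"\r\n")))
                    break  # report only the first offending byte per line
    return hits


# Demo on a temporary file that mimics the curly-quoted name (hypothetical sample data).
sample = u'graph [\n  node [\n    id 0\n    name "(Formerly \u201cRiviera\u201d"\n  ]\n]\n'
fd, demo_path = tempfile.mkstemp(suffix=".gml")
with os.fdopen(fd, "wb") as fobj:
    fobj.write(sample.encode("utf-8"))

demo_hits = find_non_ascii_lines(demo_path)
os.remove(demo_path)

for lineno, col, raw in demo_hits:
    print("line %d, byte offset %d: %r" % (lineno, col, raw))
```

Running `find_non_ascii_lines('test.gml')` on the real file should list each line that contains a byte outside the ASCII range, together with the offset of the first such byte.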
    

1 Answer:

Answer 0: (score: 0)

I tested with the current networkx-2.0b1 beta and it works fine.

In [1]: import networkx as nx

In [2]: G = nx.read_gml('test.gml')

In [3]: nx.write_gml(G,'test2.gml')

In [4]: H = nx.read_gml('test2.gml')

In [5]: nx.is_isomorphic(G,H)
# True

If you are not using that version, perhaps you could update and test?
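
If updating does not help, one possible workaround is to make the file pure ASCII before reading it. The `unescape` regex in the traceback shows that the reader recognises numeric character references such as `&#8220;`, so the sketch below rewrites non-ASCII characters in that form with Python's built-in `xmlcharrefreplace` error handler. The helper names are hypothetical, and treating the input as UTF-8 is an assumption. Note also that `sys.setdefaultencoding('ascii')` normally just restates Python 2's existing default, so it does not by itself change which bytes end up in the file.

```python
from __future__ import print_function

import io


def to_ascii_gml_text(text):
    """Replace every non-ASCII character with a numeric reference such as &#8220;."""
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")


def rewrite_gml_as_ascii(src_path, dst_path, src_encoding="utf-8"):
    """Copy a GML file, escaping non-ASCII characters.

    src_encoding is an assumption about how the source file was saved.
    """
    with io.open(src_path, "r", encoding=src_encoding) as src:
        content = src.read()
    with io.open(dst_path, "w", encoding="ascii") as dst:
        dst.write(to_ascii_gml_text(content))


# Demo on a single attribute line modelled on the curly-quoted name in test.gml.
escaped = to_ascii_gml_text(u'name "(Formerly \u201cRiviera Resources Limited\u201d"')
print(escaped)
```

The intended use would be `rewrite_gml_as_ascii('test.gml', 'test_ascii.gml')` followed by `nx.read_gml('test_ascii.gml')`. Whether the read then succeeds on a given networkx version has not been tested here.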