Question

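I was given the following Python 2.x code. To convert it to Python 3.x, I changed import urllib2 to from urllib.request import urlopen, removed the urllib2 references, and ran the program. The document at the end of the URL is retrieved, but the program fails at the indicated line with the error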

TypeError: a bytes-like object is required, not 'str'

The document looks like this: b'9306112 9210128 9202065 \r\n9306114 9204065 9301122 \r\n9306115 \r\n9306116 \r\n9306117 \r\n9306118 \r\n9306119

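I have tried manipulating the return value of that line and of the line above it (for example, converting to bytes, splitting on different values), but nothing works. Any ideas about what is going on?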

import urllib2


CITATION_URL = "http://storage.googleapis.com/codeskulptor-alg/alg_phys-cite.txt"

def load_graph(graph_url):
    """
    Function that loads a graph given the URL
    for a text representation of the graph

    Returns a dictionary that models a graph
    """
    graph_file = urllib2.urlopen(graph_url)
    graph_text = graph_file.read()
    graph_lines = graph_text.split('\n')  # <--- The Problem
    graph_lines = graph_lines[ : -1]

    print "Loaded graph with", len(graph_lines), "nodes"

    answer_graph = {}
    for line in graph_lines:
        neighbors = line.split(' ')
        node = int(neighbors[0])
        answer_graph[node] = set([])
        for neighbor in neighbors[1 : -1]:
            answer_graph[node].add(int(neighbor))

    return answer_graph

citation_graph = load_graph(CITATION_URL)
print(citation_graph)

Answer 1

To treat a bytes object as a string, you need to decode it first. For example:

graph_text = graph_file.read().decode("utf-8")

if the encoding is UTF-8. This should let you treat it as a string rather than as a sequence of bytes.
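As a minimal sketch of the decode approach, the snippet below uses a short bytes literal shaped like the document in the question instead of fetching the URL, and it assumes the payload is UTF-8 (plain ASCII digits qualify). The variable raw is a stand-in introduced for illustration only.

```python
# Stand-in for graph_file.read(): in Python 3 this returns bytes, not str.
raw = b"9306112 9210128 9202065 \r\n9306114 9204065 9301122 \r\n9306115 \r\n"

# raw.split('\n') would raise:
#   TypeError: a bytes-like object is required, not 'str'
graph_text = raw.decode("utf-8")      # assumption: the data is UTF-8 encoded
graph_lines = graph_text.split("\n")  # str separator now matches str data
graph_lines = graph_lines[:-1]        # drop the empty string after the last newline

print("Loaded graph with", len(graph_lines), "nodes")
```

Each line still ends in '\r' after the split, which is why the original loop's neighbors[1 : -1] slice skips the last element.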

Answer 2

You can only split like with like: if you want to split on \n while still keeping graph_text as bytes, define the separator as a bytes sequence:

graph_lines = graph_text.split(b'\n')

Otherwise, if you know which codec your graph_text data is encoded with, first decode it to str with graph_text.decode("<codec>"), and then continue treating it as str.
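For the bytes-only route, here is one way the question's function might look in Python 3. This is a sketch, not the original author's port: parse_graph is a hypothetical helper split out so the parsing can be tried on a literal without network access, and it relies on int() accepting bytes that contain ASCII digits.

```python
from urllib.request import urlopen


def parse_graph(graph_bytes):
    """Build {node: set(neighbors)} from raw bytes (hypothetical helper)."""
    # Bytes data needs a bytes separator; [:-1] drops the empty trailing item.
    graph_lines = graph_bytes.split(b"\n")[:-1]
    answer_graph = {}
    for line in graph_lines:
        neighbors = line.split(b" ")
        node = int(neighbors[0])          # int() accepts ASCII-digit bytes
        answer_graph[node] = set()
        for neighbor in neighbors[1:-1]:  # last item is the trailing b"\r"
            answer_graph[node].add(int(neighbor))
    return answer_graph


def load_graph(graph_url):
    """Fetch the text representation at graph_url and parse it."""
    with urlopen(graph_url) as graph_file:
        return parse_graph(graph_file.read())


# Offline check on bytes shaped like the document in the question.
sample = b"9306112 9210128 9202065 \r\n9306114 9204065 9301122 \r\n9306115 \r\n"
print(parse_graph(sample))
```

Calling load_graph(CITATION_URL) would then replace the urllib2-based call, with print(...) written as a function call as Python 3 requires.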

Converting from Python 2 to Python 3: TypeError: a bytes-like object is required