Question

I am reading some XML, and the stored information is

H.P. Dembinski, B. K\'{e}gl, I.C. Mari\c{s}, M. Roth, D. Veberi\v{c}

I want to get exactly this, in this format, but instead I get

u"H.P. Dembinski, B. K\\'{e}gl, I.C. Mari\\c{s}, M. Roth, D. Veberi\\v{c}"

So my question is: where in my processing steps am I going wrong? Here is what I am doing:

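import urllib2
import xml.etree.ElementTree as ET

# OAI and ARXIVRAW were not defined in the post; these namespace values are assumed.
OAI = "{http://www.openarchives.org/OAI/2.0/}"
ARXIVRAW = "{http://arxiv.org/OAI/arXivRaw/}"

today = "YYYY-MM-DD"  # some date (left unspecified in the question)
base_url = "http://export.arxiv.org/oai2?verb=ListRecords&"
url = (base_url + "from=%s&until=%s&" % (today, today) + "metadataPrefix=arXivRaw")

try:
    response = urllib2.urlopen(url)
except urllib2.HTTPError as e:
    # The original used a bare `return`, which only works inside a function.
    raise SystemExit("request failed: %s" % e)

rawdata = response.read()
root = ET.fromstring(rawdata)

records = root.find(OAI + 'ListRecords')
if records is not None:
    for record in records.findall(OAI + "record"):
        # `info` was undefined in the post; searching within `record` is an assumption.
        author_string = record.find(".//" + ARXIVRAW + "authors").text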

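How can I prevent the second \ from being introduced? I know I could do some replace(), but there must be a way to get the raw text? Thanks, Carl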

Answer 1

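You are just seeing the internal representation (the repr) of the string, which uses a doubled \\ to show an escaped backslash. The string itself contains only one \ at each of those positions, and if you print it, that is what you will see. Example: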

>>> print(u"H.P. Dembinski, B. K\\'{e}gl, I.C. Mari\\c{s}, M. Roth, D. Veberi\\v{c}")
H.P. Dembinski, B. K\'{e}gl, I.C. Mari\c{s}, M. Roth, D. Veberi\v{c}

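You can also notice the u at the start, which indicates that this is the representation of a unicode string.

So the string is fine. Even when you write it to a file, it should work as expected, and only a single \ will appear.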
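One way to convince yourself of this is the small sketch below. It is written for Python 3 (where the repr has no u prefix) and uses a made-up, un-namespaced XML snippet in place of the real arXiv response, so the element names and the snippet itself are assumptions for illustration only. It compares repr() with the actual contents of the parsed string and checks what gets written to a file-like object.

```python
import io
import xml.etree.ElementTree as ET

# Made-up, un-namespaced stand-in for one arXivRaw record (assumption).
xml_snippet = r"<record><authors>B. K\'{e}gl, I.C. Mari\c{s}, D. Veberi\v{c}</authors></record>"

author_string = ET.fromstring(xml_snippet).find("authors").text

# repr() renders a Python literal, so every backslash is written as \\.
print(repr(author_string))
# print() emits the actual characters: one backslash per accent command.
print(author_string)

# The string itself holds three single backslashes, never a doubled one.
backslash_count = author_string.count("\\")
has_double = "\\\\" in author_string

# Writing to a file-like object stores the characters, not the repr.
buffer = io.StringIO()
buffer.write(author_string)
written = buffer.getvalue()
```

Here backslash_count counts one backslash per LaTeX accent command, and written is identical to author_string, so no replace() call should be needed.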

XML parsing and LaTeX notation
