Question

I have a variable, index, with the following structure:

{ 'web': [ [1, [0, 2]], [2, [2]] ], 'retrieval': [ [1, [1]] ], 'search': [ [1, [3]], [2, [0]] ], 'information': [ [1, [4]] ], 'engine': [ [2, [1]] ], 'ranking': [ [2, [3]] ] }

I need to write it to a file in the following format:

term|docID1:pos1,pos2;docID2:pos3,pos4,pos5;…

So the posting list for 'web': [ [1, [0, 2]], [2, [2]] ] would be saved as: web|1:0,2;2:2

This is the code I am using:

def writeIndexToFile(self):
    '''write the inverted index to the file'''
    f=open("indexFile.dat", 'w')
    for term in self.index.items():
        postinglist=[]
        for p in self.index[term]:
            docID=p[0]
            positions=p[1]
            postinglist.append(':'.join([str(docID) ,','.join(map(str,positions))]))
        print >> f, ''.join((term,'|',';'.join(postinglist)))

    f.close()

I get the following error:

for p in self.index[term]:
TypeError: unhashable type: 'list'

I am using Python 3.4.

Answer 1

Hopefully this gives you an idea of what you need:

index = { 'web': [ [1, [0, 2]], [2, [2]] ], 'retrieval': [ [1, [1]] ], 'search': [ [1, [3]], [2, [0]] ], 'information': [ [1, [4]] ], 'engine': [ [2, [1]] ], 'ranking': [ [2, [3]] ] } 

with open("indexFile.dat", 'w') as f:
    for k,v in index.items():
        row = "%s|%s\n" % (k, ";".join(["%s:%s" % (i, ",".join([str(x) for x in r])) for i,r in v]))
        f.write(row)

This creates a file that looks like this:

engine|2:1
web|1:0,2;2:2
search|1:3;2:0
ranking|2:3
information|1:4
retrieval|1:1

Tested with Python 2.7, so minor adjustments may be needed for Python 3.4.
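
For reference, here is a minimal sketch of the same approach written for Python 3. The function name write_index and the use of io.StringIO in place of a real file are assumptions made for illustration, and sorted() is added only so the line order is reproducible; none of these come from the answer itself.

```python
import io

index = {
    'web': [[1, [0, 2]], [2, [2]]],
    'retrieval': [[1, [1]]],
    'search': [[1, [3]], [2, [0]]],
}

def write_index(index, f):
    """Write one 'term|docID:pos,pos;docID:pos' line per term to file object f."""
    # sorted() is an addition for reproducible output; drop it to keep dict order.
    for term, postings in sorted(index.items()):
        row = ";".join(
            "%s:%s" % (doc_id, ",".join(str(p) for p in positions))
            for doc_id, positions in postings
        )
        f.write("%s|%s\n" % (term, row))

# io.StringIO stands in for a real file; open("indexFile.dat", "w") works the same way.
buf = io.StringIO()
write_index(index, buf)
print(buf.getvalue(), end="")
```

Passing the file object in, instead of opening it inside the function, keeps the formatting logic easy to check without touching the disk.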

Answer 2

dict.items() returns (key, value) tuples, so term is a tuple holding a key and its value from your dict. You are then trying to use that tuple as a key in self.index[term]; because the tuple contains a list, it is not hashable, and you get the error.

If you want to unpack the key and value separately, do the following:

for k, v in self.index.items():
    for p in self.index[k]:

But judging by your use of self.index[term], you only seem to need the values, so just use them from the start:

for p in self.index.values():

If the values all have two elements, you can also unpack them:

for k, v in d.items():
    for doc, pos in v:
        print(doc, pos)
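
To see the cause of the error in isolation, here is a small sketch using a one-entry index. The variable names (term, key, postings, error_message) are chosen for illustration only.

```python
index = {'web': [[1, [0, 2]], [2, [2]]]}

# Iterating over items() yields (key, value) tuples, not bare keys.
term = next(iter(index.items()))
print(type(term).__name__)  # tuple

# A tuple is hashable only if everything inside it is; the posting list is not,
# so using the tuple as a dictionary key raises TypeError.
try:
    index[term]
    error_message = None
except TypeError as exc:
    error_message = str(exc)
print(error_message)  # the message mentions: unhashable type: 'list'

# Unpacking the tuple gives a plain string key, which is hashable.
key, postings = term
print(index[key] is postings)  # True
```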

Answer 3

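Tuple unpacking will help clean things up a bit here. When you have an iterable that yields two (or more) items, you can unpack the values like this.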

x, y = [1, 2]

You can also use extended unpacking, which separates the first item from the rest.

x, *y = range(10)
# x == 0, y == [1, 2, 3, 4, 5, 6, 7, 8, 9]

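In addition, Python 3 code tends to favor str.format over '%s'-style (printf-style) formatting. With it, there is no need to convert integers to strings before passing them in.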

'{}: {}'.format(1, [3, 4])
# '1: [3, 4]'
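
Applied to a single posting from the question, this might look like the sketch below. Note that format() handles the integer doc ID on its own, while str.join() still needs strings, so the positions are converted first. The variable names are illustrative.

```python
doc_id = 1
positions = [0, 2]

# format() converts the integer doc_id itself, so no str() call is needed there.
# str.join() accepts only strings, so the positions go through map(str, ...) first.
posting = '{}:{}'.format(doc_id, ','.join(map(str, positions)))
print(posting)  # 1:0,2
```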

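Last but not least, taking the printing out of writeIndexToFile means you can reuse the formatting for other things. This is a good use case for a generator: a function or method that returns one thing at a time.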

def format_index(index):
    for term, position_ in index.items():
        posting_list = []
        for doc_id, raw_positions in position_:
            formatted_positions = map(str, raw_positions)
            positions = ','.join(formatted_positions)

            posting = '{}:{}'.format(doc_id, positions)
            posting_list.append(posting)

        yield '{}|{}'.format(term, ';'.join(posting_list))

Now you can use it like this:

with open('file.dat', 'w') as f:
    formatted_index = format_index(index)
    # Join with newlines so each term gets its own line, as the format requires.
    f.write('\n'.join(formatted_index))
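
As a sketch of what "other things" could mean, the same generator can feed a print loop or be collected into a list without any file involved. The generator is repeated here in slightly condensed form so the snippet runs on its own; the two-term index is a cut-down example, not the full data from the question.

```python
def format_index(index):
    """Yield one formatted 'term|docID:pos,...;docID:pos,...' line per term."""
    for term, postings in index.items():
        posting_list = []
        for doc_id, raw_positions in postings:
            positions = ','.join(map(str, raw_positions))
            posting_list.append('{}:{}'.format(doc_id, positions))
        yield '{}|{}'.format(term, ';'.join(posting_list))

index = {'web': [[1, [0, 2]], [2, [2]]], 'engine': [[2, [1]]]}

# Use 1: print each line as it is produced.
for line in format_index(index):
    print(line)

# Use 2: collect the lines in a list, for example to sort or inspect them.
lines = sorted(format_index(index))
```

Because the generator only produces strings, the caller decides where they go: a file, standard output, or a test.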

TypeError: unhashable type: 'list' when writing to a file.
