Question

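I am trying to build an index with whoosh from a 150 MB file, but it fails with the error "list index out of range". The line that causes the error is for x in range(len(id)):. Logically, each index record should correspond to the ID number of a document.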

import os

from whoosh import index
from whoosh.fields import Schema, ID, TEXT, NUMERIC

id = []
body = []
Score = []
count=0
doc_path='C:/Users/Abhi/Desktop/My_Experiments_with_truth/extracted_xml.txt'
with open(doc_path,'r+',encoding="utf8") as line:
    for f in line:
        count=count+1
        if f.startswith('Id : '):
            a = f.replace('Id : ','')
            id.append(a)
            #print(a)
        elif f.startswith('body : '):
            b = f.replace('body : ','')
            body.append(b)
            #print(b)
        elif f.startswith('Score :'):
            c = f.replace('Score :','')
            Score.append(c)
            #print(c)

if not os.path.exists("index"):
    os.mkdir("index")
#design the Schema

schema=Schema(id_details=ID(stored=True),body_details=TEXT(stored=True),Score_details=NUMERIC(stored=True))

print(schema)


#creation of the index

ix = index.create_in("index", schema)

writer = ix.writer()
#Opening writer


for x in range(len(id)):
    writer.add_document(id_details=id[x],body_details=body[x],Score_details=Score[x])
writer.commit()
print("Index created")

Answer 1

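I think the problem is not with whoosh but with the way you parse the input file. If the data is not read consistently from the input file, the lists id, body, and Score end up with different sizes, which makes this line fail: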

  writer.add_document(id_details=id[x],body_details=body[x],Score_details=Score[x])

because the loop bound only takes the length of the id list into account: range(len(id))
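The failure described above can be reproduced without whoosh. The following is a minimal sketch using made-up sample data (the lists ids, bodies, and scores and the helper check_lengths are hypothetical, not taken from the asker's file); it assumes one record in the input had no "body : " line.

```python
# Hypothetical sample data: the second record had no "body : " line,
# so the bodies list is one element shorter than the other two.
ids = ["1", "2", "3"]
bodies = ["first body", "third body"]
scores = ["5", "0", "2"]


def check_lengths(**columns):
    """Return a dict mapping each column name to its length."""
    return {name: len(values) for name, values in columns.items()}


lengths = check_lengths(ids=ids, bodies=bodies, scores=scores)
print(lengths)

failed_at = None
try:
    # Same loop shape as in the question: bounded by len(ids) only.
    for x in range(len(ids)):
        row = (ids[x], bodies[x], scores[x])
except IndexError as exc:
    failed_at = x
    print(f"IndexError at x={x}: {exc}")
```

Printing the three lengths right after parsing is a quick way to confirm whether this is what happens with the real file. Note that even before the crash, the row at x=1 pairs id "2" with the body that belongs to id "3", so the lists are already misaligned.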

Try to improve the parsing of the file, or at least bound x by the length of the shortest of the lists id, body, and Score.
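One way to improve the parsing is to group the fields per record instead of filling three independent lists. The sketch below is an illustration only: it assumes every record starts with an "Id : " line (the original post does not confirm the file layout), and parse_records and complete are hypothetical helper names.

```python
def parse_records(lines):
    """Group 'Id : ', 'body : ' and 'Score :' lines into one dict per record.

    Assumption: each record begins with an 'Id : ' line.
    """
    records = []
    current = None
    for raw in lines:
        line = raw.rstrip("\n")
        if line.startswith("Id : "):
            # A new Id line closes the previous record, if any.
            if current is not None:
                records.append(current)
            current = {"id": line[len("Id : "):].strip()}
        elif current is None:
            continue  # ignore anything before the first Id line
        elif line.startswith("body : "):
            current["body"] = line[len("body : "):].strip()
        elif line.startswith("Score :"):
            current["score"] = line[len("Score :"):].strip()
    if current is not None:
        records.append(current)
    return records


def complete(records):
    """Keep only records that have all three fields."""
    required = ("id", "body", "score")
    return [r for r in records if all(key in r for key in required)]


# Hypothetical sample input; record 2 has no body line.
sample = [
    "Id : 1\n",
    "body : first body\n",
    "Score : 5\n",
    "Id : 2\n",
    "Score : 0\n",
    "Id : 3\n",
    "body : third body\n",
    "Score : 2\n",
]
records = parse_records(sample)
usable = complete(records)
print(len(records), "records parsed,", len(usable), "complete")
```

Each dict in usable can then be passed to writer.add_document(id_details=r["id"], body_details=r["body"], Score_details=int(r["score"])); the int() conversion assumes the scores are whole numbers. Compared with simply looping up to the shortest list, this keeps each id paired with its own body and score, whereas truncating to the shortest list avoids the IndexError but can still index misaligned rows.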
