Question

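I am trying to run an LDA model on my files. First I did some preprocessing, such as tokenization and stop-word removal. I do this for multiple files, but when I pass the final output to the LDA model it gives me an error. From what I found on Google, LDA takes multiple documents as input. So now I want to store each file's output in an array and pass that array as input, but that also gives me an error: IndexError: list assignment index out of range. I don't know what the problem is. Any help is much appreciated, thanks!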

    # URDU STOP WORDS REMOVAL
    doc_clean = []
    stopwords_corpus = UrduCorpusReader('./data', ['stopwords-ur.txt'])    
    stopwords = stopwords_corpus.words()
    count = 1
    # print(stopwords)
    for infile in (wordlists.fileids()):
        words = wordlists.words(infile)
        finalized_words = remove_urdu_stopwords(stopwords, words)
        doc_clean[count] = finalized_words
        print(doc_clean)
        count = count + 1
        print("\n==== WITHOUT STOPWORDS ===========\n")
        print(finalized_words)
        id2word = corpora.Dictionary(doc_clean)
        mm = [id2word.doc2bow(text) for text in texts]
        lda = models.ldamodel.LdaModel(corpus=mm, id2word=id2word, num_topics=3, update_every=1, chunksize=10000, passes=1)

Answer 1

There is no need for the count variable here. Lists provide an append method for adding elements to a list. Change this

  doc_clean[count] = finalized_words

to this

  doc_clean.append(finalized_words)
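To make the change concrete, here is a minimal sketch of the loop with append. The file contents, the stopword list, and the remove_stopwords helper are hypothetical stand-ins for the question's wordlists reader and remove_urdu_stopwords, so that the example runs on its own.

```python
def remove_stopwords(stopwords, words):
    """Hypothetical stand-in for remove_urdu_stopwords from the question."""
    stop = set(stopwords)
    return [w for w in words if w not in stop]


# Hypothetical tokenized files; the question reads these with wordlists.words(infile).
tokenized_files = {
    "doc1.txt": ["the", "cat", "sat", "on", "the", "mat"],
    "doc2.txt": ["a", "dog", "ran", "in", "the", "park"],
}
stopwords = ["the", "a", "on", "in"]

doc_clean = []
for fileid, words in tokenized_files.items():
    finalized_words = remove_stopwords(stopwords, words)
    # append grows the list by one element, so no counter is needed
    doc_clean.append(finalized_words)

print(doc_clean)
```

Once the loop has finished, doc_clean is a list of token lists, which is the shape corpora.Dictionary and doc2bow expect. In the question's code that would probably mean building the dictionary after the loop rather than inside it, and iterating over doc_clean instead of the undefined texts.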

Answer 2

You define doc_clean as an empty list, but in the first iteration you reference doc_clean[count] with count = 1, so you are trying to assign to the second element of an empty list.

Replace

doc_clean[count]=finalized_words

with

doc_clean.append(finalized_words)

and then you no longer need count.
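The error itself can be reproduced in a few lines. This is only a sketch with made-up tokens; it shows that assigning to any index of an empty list fails, including index 0, whereas append works.

```python
doc_clean = []
errors = {}

# Index assignment can only overwrite an existing slot; an empty list has none.
for index in (0, 1):
    try:
        doc_clean[index] = ["some", "tokens"]
    except IndexError as exc:
        errors[index] = str(exc)

print(errors)

# append creates the slot instead of assuming it already exists.
doc_clean.append(["some", "tokens"])
print(doc_clean)
```

So starting count at 0 instead of 1 would not help either; the list has to grow before a position in it can be assigned.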

How to pass the output of multiple files to an array
