Question

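I'm very new to Python. I have searched for answers to this error, but I'm not experienced enough to see exactly where I'm going wrong. It's probably something very basic.

I'm working on a project to identify authors based on the words they use in their texts. I add each author's words to a dictionary, with the word as the key and the number of times the word appears in that author's texts as the value. I also build a vocabulary of all words across all authors and use both to calculate probabilities. This originally worked fine.

My problem arose when I added k-fold cross-validation, which I need because my corpus is not particularly large. I iterate through a list of author names, which match the names I assigned to their empty dictionaries. Once I've selected the files I want, I try to add the cleaned/parsed text to the dictionary, but I get the error from the title ('str' object does not support item assignment). It points to the line author[word] = 1 in my dictionary fn, which I call near the end of the code below. From my reading of other answers, it has to do with str being immutable, but I just can't see how to apply those answers to my problem. Many thanks for your help!

P.S. I know there are libraries that do all of this, but the whole idea of the project is to build my own model and compare it with others.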

import os
import re

from nltk.corpus import stopwords
from nltk.stem import SnowballStemmer

path = "C:\\......\The Letters\\"

#create an empty vocab set
vocab = set()
stop = stopwords.words('english')

snowball = SnowballStemmer('english')

#create empty dictionary for each author
AuthorA = {}
AuthorB = {}
AuthorC = {}

authorList = ["AuthorA","AuthorB","Authorc"]

#function to preprocess the words.  Opens & reads file, removes non alphabet
#characters, converts to lowercase, and tokenizes
def cleanText(path,author,eachfile):    
    f= open(path+author+"\\"+eachfile, "r")        
    contents = f.read()   
    strip = re.sub('[^a-zA-Z]',' ',contents) 
    lowerCase = strip.lower()    
    allwords = lowerCase.split()    
    return allwords

#function to add words to the vocabulary set
def createVocab(allwords):    
    for word in allwords:
        if len(word)>= 4:
            vocab.add(word)
    return

#function to add words to author dictionary and count occurrences of each word    
def dictionary(allwords, author):
    for word in allwords:
        if len(word)>= 4:
            if word in author:
                author[word]= author[word]+1
            else:
                author[word]= 1 
    return


def main():
    global authorList
    global path
    global vocab
    global AuthorA
    global AuthorB
    global AuthorC

    for author in authorList:
#filename and path
        listing = os.listdir(path+author)

#specify parameters for k fold validation
        #split into 10 folds and take a file form each fold
        #repeat for until the entire directory has been split
        folds = 10
        #number of files per fold
        subset_size = len(listing)//folds
        for i in range(folds):
            #use these files to train the model
            current_train = listing[:i*subset_size:]+listing[(i+1)*subset_size:]
            #use these files to test the model
            current_test = listing[i*subset_size:][:subset_size]

            #iterate through the files selected by current_train variable
            for eachfile in current_train:
                #call function to parse text
                allwords = cleanText(path,author,eachfile)
                #call fn to add words to dictionary
                dictionary(allwords, author)
                #call fn to add words to vocab
                createVocab(allwords)

Answer 1

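You are passing a string to your dictionary function as the author argument. The top-level for loop, for author in authorList:, iterates over a list of strings, not a collection of dictionaries: authorList = ["AuthorA","AuthorB","Authorc"]. Inside dictionary, author[word] = 1 therefore tries to assign into a str, which is immutable.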
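
A minimal reproduction may make this easier to see. The counting function below is copied from the question; the word list is made up for illustration. Calling it with the name "AuthorA" fails, while calling it with an actual dict works:

```python
def dictionary(allwords, author):
    # Same counting logic as in the question.
    for word in allwords:
        if len(word) >= 4:
            if word in author:
                author[word] = author[word] + 1
            else:
                author[word] = 1


# Made-up tokens, standing in for the output of cleanText().
words = ["letter", "from", "the", "author"]

# Passing the author's name: "word in author" is a substring test on a str,
# and the item assignment that follows raises TypeError.
try:
    dictionary(words, "AuthorA")
    error_message = None
except TypeError as exc:
    error_message = str(exc)
print(error_message)

# Passing a dict: the same function counts the words as intended.
counts = {}
dictionary(words, counts)
print(counts)
```

Note that the variable AuthorA (a dict) and the string "AuthorA" are unrelated objects; sharing a spelling does not link them.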

You want to iterate over a collection of dicts and pass each dict to your function instead. Hope that helps!
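
One way to do that, sketched under some assumptions: keep the dictionaries in a single mapping keyed by author name, so the name can still be used to build the folder path while the dict is looked up by that name. The names authorDicts and sampleTokens are hypothetical and not from the question, and the token lists stand in for what cleanText() would return.

```python
def dictionary(allwords, author):
    # Same counting logic as in the question; author must be a dict.
    for word in allwords:
        if len(word) >= 4:
            if word in author:
                author[word] = author[word] + 1
            else:
                author[word] = 1


authorList = ["AuthorA", "AuthorB", "AuthorC"]

# Hypothetical replacement for the three separate global dicts:
# one empty word-count dict per author name.
authorDicts = {name: {} for name in authorList}

# Made-up tokens per author, standing in for the cleaned training files.
sampleTokens = {
    "AuthorA": ["dear", "friend", "dear", "sir"],
    "AuthorB": ["yours", "truly"],
    "AuthorC": [],
}

for author in authorList:
    allwords = sampleTokens[author]
    # Look up the dict by name instead of passing the name itself.
    dictionary(allwords, authorDicts[author])

print(authorDicts)
```

This also removes the need for the global AuthorA, AuthorB and AuthorC declarations in main(), since a single mapping holds all three.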

Python error: 'str' object does not support item assignment
