Problem reading a CSV dataset and performing data preprocessing

Time: 2019-05-28 15:46:57

Tags: python pandas data-science

This is the error I am getting.


AttributeError                            Traceback (most recent call last)
<ipython-input-28-913f9c114df8> in <module>
      2 Corpus['text'].dropna(inplace=True)
      3 # Step - b : Change all the text to lower case. This is required as python interprets 'dog' and 'DOG' differently
----> 4 Corpus['text'] = [entry.lower() for entry in Corpus['text']]
      5 # Step - c : Tokenization : In this each entry in the corpus will be broken into set of words
      6 Corpus['text']= [word_tokenize(entry) for entry in Corpus['text']]

<ipython-input-28-913f9c114df8> in <listcomp>(.0)
      2 Corpus['text'].dropna(inplace=True)
      3 # Step - b : Change all the text to lower case. This is required as python interprets 'dog' and 'DOG' differently
----> 4 Corpus['text'] = [entry.lower() for entry in Corpus['text']]
      5 # Step - c : Tokenization : In this each entry in the corpus will be broken into set of words
      6 Corpus['text']= [word_tokenize(entry) for entry in Corpus['text']]

**AttributeError: 'list' object has no attribute 'lower'**
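
The message says that `entry` is a list rather than a string, so `Corpus['text']` probably already holds token lists. One plausible explanation, assuming the cell had been run before in the same Jupyter session, is that the tokenization step overwrote the column with lists and the second run then called `.lower()` on them. A minimal sketch of that scenario, using a hypothetical two-row DataFrame and `str.split` as a stand-in for `word_tokenize`:

```python
import pandas as pd

# Hypothetical two-row stand-in for the real CSV data.
Corpus = pd.DataFrame({"text": ["The DOG barked loudly", "A cat slept"]})

# First run of the cell: lower-case, then tokenize.
# str.split is only a stand-in for nltk's word_tokenize.
Corpus["text"] = [entry.lower() for entry in Corpus["text"]]
Corpus["text"] = [entry.split() for entry in Corpus["text"]]

# The column now holds lists, which is what a second run of the cell sees.
type_names = Corpus["text"].map(lambda value: type(value).__name__)
print(type_names.value_counts())

# Second run of the lower-casing step: each entry is a list, so .lower() fails.
try:
    Corpus["text"] = [entry.lower() for entry in Corpus["text"]]
    error_message = None
except AttributeError as exc:
    error_message = str(exc)
print(error_message)
```

Running the same type check on the real `Corpus['text']` before the failing line would show whether this is what happened. If it is, reloading the CSV before re-running the cell should restore the strings.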

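Here is the code. I am trying to run it in a Jupyter notebook. The steps are: remove blank rows from the data (if any), change all text to lower case, tokenize the words, remove stop words, remove non-alphabetic text, and lemmatize the words.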

# Step - a: Remove blank rows if any.
Corpus['text'].dropna(inplace=True)
# Step - b : Change all the text to lower case. This is required as python interprets 'dog' and 'DOG' differently
Corpus['text'] = [entry.lower() for entry in Corpus['text']]
# Step - c : Tokenization : In this each entry in the corpus will be broken into set of words
Corpus['text']= [word_tokenize(entry) for entry in Corpus['text']]
# Step - d : Remove stop words and non-alphabetic tokens, and perform word stemming/lemmatization.
# WordNetLemmatizer requires POS tags to know whether a word is a noun, verb, adjective, etc. By default it is set to noun
tag_map = defaultdict(lambda : wn.NOUN)
tag_map['J'] = wn.ADJ
tag_map['V'] = wn.VERB
tag_map['R'] = wn.ADV
for index,entry in enumerate(Corpus['text']):
    # Declaring Empty List to store the words that follow the rules for this step
    Final_words = []
    # Initializing WordNetLemmatizer()
    word_Lemmatized = WordNetLemmatizer()
    # pos_tag function below will provide the 'tag' i.e if the word is Noun(N) or Verb(V) or something else.
    for word, tag in pos_tag(entry):
        # The condition below skips stop words and keeps only alphabetic tokens
        if word not in stopwords.words('english') and word.isalpha():
            word_Final = word_Lemmatized.lemmatize(word,tag_map[tag[0]])
            Final_words.append(word_Final)
    # The final processed set of words for each iteration will be stored in 'text_final'
    Corpus.loc[index,'text_final'] = str(Final_words)
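
One way to make steps a to c safe to re-run is to write each result to a new column instead of overwriting `Corpus['text']`. The sketch below is an assumption-laden illustration, not a tested fix for the real data: `preprocess`, `text_lower` and `tokens` are hypothetical names, the DataFrame is made up, and `str.split` stands in for `word_tokenize`. It also drops missing rows on the DataFrame itself and resets the index, because the loop above pairs `enumerate` positions with `.loc` labels, and those only agree when the index runs from 0 to n-1.

```python
import pandas as pd


def preprocess(corpus):
    """Return a cleaned copy; the raw 'text' column is never overwritten."""
    # Step a: drop rows with a missing 'text' and renumber the index.
    cleaned = corpus.dropna(subset=["text"]).reset_index(drop=True)
    # Step b: lower-cased text goes into a new column.
    cleaned["text_lower"] = cleaned["text"].str.lower()
    # Step c: tokens go into another new column.
    # str.split is only a stand-in; word_tokenize could be applied the same way.
    cleaned["tokens"] = cleaned["text_lower"].apply(str.split)
    return cleaned


# Hypothetical stand-in for the data loaded from the CSV file.
Corpus = pd.DataFrame({"text": ["The DOG barked loudly", None, "A cat slept"]})

first = preprocess(Corpus)
second = preprocess(first)  # a second pass sees strings in 'text', so it does not fail
print(second)
```

Because `text` keeps its original strings, step d could read from `tokens` and write to `text_final` without changing what the earlier steps see on the next run.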

0 answers:

No answers