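I am new to Python and am working on a text classification problem. I put this code together from various online resources, but it does not seem to do POS tagging. Could someone help me find the line where I am going wrong? I do POS tagging in the code, but the tags do not show up in the results. I also tried POS tagging with nltk, but that did not work for me either. Any help would be greatly appreciated. Thanks.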
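import pandas as pd
from collections import defaultdict
from nltk import pos_tag
from nltk.corpus import stopwords, wordnet as wn
from nltk.stem import WordNetLemmatizer
from nltk.tokenize import word_tokenize
# Add the Data using pandas
Corpus = pd.read_csv(r"U:\FAHAD UL HASSAN\Python Code\projectdatacor.csv",encoding='latin-1')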
# Data Pre-processing - This will help in getting better results through the classification algorithms
# Remove rows with a blank description, if any, and renumber the index so it matches row positions.
Corpus = Corpus.dropna(subset=['description']).reset_index(drop=True)
# Change all the text to lower case. This is required as python interprets 'design' and 'DESIGN' differently
Corpus['description'] = [entry.lower() for entry in Corpus['description']]
# Punctuation Removal
Corpus['description'] = Corpus['description'].str.replace(r'[^\w\s]', '', regex=True)
# Tokenization : In this each entry in the corpus will be broken into set of words
Corpus['description']= [word_tokenize(entry) for entry in Corpus['description']]
# Remove stop words and non-alphabetic tokens, then perform word lemmatization.
# WordNetLemmatizer requires Pos tags to understand if the word is noun or verb or adjective etc. By default it is set to Noun
STOPWORDS = set(stopwords.words('english'))
tag_map = defaultdict(lambda : wn.NOUN)
tag_map['J'] = wn.ADJ
tag_map['V'] = wn.VERB
tag_map['R'] = wn.ADV
for index,entry in enumerate(Corpus['description']):
    # Declaring Empty List to store the words that follow the rules for this step
    Final_words = []
    # Initializing WordNetLemmatizer()
    word_Lemmatized = WordNetLemmatizer()
    # pos_tag function below will provide the 'tag' i.e if the word is Noun(N) or Verb(V) or something else.
    for word, tag in pos_tag(entry):
        # Below condition is to check for Stop words and consider only alphabets
        if word not in STOPWORDS and word.isalpha():
            word_Final = word_Lemmatized.lemmatize(word,tag_map[tag[0]])
            Final_words.append(word_Final)
    # The final processed set of words for each iteration will be stored in 'description_final'
    Corpus.loc[index,'description_final'] = str(Final_words)
print(Corpus['description_final'].head())
These are the results I get. The code does tokenization, stop-word removal, and so on, but the POS tags do not appear in the results.
runfile('U:/FAHAD UL HASSAN/Python Code/ayes.py', wdir='U:/FAHAD UL HASSAN/Python Code')
0 ['provision', 'part', 'schedule', 'shall', 'ap...
1 ['provision', 'part', 'schedule', 'shall', 'ap...
2 ['provision', 'part', 'schedule', 'shall', 'ap...
3 ['work', 'schedule', 'shall', 'provide', 'prog...
4 ['work', 'schedule', 'amendment', 'work', 'sch...
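
A possible reading of the output above, offered as a sketch rather than a confirmed diagnosis: the loop uses each tag only to choose the lemmatizer's part of speech and then appends the bare lemma, so the tag itself is never stored in `Final_words`. The snippet below illustrates keeping the tag next to each word. It assumes a hand-written `tagged` list standing in for the output of `pos_tag(entry)`, writes the WordNet constants as the plain strings that `wn.NOUN`, `wn.ADJ`, `wn.VERB` and `wn.ADV` hold, and uses a hypothetical helper named `keep_tags`, so it runs without any NLTK data.

```python
from collections import defaultdict

# Same mapping as tag_map in the question, with the WordNet constants written
# as their string values: wn.NOUN == "n", wn.ADJ == "a", wn.VERB == "v", wn.ADV == "r".
tag_map = defaultdict(lambda: "n")
tag_map["J"] = "a"
tag_map["V"] = "v"
tag_map["R"] = "r"

# Hand-written stand-in for pos_tag(entry); these pairs are illustrative only.
tagged = [("work", "NN"), ("schedule", "NN"), ("shall", "MD"), ("provide", "VB")]


def keep_tags(tagged_tokens, stop_words=frozenset()):
    """Hypothetical helper: return (word, Penn tag, WordNet POS) triples.

    It applies the same filter as the loop in the question (skip stop words
    and non-alphabetic tokens) but keeps the tag instead of discarding it.
    """
    return [
        (word, tag, tag_map[tag[0]])
        for word, tag in tagged_tokens
        if word not in stop_words and word.isalpha()
    ]


result = keep_tags(tagged)
print(result)
```

In the original loop, the equivalent change would presumably be to append a pair such as `(word_Final, tag)` to `Final_words` instead of `word_Final` alone, if the goal is to see the tags in `description_final`.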