I want to use the values in one pandas DataFrame column as input for dropping columns from another DataFrame.
import pandas as pd
from nltk.corpus import brown

# Word list from the CountVectorizer output (cv and cv_addr are not shown here)
corpus_top_words = pd.DataFrame(cv_addr.todense(), columns=cv.get_feature_names())
corpus_top_words = corpus_top_words.sum().rename_axis('Word').reset_index(name='Freq')
corpus_top_words = corpus_top_words.drop('Freq', axis=1)

# Unique lowercased words from the Brown corpus
word_list = list(brown.words())
feature_names = ['Word']
word_list = pd.DataFrame(word_list, columns=feature_names)
brown_corpus = pd.DataFrame(word_list.Word.unique(), columns=feature_names)
brown_corpus['Word'] = brown_corpus['Word'].apply(lambda x: ' '.join([item.lower() for item in x.split()]))

# Keep only the vocabulary words that also appear in the Brown corpus
english_words_corpus = pd.merge(corpus_top_words, brown_corpus, on='Word', how='inner')
english_words_corpus = pd.DataFrame(english_words_corpus.Word.unique(), columns=feature_names)
I need to pass this English word corpus to the original DataFrame to drop some columns:
list_of_cols_to_drop = english_words_corpus
data = data.drop(list_of_cols_to_drop, axis=1)
How would this work with sparse series?
for i, col in enumerate(cv.get_feature_names()):
    data[col] = pd.SparseSeries(cv_text[:, i].toarray().ravel(), fill_value=0)
Answer 0 (score: 0)
To drop the columns that match a list, you can do the following:
data = data.drop([col for col in list_of_cols_to_drop if col in data.columns], axis=1)
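Note that the line above expects `list_of_cols_to_drop` to be a list of column names. If it is a DataFrame such as `english_words_corpus`, iterating over it yields only its column labels (here just `'Word'`), so the words themselves probably need to be pulled out of the `Word` column first. A minimal sketch, assuming small made-up stand-ins for `data` and `english_words_corpus` (the values below are hypothetical, not taken from the question):

```python
import pandas as pd

# Hypothetical stand-ins for the question's objects.
data = pd.DataFrame({"id": [1, 2], "the": [3, 0], "house": [1, 1], "zzyx": [0, 2]})
english_words_corpus = pd.DataFrame({"Word": ["the", "house", "garden"]})

# Use the values of the 'Word' column, not the DataFrame itself:
# iterating over a DataFrame yields its column labels (here only 'Word').
list_of_cols_to_drop = english_words_corpus["Word"].tolist()

# errors="ignore" skips words that are not columns of `data` ('garden' here),
# which has the same effect as the `if col in data.columns` filter.
data = data.drop(columns=list_of_cols_to_drop, errors="ignore")

print(list(data.columns))  # ['id', 'zzyx']
```

Passing `errors="ignore"` is just an alternative to the list comprehension; either form should leave columns that are not in the word list untouched.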