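AttributeError when creating a document-term matrix

Time: 2017-04-16 20:10:14

Tags: python pandas text text-analysis

I am trying to create a document-term matrix represented as a pandas DataFrame. This is my code so far: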

df_profession['Athlete_Clean'] = df_profession['Athlete Biographies'].str.lower()
df_profession['Athlete_Clean'] = df_profession['Athlete_Clean'].apply(lambda x: ''.join([i for i in x if not i.isdigit()]))
df_profession['Athlete_Clean'] = df_profession['Athlete_Clean'].str.split()
df_profession['Athlete_Clean'] = [word for word in df_profession['Athlete_Clean'] if word not in punctuation]
df_profession['Athlete_Clean'] = [word for word in df_profession['Athlete_Clean'] if word not in stopwords.words('english')]

profession_dtm_athlete = pandas.DataFrame(countvec.fit_transform(df_profession['Athlete_Clean']).toarray(), columns=countvec.get_feature_names(), index = df.index)
profession_dtm_athlete

When I run this code, I get the following error:

'list' object has no attribute 'lower'

How do I get rid of this error?

1 Answer:

Answer 0 (score: 0)

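CountVectorizer expects each document to be a single string and lowercases it by default. After .str.split(), every row of Athlete_Clean holds a list of words, so the vectorizer ends up calling .lower() on a list. Convert the list objects back to strings before vectorizing. Calling str() on the whole column would stringify the Series itself rather than its rows, so filter and join row by row with apply:

df_profession['Athlete_Clean'] = df_profession['Athlete Biographies'].str.lower()
df_profession['Athlete_Clean'] = df_profession['Athlete_Clean'].apply(lambda x: ''.join([i for i in x if not i.isdigit()]))
df_profession['Athlete_Clean'] = df_profession['Athlete_Clean'].str.split()
# Filter the words inside each row's list rather than filtering the rows themselves
df_profession['Athlete_Clean'] = df_profession['Athlete_Clean'].apply(lambda words: [word for word in words if word not in punctuation])
english_stopwords = set(stopwords.words('english'))
df_profession['Athlete_Clean'] = df_profession['Athlete_Clean'].apply(lambda words: [word for word in words if word not in english_stopwords])
# Join each list back into one string so CountVectorizer receives strings, not lists
df_profession['Athlete_Clean'] = df_profession['Athlete_Clean'].apply(' '.join)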

profession_dtm_athlete = pandas.DataFrame(countvec.fit_transform(df_profession['Athlete_Clean']).toarray(), columns=countvec.get_feature_names(), index = df.index)
profession_dtm_athlete
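
For anyone who wants to try this end to end, here is a minimal self-contained sketch of the same idea: clean each biography, keep it as a string, and only then hand it to CountVectorizer. The sample biographies, the small stop_words set (standing in for NLTK's stopwords.words('english')) and the clean helper are assumptions made for illustration and are not from the question. The hasattr check is there because newer scikit-learn releases expose get_feature_names_out() while older ones use get_feature_names().

```python
import string

import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical sample data standing in for the asker's df_profession.
df_profession = pd.DataFrame({
    "Athlete Biographies": [
        "Usain Bolt won 8 Olympic gold medals.",
        "Serena Williams won 23 Grand Slam singles titles!",
    ]
})

# Assumed stand-in for NLTK's stopwords.words('english').
stop_words = {"the", "a", "an", "and", "of", "in", "won"}


def clean(text):
    """Lowercase, drop digits and punctuation, remove stop words, return a string."""
    text = text.lower()
    text = "".join(
        ch for ch in text if not ch.isdigit() and ch not in string.punctuation
    )
    tokens = [word for word in text.split() if word not in stop_words]
    # Join the tokens back together: CountVectorizer expects strings, not lists.
    return " ".join(tokens)


df_profession["Athlete_Clean"] = df_profession["Athlete Biographies"].apply(clean)

countvec = CountVectorizer()
counts = countvec.fit_transform(df_profession["Athlete_Clean"])

# Newer scikit-learn uses get_feature_names_out(); older releases use get_feature_names().
if hasattr(countvec, "get_feature_names_out"):
    feature_names = countvec.get_feature_names_out()
else:
    feature_names = countvec.get_feature_names()

# One row per biography, one column per term.
profession_dtm_athlete = pd.DataFrame(
    counts.toarray(), columns=feature_names, index=df_profession.index
)
print(profession_dtm_athlete)
```

Note that the index here comes from df_profession itself, so the number of rows always matches the vectorizer output. If the cleaned column held lists instead of strings, fit_transform would fail with the same "'list' object has no attribute 'lower'" message as in the question.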