Error: 'list' object has no attribute 'lower'

Time: 2017-11-09 02:56:26

Tags: python csv error-handling tf-idf cosine-similarity

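I wrote this code to compute the cosine similarity between two columns in two different CSV files. Both columns contain rows of job descriptions.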

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity
from sklearn.feature_extraction.text import TfidfVectorizer
import pandas as pd

df = pd.read_csv("Green jobs description.csv")
df2 = pd.read_csv("ExtractedData_2006.csv")
jobs = df.Description.tolist()
jobs2 = df2.Description.tolist()

train_set = [jobs, jobs2]

tfidf_vectorizer = TfidfVectorizer()
tfidf_matrix_train = tfidf_vectorizer.fit_transform(train_set)  #finds the tfidf score with normalization
print "cosine scores ==> ",cosine_similarity(tfidf_matrix_train[0:1], tfidf_matrix_train)

When I run the code I get this error. I have included the full traceback in the image for clarity.

(image: code traceback)

Can anyone help me with this?

1 answer:

Answer 0: (score: 0)

I figured out how to solve it:

train_set=jobs+jobs2
train_set=[tmp.lower() for tmp in train_set]

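That is pretty much it. The original train_set was a list containing two lists, so the vectorizer tried to call lower() on a list. Concatenating jobs and jobs2 gives one flat list of strings, which I then lowercased.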
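To make the cause concrete, here is a minimal sketch that reproduces the error without the CSV files or scikit-learn. The sample strings and the helper name lowercase_all are made up for illustration; the helper only imitates the lowercasing step that, as far as I know, TfidfVectorizer applies to each document by default (lowercase=True). It is not the library's actual code.

```python
# Hypothetical stand-ins for df.Description.tolist() and df2.Description.tolist()
jobs = ["Solar Panel Installer", "Wind Turbine Technician"]
jobs2 = ["Energy Auditor"]

nested = [jobs, jobs2]  # 2 "documents", each of which is a list (the question's train_set)
flat = jobs + jobs2     # 3 documents, each of which is a string (the answer's train_set)


def lowercase_all(documents):
    """Call .lower() on every document, as a text vectorizer's preprocessing would."""
    return [doc.lower() for doc in documents]


# The nested version fails because a list has no .lower() method.
try:
    lowercase_all(nested)
    error_message = None
except AttributeError as exc:
    error_message = str(exc)

# The flat version works because every element is a str.
train_set = lowercase_all(flat)

print(error_message)
print(train_set)
```

The flat train_set can then be passed to tfidf_vectorizer.fit_transform as in the question. If the vectorizer already lowercases its input, the explicit lower() call should be redundant but harmless; the concatenation is what actually removes the error.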