Plotting an AUC curve in Python

Time: 2017-04-10 03:51:47

Tags: python-2.7 numpy scikit-learn auc

I have a question about my Python code for computing the AUC of an SVM:

print(__doc__)

import numpy as np
import matplotlib.pyplot as plt
from sklearn import svm, datasets
from sklearn.metrics import roc_curve, auc
from sklearn.cross_validation import train_test_split
from sklearn.preprocessing import label_binarize
from sklearn.svm import SVC
from sklearn.multiclass import OneVsRestClassifier



from sklearn.feature_extraction.text import TfidfVectorizer
import numpy as np
tfidf_vect= TfidfVectorizer(use_idf=True, smooth_idf=True, sublinear_tf=False, ngram_range=(2,2))
from sklearn.cross_validation import train_test_split, cross_val_score

import pandas as pd

df = pd.read_csv('merged_quantized_list.csv',
                     header=0, sep=',', names=['id', 'content', 'label'])


X = tfidf_vect.fit_transform(df['content'].values)
y = df['label'].values

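My first doubt: my CSV file has 60 columns and 5000 rows, where the first row holds my labels and the rest is content. Do X and y here actually contain the content and the labels, respectively?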
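One way to check is to inspect the parsed frame directly (`df.shape`, `df.head()`) before vectorizing. The following is only a minimal sketch under stated assumptions: it uses Python 3 with a recent pandas, a tiny made-up CSV (`csv_text`) in place of `merged_quantized_list.csv`, and it assumes the labels sit in the first column with every remaining column being content. None of these details are confirmed by the question.

```python
import io
import pandas as pd

# Hypothetical stand-in for the real file: 1 label column + 3 content columns.
csv_text = "label,c1,c2,c3\n0,12,7,3\n1,5,5,9\n0,8,1,4\n"

# Let pandas take the column names from the header row instead of passing
# a short `names=` list that does not match the number of columns.
df = pd.read_csv(io.StringIO(csv_text), header=0)

# Assumption: the first column is the label, all other columns are content.
y = df.iloc[:, 0].values

# TfidfVectorizer expects one text document per sample, so join the
# remaining columns of each row into a single whitespace-separated string.
content = df.iloc[:, 1:].astype(str).apply(" ".join, axis=1)

print(df.shape)
print(content.tolist())
print(y.tolist())
```

If the layout is really different (for example, labels in a row rather than a column), the `iloc` slices would need to change accordingly.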

Second: when I run this code, I get the following error:

 X = tfidf_vect.fit_transform(df['content'].values)
  File "/home/ubuntu/anaconda2/lib/python2.7/site-packages/sklearn/feature_extraction/text.py", line 1352, in fit_transform
    X = super(TfidfVectorizer, self).fit_transform(raw_documents)
  File "/home/ubuntu/anaconda2/lib/python2.7/site-packages/sklearn/feature_extraction/text.py", line 839, in fit_transform
    self.fixed_vocabulary_)
  File "/home/ubuntu/anaconda2/lib/python2.7/site-packages/sklearn/feature_extraction/text.py", line 762, in _count_vocab
    for feature in analyze(doc):
  File "/home/ubuntu/anaconda2/lib/python2.7/site-packages/sklearn/feature_extraction/text.py", line 241, in <lambda>
    tokenize(preprocess(self.decode(doc))), stop_words)
  File "/home/ubuntu/anaconda2/lib/python2.7/site-packages/sklearn/feature_extraction/text.py", line 207, in <lambda>
    return lambda x: strip_accents(x.lower())
AttributeError: 'numpy.int64' object has no attribute 'lower'

1 answer:

Answer 0 (score: 0)

Try:

X = tfidf_vect.fit_transform(df['content'].values.astype(str))
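The traceback suggests that the `content` column was parsed as integers, so the vectorizer ends up calling `.lower()` on a `numpy.int64`. Casting the values to strings avoids that. Here is a minimal sketch of the idea, assuming Python 3 and a recent scikit-learn; the array `content` is made-up data standing in for `df['content'].values`, and a default `TfidfVectorizer` (unigrams) is used instead of the question's settings to keep the example small.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical stand-in for df['content'].values when pandas has
# parsed the column as integers.
content = np.array([12, 345, 12, 678])

vect = TfidfVectorizer()  # default settings, unigrams only

# Passing integers fails because the preprocessor calls .lower() on each item.
try:
    vect.fit_transform(content)
    failed = False
except AttributeError as exc:
    failed = True
    print("without astype(str):", exc)

# Casting to str gives the vectorizer real text documents to tokenize.
X = vect.fit_transform(content.astype(str))
print(X.shape)
print(sorted(vect.vocabulary_))
```

Note that the default token pattern only keeps tokens of two or more characters, and with `ngram_range=(2,2)` as in the question each document needs at least two tokens to produce any feature, so a column of single numbers may still yield an empty vocabulary after the cast.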