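I am new to Machine Learning and Python. Recently I have been working with the Amazon Fine Food Reviews data from Kaggle and the code that accompanies it. What I don't understand is how the 'partition' function is used here. Also, what do the last three lines of code actually do?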
%matplotlib inline
import sqlite3
import pandas as pd
import numpy as np
import nltk
import string
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.feature_extraction.text import TfidfTransformer
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics import confusion_matrix
from sklearn import metrics
from sklearn.metrics import roc_curve, auc
from nltk.stem.porter import PorterStemmer
# using the SQLite Table to read data.
con = sqlite3.connect('./amazon-fine-food-reviews/database.sqlite')
#filtering only positive and negative reviews i.e.
# not taking into consideration those reviews with Score=3
filtered_data = pd.read_sql_query("""
SELECT *
FROM Reviews
WHERE Score != 3
""", con)
# Give reviews with Score > 3 a positive rating, and reviews with
# Score < 3 a negative rating.
def partition(x):
    if x < 3:
        return 'negative'
    return 'positive'
# Replace each numeric score with its label: 'negative' below 3, 'positive' otherwise.
actualScore = filtered_data['Score']
positiveNegative = actualScore.map(partition)
filtered_data['Score'] = positiveNegative
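Answer 0 (score: 0)
Take the Score column from filtered_data
and store it as a pandas Series named actualScore: actualScore = filtered_data['Score']
Create the Series positiveNegative, with values < 3 encoded as 'negative' and values > 3
encoded as 'positive': positiveNegative = actualScore.map(partition)
Overwrite the old Score column with the newly encoded values:
filtered_data['Score'] = positiveNegative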
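To see the three steps in isolation, here is a minimal sketch on a toy DataFrame. The four scores are made up for illustration and stand in for the real Reviews table; only partition and the last three lines come from the question.

```python
import pandas as pd

# Toy stand-in for the Reviews table (hypothetical values, not the Kaggle data).
filtered_data = pd.DataFrame({"Score": [1, 2, 4, 5]})

def partition(x):
    # Scores below 3 become 'negative'; everything else becomes 'positive'.
    if x < 3:
        return 'negative'
    return 'positive'

actualScore = filtered_data['Score']            # a pandas Series of integers
positiveNegative = actualScore.map(partition)   # partition is called once per value
filtered_data['Score'] = positiveNegative       # overwrite the column with the labels

print(filtered_data['Score'].tolist())
```

Series.map applies the function to every element and returns a new Series with the same index, which is why the result can be assigned straight back to the column.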
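Answer 1 (score: 0)
I think it replaces the Score column in the table with 'positive' or 'negative', using the function called partition. The Score column is taken as the Series actualScore, and each value in it is mapped to its replacement value ('positive' or 'negative'). The values in the Score column are then replaced with positiveNegative.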
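One way to check this reading is to compare the map call against an equivalent vectorized expression. The sketch below uses numpy.where, which is not what the original code does, and a made-up Series of scores; it only shows that both approaches give the same labels.

```python
import numpy as np
import pandas as pd

# Hypothetical scores for illustration; Score == 3 was already filtered out by the SQL query.
scores = pd.Series([1, 2, 4, 5], name="Score")

def partition(x):
    if x < 3:
        return 'negative'
    return 'positive'

# Element-by-element mapping, as in the question.
mapped = scores.map(partition)

# Vectorized alternative: pick 'negative' where the condition holds, else 'positive'.
vectorized = pd.Series(
    np.where(scores < 3, 'negative', 'positive'),
    index=scores.index,
    name=scores.name,
)

print(mapped.tolist() == vectorized.tolist())
```

On a large table the vectorized form is likely to be faster, since it avoids a Python function call per row, but the map version in the question is perfectly valid.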