Trying to understand a piece of Python code

Time: 2018-03-06 07:23:21

Tags: python machine-learning data-science kaggle

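I'm new to Machine Learning and Python. Recently I've been working with the Amazon Fine Food Reviews data from Kaggle and the code that comes with it. What I don't understand is how the 'partition' method is used here. Also, what do the last 3 lines of code actually do?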

    %matplotlib inline
    import sqlite3
    import pandas as pd
    import numpy as np
    import nltk
    import string
    import matplotlib.pyplot as plt
    import seaborn as sns
    from sklearn.feature_extraction.text import TfidfTransformer
    from sklearn.feature_extraction.text import TfidfVectorizer

    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.metrics import confusion_matrix
    from sklearn import metrics
    from sklearn.metrics import roc_curve, auc
    from nltk.stem.porter import PorterStemmer



    # using the SQLite Table to read data.
    con = sqlite3.connect('./amazon-fine-food-reviews/database.sqlite') 




    #filtering only positive and negative reviews i.e. 
    # not taking into consideration those reviews with Score=3
    filtered_data = pd.read_sql_query("""
    SELECT *
    FROM Reviews
    WHERE Score != 3
    """, con) 




    # Give reviews with Score > 3 a positive rating, and reviews with
    # Score < 3 a negative rating.
    def partition(x):
        if x < 3:
            return 'negative'
        return 'positive'

    # Relabel scores below 3 as 'negative' and all other scores as 'positive'
    actualScore = filtered_data['Score']
    positiveNegative = actualScore.map(partition) 
    filtered_data['Score'] = positiveNegative

2 answers:

Answer 0: (score: 0)

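Create a Series called actualScore from the Score column of filtered_data:

    actualScore = filtered_data['Score']

Create the Series positiveNegative, encoding values < 3 as 'negative' and values > 3 as 'positive':

    positiveNegative = actualScore.map(partition)

Overwrite the old Score column with the newly encoded values:

    filtered_data['Score'] = positiveNegative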
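The three steps above can be tried on a small stand-in table. This is only a sketch: the toy DataFrame below is made up for illustration and is not the Kaggle data, but `partition` and the three assignments are the same as in the question.

```python
import pandas as pd

def partition(x):
    # Scores below 3 become 'negative'; everything else becomes 'positive'.
    if x < 3:
        return 'negative'
    return 'positive'

# Hypothetical toy data standing in for the Reviews table
# (rows with Score == 3 are assumed to be filtered out already).
filtered_data = pd.DataFrame({'Score': [1, 2, 4, 5]})

actualScore = filtered_data['Score']           # a pandas Series of integers
positiveNegative = actualScore.map(partition)  # partition is called once per value
filtered_data['Score'] = positiveNegative      # replace the numbers with the labels

print(filtered_data['Score'].tolist())
# ['negative', 'negative', 'positive', 'positive']
```

`Series.map` returns a new Series and does not modify `actualScore`, which is why the last line has to assign the result back to the column.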

Answer 1: (score: 0)

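I think it actually replaces the values in the table's Score column with 'positive' or 'negative', using the function called partition. The Score column is taken as the Series actualScore, and each value in it is then mapped to its replacement value (positive or negative). Finally, the values in the Score column are replaced with positiveNegative.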
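It may help to see that `partition` is an ordinary Python function, not a pandas method; `map` simply calls it on each value. A minimal sketch, calling it by hand on a few sample scores (the sample values are chosen for illustration):

```python
def partition(x):
    # Same function as in the question: anything below 3 is 'negative'.
    if x < 3:
        return 'negative'
    return 'positive'

# Call the function directly on individual scores.
results = {score: partition(score) for score in [1, 2, 3, 4, 5]}
print(results)
# {1: 'negative', 2: 'negative', 3: 'positive', 4: 'positive', 5: 'positive'}
```

Note that `partition(3)` would return 'positive', since the function only tests `x < 3`. In the question's code this case never arises, because the SQL query already excludes rows with `Score = 3`.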