Question

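I am new to machine learning and Python. Recently I have been working with the Amazon Fine Food Reviews data from Kaggle and the code that comes with it. What I don't understand is how the 'partition' method is used here. Also, what do the last 3 lines of code actually do?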

    %matplotlib inline
    import sqlite3
    import pandas as pd
    import numpy as np
    import nltk
    import string
    import matplotlib.pyplot as plt
    import seaborn as sns
    from sklearn.feature_extraction.text import TfidfTransformer
    from sklearn.feature_extraction.text import TfidfVectorizer

    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.metrics import confusion_matrix
    from sklearn import metrics
    from sklearn.metrics import roc_curve, auc
    from nltk.stem.porter import PorterStemmer



    # using the SQLite Table to read data.
    con = sqlite3.connect('./amazon-fine-food-reviews/database.sqlite') 




    #filtering only positive and negative reviews i.e. 
    # not taking into consideration those reviews with Score=3
    filtered_data = pd.read_sql_query("""
    SELECT *
    FROM Reviews
    WHERE Score != 3
    """, con) 




    # Give reviews with Score > 3 a positive rating, and reviews with
    # Score < 3 a negative rating.
    def partition(x):
        if x < 3:
            return 'negative'
        return 'positive'

    # Replace each numeric score with 'negative' (Score < 3) or 'positive' (Score > 3).
    actualScore = filtered_data['Score']
    positiveNegative = actualScore.map(partition)
    filtered_data['Score'] = positiveNegative

Answer 1

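Take the Score column from filtered_data and store it in a pandas Series named actualScore:

    actualScore = filtered_data['Score']

Create the Series positiveNegative, in which values < 3 are encoded as 'negative' and values > 3 as 'positive':

    positiveNegative = actualScore.map(partition)

Overwrite the old Score column with the newly encoded values:

    filtered_data['Score'] = positiveNegative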
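
To see what these three lines do, here is a minimal sketch that runs them on a tiny hand-made table. The `toy` DataFrame and its four scores are made up for illustration and are not part of the Kaggle data. The snake_case variable names are also mine.

```python
import pandas as pd


def partition(x):
    # Scores below 3 are labelled 'negative'; everything else is 'positive'.
    if x < 3:
        return 'negative'
    return 'positive'


# Hypothetical stand-in for filtered_data (rows with Score == 3 already removed).
toy = pd.DataFrame({'Score': [1, 2, 4, 5]})

actual_score = toy['Score']                       # the Score column as a Series
positive_negative = actual_score.map(partition)   # partition is called once per value
toy['Score'] = positive_negative                  # overwrite the numbers with the labels

print(toy['Score'].tolist())
# ['negative', 'negative', 'positive', 'positive']
```

`Series.map` calls the function on each element and returns a new Series of the same length, so the original column is unchanged until the last line assigns the result back.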

Answer 2

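I think it simply replaces the values in the table's Score column with 'positive' or 'negative', using the function called partition. The Score column is taken as the Series actualScore, each value in that Series is then mapped to its replacement (positive or negative), and finally the values in the Score column are replaced with positiveNegative.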
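
If it helps to see the same replacement without a helper function, one possible alternative (not what the original code does) is `numpy.where`, which picks one of two labels per row based on a condition. This is only a sketch on a made-up `toy` table, assuming Score 3 rows were already filtered out as in the question.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for filtered_data; the scores are invented for illustration.
toy = pd.DataFrame({'Score': [1, 2, 4, 5]})

# Where Score < 3 pick 'negative', otherwise pick 'positive'.
labels = np.where(toy['Score'] < 3, 'negative', 'positive')
toy['Score'] = labels

print(toy['Score'].tolist())
# ['negative', 'negative', 'positive', 'positive']
```

The result matches what `map(partition)` produces. The `map` version is easier to extend if the labelling rule gets more complicated.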
