SettingWithCopyWarning in a pandas DataFrame

Time: 2018-09-22 06:00:23

Tags: python pandas

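The following line of code gives me a SettingWithCopyWarning in pandas. I have looked at similar questions ("SettingWithCopyWarning in pandas" and "SettingWithCopyWarning in Pandas DataFrame using Python") as well as http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy, but I could not follow them. How can I resolve the warning in the code below?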

print("Applying sentiment analysis\n")
analyzer = SentimentIntensityAnalyzer()
reordered['sentiments'] = reordered['text'].apply(lambda row: list(map(analyzer.polarity_scores, row)))
print(reordered.head())
  

print(reordered.head(5).to_dict())

{'id': {0: 1042616899408945154, 1: 1042592536769044487, 2: 1042587702040903680, 3: 1042587263643930626, 4: 1042586780292276230}, 'date': {0: '2018-09-20', 1: '2018-09-20', 2: '2018-09-20', 3: '2018-09-20', 4: '2018-09-20'}, 'time': {0: '03:30:14', 1: '01:53:25', 2: '01:34:13', 3: '01:32:28', 4: '01:30:33'}, 'text': {0: "b'\\xf0\\x9f\\x8c\\xb9 are red, violets are blue, if you want to buy us \\xf0\\x9f\\x92\\x90, here is a CLUE \\xf0\\x9f\\x98\\x89 Our #flowerpowered eye & cheek palette is AL\\xe2\\x80\\xa6'", 1: "b'\\xf0\\x9f\\x8e\\xb5Is it too late now to say sorry\\xf0\\x9f\\x8e\\xb5 #tartetalk #memes'", 2: "b'@JillianJChase Oh no! Please email your order # to social@tarte.com & we can help \\xf0\\x9f\\x92\\x95'", 3: 'b"@Danikins__ It\'s best applied with our buffer brush! \\xf0\\x9f\\x92\\x9c\\xc2\\xa0"', 4: "b'@AdelaineMorin DEAD \\xf0\\x9f\\xa4\\xa3\\xf0\\x9f\\xa4\\xa3\\xf0\\x9f\\xa4\\xa3'"}, 'hasMedia': {0: 0, 1: 1, 2: 0, 3: 0, 4: 0}, 'hasHashtag': {0: 1, 1: 1, 2: 0, 3: 0, 4: 0}, 'followers_count': {0: 801745, 1: 801745, 2: 801745, 3: 801745, 4: 801745}, 'retweet_count': {0: 17, 1: 94, 2: 0, 3: 0, 4: 0}, 'favourite_count': {0: 181, 1: 408, 2: 0, 3: 0, 4: 14}}
  

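The full preprocessData() function:

import pandas as pd
# Assumption: SentimentIntensityAnalyzer is VADER's analyzer; the original post does not show its import.
from nltk.sentiment.vader import SentimentIntensityAnalyzer

def preprocessData():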
    fullcorpus = pd.read_csv("tweets.csv")
    fullcorpus.columns = ["id", "created_at", "text", "hasMedia", "hasHashtag", "followers_count", "retweet_count",
                          "favourite_count"]
    fullcorpus.head()

    pd.set_option('display.max_columns', None)

    print("This is the initial data set")
    print(fullcorpus.head())
    print("\n")


    print("Removing Duplicates\n")
    duplicates_removed = fullcorpus.drop_duplicates(subset='id', keep='first', inplace=False)
    print(duplicates_removed.head())

    print("Splitting created_at\n")
    created_at_Splitted = duplicates_removed['created_at'].str.split(' ', n=1, expand=True).rename(
        columns={0: 'date', 1: 'time'})
    concatinated = pd.concat([duplicates_removed, created_at_Splitted], axis=1)
    created_at_dropped = concatinated.drop(['created_at'], axis=1)
    reordered = created_at_dropped[
        ["id", "date", "time", "text", "hasMedia", "hasHashtag", "followers_count", "retweet_count", "favourite_count"]]
    print(reordered.head())

    print(reordered['text'].head())

    #print(reordered.head(5).to_dict())

    print("Applying sentiment analysis\n")
    analyzer = SentimentIntensityAnalyzer()
    reordered['sentiments'] = reordered['text'].apply(lambda row: list(map(analyzer.polarity_scores, row)))
    print(reordered.head())

1 Answer:

Answer 0 (score: 1)

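The most likely explanation is that the DataFrame reordered was created with loc, or with some other operation that does not explicitly create a copy, for example: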

reordered = original_dataframe.loc[mycondition]
# or
reordered = original_dataframe[['my', 'columns']]

or something similar.

To avoid the SettingWithCopyWarning, create reordered with something like the following:

reordered = original_dataframe.loc[mycondition].copy()
# or
reordered = original_dataframe[['my', 'columns']].copy()

You should no longer see the warning, because reordered is now an explicit copy.
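
Applied to the column-selection step from the question, the fix might look like the minimal sketch below. The toy data and the text_length column are hypothetical stand-ins (len() replaces the sentiment scoring so the snippet runs without VADER); only the .copy() call is the point.

```python
import warnings

import pandas as pd

# Hypothetical stand-in for the created_at_dropped frame from the question.
created_at_dropped = pd.DataFrame({
    "id": [1, 2, 3],
    "text": ["good", "bad", "fine"],
    "retweet_count": [17, 94, 0],
})

# Selecting columns and calling .copy() makes `reordered` an independent frame.
reordered = created_at_dropped[["id", "text"]].copy()

# Escalate warnings to errors so a SettingWithCopyWarning could not pass unnoticed.
with warnings.catch_warnings():
    warnings.simplefilter("error")
    # len() is a hypothetical stand-in for the sentiment scoring step.
    reordered["text_length"] = reordered["text"].apply(len)

print(reordered)
```

The new column exists only on reordered; created_at_dropped is left unchanged, which is what the explicit copy makes unambiguous to pandas.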