Question

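I have a pandas dataframe with an allText column that stores a bunch of text for each row. I am trying to apply a custom function that returns 3 values for a given input text. I then want to store those 3 output values in a new dataframe column, ideally as a numpy array per row. I do this with apply(); the code completes successfully but does not actually change the values.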

import numpy as np
import pandas as pd
from textblob import TextBlob

#stub for creating a dataframe
df = pd.DataFrame({'allText':['Hateful text. This is bad', 'Text about great stuff', ' ']})

#set a placeholder - just 3 zeros for each record
df['Sentiments'] = df['allText'].apply(lambda x: np.zeros(3))

#function definition. It is a textblob library function, which gives me back sentiment scores for each text
def getTextSentiments(text):
    blob = TextBlob(text)
    pos = 0
    neg = 0
    neutral = 0
    count = 0
    for sentence in blob.sentences:
        sentiment = sentence.sentiment.polarity
        if sentiment > 0.1:
            pos +=1
        elif sentiment > -0.1:
            neutral +=1
        else:
            neg +=1
        count+=1
    if count == 0:
        count = 1
    return np.array([pos/count, neutral/count, neg/count])

#apply function only for non-empty texts and override 3 zeros in sentiments column with real 3 values
df[df["allText"]!=" "]['Sentiments'] = df[df["allText"]!=" "]["allText"].apply(getTextSentiments)

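This code completes without any errors, yet all the zeros in my Sentiments column stay the same.

A minimal example shows that it does not work even for a single record: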

df[df["allText"]!=" "].iloc[0]['Sentiments']
array([ 0.,  0.,  0.])
test = getTextSentiments(df[df["allText"]!=" "].iloc[0]['allText'])

test
Out[64]: (0.4166666666666667, 0.5, 0.08333333333333333)
df[df["allText"]!=" "].iloc[0]['Sentiments'] = test

df[df["allText"]!=" "].iloc[0]['Sentiments']
Out[75]: array([ 0.,  0.,  0.])

Any suggestions on what I am doing wrong?

Answer 1

Can you try the following?

df.Sentiments = df.apply(lambda x: x.Sentiments if x.allText ==' ' else getTextSentiments(x.allText), axis=1)
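As for why the original assignment had no effect: indexing with a boolean mask, as in df[mask], builds a new DataFrame, so df[mask]['Sentiments'] = ... writes into that temporary object and the original df is never touched. Here is a minimal sketch of that behaviour. The shortened texts and the all-ones replacement values are made up for illustration, and the warning filter is only there because pandas warns about this pattern (the exact warning depends on the pandas version).

```python
import warnings

import numpy as np
import pandas as pd

# Illustrative data: one real text and one " " placeholder row.
df = pd.DataFrame({"allText": ["some text", " "]})
df["Sentiments"] = df["allText"].apply(lambda x: np.zeros(3))

mask = df["allText"] != " "

# df[mask] is a new DataFrame, so this assignment lands on a temporary object.
with warnings.catch_warnings():
    warnings.simplefilter("ignore")  # pandas warns about chained assignment here
    df[mask]["Sentiments"] = df[mask]["allText"].apply(lambda t: np.ones(3))

unchanged = df["Sentiments"].iloc[0]
print(unchanged)  # [0. 0. 0.]
```

Assigning to the column of df itself in a single step, as the df.apply line does, avoids the temporary copy.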

The df.apply line, tested with a dummy getTextSentiments function:

df = pd.DataFrame({'allText':['Hateful text. This is bad', 'Text about great stuff', ' ']})

#set a placeholder - just 3 zeros for each record
df['Sentiments'] = df['allText'].apply(lambda x: np.zeros(3))
def getTextSentiments(text):
    return (0.4166666666666667, 0.5, 0.08333333333333333)
df.Sentiments = df.apply(lambda x: x.Sentiments if x.allText ==' ' else getTextSentiments(x.allText), axis=1)
df
Out[181]: 
                     allText                                      Sentiments
0  Hateful text. This is bad  (0.4166666666666667, 0.5, 0.08333333333333333)
1     Text about great stuff  (0.4166666666666667, 0.5, 0.08333333333333333)
2                                                            [0.0, 0.0, 0.0]
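Note that the dummy function above returns a tuple, so the first two cells hold tuples rather than arrays. If every cell should hold a numpy array, as the question asks, a variant along these lines may work. This is only a sketch: score_text is a hypothetical stand-in for getTextSentiments that returns fixed scores, so it runs without TextBlob.

```python
import numpy as np
import pandas as pd


def score_text(text):
    # Hypothetical stand-in for getTextSentiments; returns fixed scores.
    return np.array([0.5, 0.25, 0.25])


df = pd.DataFrame(
    {"allText": ["Hateful text. This is bad", "Text about great stuff", " "]}
)

# Build the whole column in one step: real scores for non-empty texts,
# three zeros for the " " placeholder rows.
df["Sentiments"] = df["allText"].apply(
    lambda text: score_text(text) if text != " " else np.zeros(3)
)

first = df["Sentiments"].iloc[0]
last = df["Sentiments"].iloc[2]
print(type(first).__name__, first, last)
```

Because the column is created by one assignment on df itself, there is no need for a separate placeholder step or for a masked write afterwards.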

Pandas - storing a numpy array in a dataframe column as the result of a function
