I'm building a DataFrame from the following dictionary:
diction = {"a":{"aa":0, "bb":2, "cc":3}, "b":{"aa":4, "bb":5, "cc":6}, "c":{"aa":7, "bb":8, "cc":9}}
df = pandas.DataFrame(diction)
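With a dict of dicts, pandas uses the outer keys as column labels and the inner keys as the row index, so row "aa" collects the "aa" entry from each inner dict. A quick check, written as a standalone sketch (it imports pandas as pd, unlike the question):

```python
import pandas as pd

diction = {"a": {"aa": 0, "bb": 2, "cc": 3},
           "b": {"aa": 4, "bb": 5, "cc": 6},
           "c": {"aa": 7, "bb": 8, "cc": 9}}
df = pd.DataFrame(diction)

# Outer keys become columns, inner keys become the row index
print(df.columns.tolist())    # ['a', 'b', 'c']
print(df.index.tolist())      # ['aa', 'bb', 'cc']
# Row "aa" holds the "aa" value from each inner dict
print(df.loc["aa"].tolist())  # [0, 4, 7]
# Row totals used later for normalization
print(df.sum(axis=1).tolist())  # [11, 15, 18]
```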
Then I try to perform some operations on the DataFrame with the following functions:
import pandas
import numpy as np
import math

def applyGivenTotals(rowOrColumn, rowOrColumnTotals, rowOrColumnName):
    # Divide a row (or column) by the total stored under its label
    return rowOrColumn/float(rowOrColumnTotals[rowOrColumnName])

def piLogpi(value):
    # -p * log10(p), treating the p = 0 term as 0
    if(value==0):
        return 0
    else:
        return -value * math.log10(value)

def someFunction(df):
    entropy = {}
    rowTotals = df.sum(axis=1)
    # Normalize each row by its row total
    rowApplied = df.apply(lambda row:applyGivenTotals(row, rowTotals, row.name), axis=1)
    unSummedPis = rowApplied.apply(np.vectorize(piLogpi))
    return unSummedPis
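applyGivenTotals divides each row by the total stored under that row's label, so rowApplied holds per-row proportions. As a sketch of an alternative (not the original code), the same normalization can be done without a row-wise apply by using DataFrame.div with axis=0, which aligns the Series of row totals with the row index. The names row_totals and row_probs are illustrative only:

```python
import pandas as pd

diction = {"a": {"aa": 0, "bb": 2, "cc": 3},
           "b": {"aa": 4, "bb": 5, "cc": 6},
           "c": {"aa": 7, "bb": 8, "cc": 9}}
df = pd.DataFrame(diction)

# Sum across each row, then divide every row by its own total;
# axis=0 matches row_totals to the DataFrame's row labels
row_totals = df.sum(axis=1)
row_probs = df.div(row_totals, axis=0)

print(row_probs.round(6))
# Each row of proportions should sum to 1 (up to rounding)
print(row_probs.sum(axis=1).round(10).tolist())
```

Note that the very first value of column a is 0 / 11 = 0.0, which matters for what happens next in the question's pipeline.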
I get zeros in the leftmost column, and I don't understand why.
To clarify, my expected result is:
           a         b         c
aa         0  0.159757  0.124915
bb  0.116675  0.159040  0.145601
cc  0.129692  0.159040  0.150515
But I get:
    a         b         c
aa  0  0.159757  0.124915
bb  0  0.159040  0.145601
cc  0  0.159040  0.150515
Answer 0 (score: 2)
I think element-wise processing needs applymap, which calls the function once per cell:
def someFunction(df):
    entropy = {}
    rowTotals = df.sum(axis=1)
    rowApplied = df.apply(lambda row:applyGivenTotals(row, rowTotals, row.name), axis=1)
    # applymap passes each individual value to the function, not a whole column
    unSummedPis = rowApplied.applymap(np.vectorize(piLogpi))
    return unSummedPis

print(someFunction(df))
           a         b         c
aa  0.000000  0.159757  0.124915
bb  0.116675  0.159040  0.145601
cc  0.129692  0.159040  0.150515
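A likely reason the original version zeroes only column a: when np.vectorize is created without otypes, NumPy determines the output dtype by calling the function on the first input element. DataFrame.apply hands np.vectorize a whole column at a time, and the first value of column a is 0 / 11 = 0.0, so piLogpi returns the integer 0 and the entire column is produced as integers, truncating 0.1167 and 0.1297 to 0. Columns b and c start with non-zero values, so they come out as floats. The sketch below reproduces this on column a alone and shows two alternatives: passing otypes=[float], or skipping np.vectorize and using NumPy ufuncs on the whole frame. Simply returning 0.0 instead of 0 in piLogpi should also avoid the problem. Variable names such as col_a and terms are illustrative. Also note that pandas 2.1 deprecated DataFrame.applymap in favor of DataFrame.map.

```python
import math

import numpy as np
import pandas as pd


def piLogpi(value):
    # Same logic as the question: the first branch returns the int 0
    if value == 0:
        return 0
    return -value * math.log10(value)


# Column "a" after row normalization: 0/11, 2/15, 3/18
col_a = np.array([0 / 11, 2 / 15, 3 / 18])

# Without otypes, the output dtype is taken from piLogpi(0.0) -> int 0
as_int = np.vectorize(piLogpi)(col_a)
print(as_int)  # [0 0 0]

# Alternative 1: declare the output type explicitly
as_float = np.vectorize(piLogpi, otypes=[float])(col_a)
print(as_float)

# Alternative 2: no np.vectorize; compute -p * log10(p) with ufuncs
diction = {"a": {"aa": 0, "bb": 2, "cc": 3},
           "b": {"aa": 4, "bb": 5, "cc": 6},
           "c": {"aa": 7, "bb": 8, "cc": 9}}
df = pd.DataFrame(diction)
p = df.div(df.sum(axis=1), axis=0)
# log10(0) is -inf and 0 * -inf is nan; silence those warnings and
# let np.where substitute 0.0 wherever p == 0
with np.errstate(divide="ignore", invalid="ignore"):
    values = np.where(p > 0, -p * np.log10(p), 0.0)
terms = pd.DataFrame(values, index=p.index, columns=p.columns)
print(terms)
```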