DataFrame.apply with a vectorized function behaves strangely

Time: 2017-11-16 09:08:07

Tags: python-3.x pandas numpy dataframe

I am building a DataFrame from the following dictionary:

diction = {"a":{"aa":0, "bb":2, "cc":3}, "b":{"aa":4, "bb":5, "cc":6}, "c":{"aa":7, "bb":8, "cc":9}}
df = pandas.DataFrame(diction)

I then try to perform some operations on the DataFrame with the following functions:

import pandas
import numpy as np
import math


def applyGivenTotals(rowOrColumn, rowOrColumnTotals, rowOrColumnName):
  return rowOrColumn/float(rowOrColumnTotals[rowOrColumnName])

def piLogpi(value):
  if(value==0):
    return 0
  else:
    return -value * math.log10(value)

def someFunction(df):
  entropy = {}
  rowTotals = df.sum(axis=1)
  rowApplied = df.apply(lambda row:applyGivenTotals(row, rowTotals, row.name), axis=1)
  unSummedPis = rowApplied.apply(np.vectorize(piLogpi))
  return unSummedPis

I get zeros in the leftmost column, but I don't understand why.

To clarify, my expected result is:

    a         b         c
aa  0         0.159757  0.124915
bb  0.116675  0.159040  0.145601
cc  0.129692  0.159040  0.150515

But I get:

    a         b         c
aa  0  0.159757  0.124915
bb  0  0.159040  0.145601
cc  0  0.159040  0.150515

1 answer:

Answer 0: (score: 2)

I think you need applymap, which processes the DataFrame element by element:

def someFunction(df):
  entropy = {}
  rowTotals = df.sum(axis=1)
  rowApplied = df.apply(lambda row:applyGivenTotals(row, rowTotals, row.name), axis=1)
  unSummedPis = rowApplied.applymap(np.vectorize(piLogpi))
  return unSummedPis

print (someFunction(df))

           a         b         c
aa  0.000000  0.159757  0.124915
bb  0.116675  0.159040  0.145601
cc  0.129692  0.159040  0.150515
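
A likely explanation for the zeros, sketched below: when `otypes` is not given, `np.vectorize` infers its output dtype from the result of the first call. With `apply`, the whole column `a` is passed in at once, its first element is 0, and `piLogpi` returns the integer `0`, so the remaining results in that column are cast to integers. With `applymap` the function is called once per element, so each result keeps its own type. The names `pi_log_pi`, `col_a`, `inferred` and `declared` are illustrative and do not appear in the original post.

```python
import math

import numpy as np
import pandas as pd


def pi_log_pi(value):
    # Same logic as piLogpi in the question; note the integer 0 in the first branch.
    if value == 0:
        return 0
    return -value * math.log10(value)


# Column "a" after dividing each row by its total: 0/11, 2/15, 3/18.
col_a = pd.Series([0.0, 2 / 15, 3 / 18], index=["aa", "bb", "cc"])

# Without otypes, the output dtype comes from the first result (the int 0),
# so the later float results are truncated to integers.
inferred = np.vectorize(pi_log_pi)(col_a.to_numpy())

# Declaring the output type keeps the float results.
declared = np.vectorize(pi_log_pi, otypes=[float])(col_a.to_numpy())

print(inferred.dtype, inferred)
print(declared.dtype, declared)
```

Under this reading, the original `apply` call should also work if `np.vectorize(piLogpi, otypes=[float])` is passed instead, or if `piLogpi` returns `0.0` rather than `0`. Note also that in pandas 2.1 and later `DataFrame.applymap` is deprecated in favor of `DataFrame.map`.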