Pandas diff, but with a user-defined function

Time: 2018-12-17 01:38:43

Tags: python pandas

Pandas' pandas.DataFrame.diff does almost what I want.

From the documentation:

>>> df = pd.DataFrame({'a': [1, 2, 3, 4, 5, 6],
...                    'b': [1, 1, 2, 3, 5, 8],
...                    'c': [1, 4, 9, 16, 25, 36]})
>>> df
   a  b   c
0  1  1   1
1  2  1   4
2  3  2   9
3  4  3  16
4  5  5  25
5  6  8  36

df.diff(axis=0) and df.diff(axis=1) produce, respectively:

>>> df.diff()
     a    b     c
0  NaN  NaN   NaN
1  1.0  0.0   3.0
2  1.0  1.0   5.0
3  1.0  1.0   7.0
4  1.0  2.0   9.0
5  1.0  3.0  11.0

>>> df.diff(axis=1)
    a    b     c
0 NaN  0.0   0.0
1 NaN -1.0   3.0
2 NaN -1.0   7.0
3 NaN -1.0  13.0
4 NaN  0.0  20.0
5 NaN  2.0  28.0

What df.diff does is essentially apply this function:

def diff_func(columns):
    return columns[1:] - columns[0:-1]
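The claim above can be checked with a minimal sketch. It assumes diff_func is applied to the raw NumPy values (df.to_numpy()), because it slices positionally. The slice difference then matches every row of df.diff() except the first, which diff fills with NaN. Transposing gives the column-wise (axis=1) case.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3, 4, 5, 6],
                   'b': [1, 1, 2, 3, 5, 8],
                   'c': [1, 4, 9, 16, 25, 36]})

def diff_func(columns):
    # Difference between each row and the row before it
    return columns[1:] - columns[0:-1]

values = df.to_numpy()

# Row-wise: compare with df.diff(), dropping its leading NaN row
manual = diff_func(values)
builtin = df.diff().to_numpy()[1:]

# Column-wise: transpose, diff, transpose back; drop diff's leading NaN column
manual_cols = diff_func(values.T).T
builtin_cols = df.diff(axis=1).to_numpy()[:, 1:]

print(np.array_equal(manual, builtin), np.array_equal(manual_cols, builtin_cols))
```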

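I want to define my own function to use in place of diff_func. What I want is to "apply" my own (possibly nonlinear) function to consecutive (periods=1) columns/rows, or more generally to columns/rows that are periods=n apart. For example, func(x, y) = sin(x)*cos(y), where x and y are consecutive columns or rows.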

2 answers:

Answer 0 (score: 3)

You should consider shift:

df-df.shift(1)
     a    b     c
0  NaN  NaN   NaN
1  1.0  0.0   3.0
2  1.0  1.0   5.0
3  1.0  1.0   7.0
4  1.0  2.0   9.0
5  1.0  3.0  11.0
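The shift idea extends to the nonlinear function from the question. df.shift lines each row up with its predecessor, and NumPy ufuncs such as np.sin applied to a DataFrame return a DataFrame. The sketch below is illustrative. The name func and the convention that x is the earlier row/column and y the later one are assumptions, not part of the answer.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3, 4, 5, 6],
                   'b': [1, 1, 2, 3, 5, 8],
                   'c': [1, 4, 9, 16, 25, 36]})

def func(x, y):
    # The question's example; x is the earlier row/column, y the later one
    return np.sin(x) * np.cos(y)

# Row-wise, like df.diff(): pair each row with the row above it
rows = func(df.shift(1), df)

# Column-wise, like df.diff(axis=1): pair each column with the one to its left
cols = func(df.shift(1, axis=1), df)

# periods=n works the same way; here rows two apart
rows_n2 = func(df.shift(2), df)

print(rows)
```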

Answer 1 (score: 1)

Here is one approach that worked for me, using the pandas DataFrame built-in method apply.

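I wrap my custom function product, which takes two arguments and returns a single value, inside diff_custom. diff_custom takes a vector argument (one row or column) and returns an equal-length vector, padded with NaN as appropriate. It is then run over every row or column with the DataFrame's built-in apply method.

import numpy
import pandas

df = pandas.DataFrame([[4, 9], [5, 10], [22, 44]], columns=['A', 'B'])

# Function that acts on neighboring row or column values, val1 and val2
def product(val1, val2):
    return val1 * val2

# Function that acts on a DataFrame row or column, x
def diff_custom(x):
    # .iloc gives positional access even when x is a row indexed by column labels
    vals = [product(x.iloc[n], x.iloc[n + 1]) for n in range(x.shape[0] - 1)]
    ret = [numpy.nan] + vals  # pad the return vector however you need to
    # Returning a Series (not a list) lets apply rebuild a DataFrame for both axes
    return pandas.Series(ret, index=x.index)

# Use the DataFrame built-in 'apply' method
df.apply(diff_custom, axis=1)
    A      B
0 NaN   36.0
1 NaN   50.0
2 NaN  968.0

df.apply(diff_custom, axis=0)
       A      B
0    NaN    NaN
1   20.0   90.0
2  110.0  440.0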
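When the pairwise function only uses arithmetic operators, as product does, the same results can be obtained without apply. The sketch below passes the shifted frame and the original frame straight to product. It cross-checks a version of the answer's diff_custom against the shift approach from the other answer. The variable names are illustrative, and the equivalence is only expected for functions that work element-wise on whole DataFrames.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[4, 9], [5, 10], [22, 44]], columns=['A', 'B'])

def product(val1, val2):
    return val1 * val2

def diff_custom(x):
    # Version of the answer's helper: positional access, NaN-padded Series result
    vals = [product(x.iloc[n], x.iloc[n + 1]) for n in range(x.shape[0] - 1)]
    return pd.Series([np.nan] + vals, index=x.index)

# Down the rows (apply over columns) vs. shifting rows
by_apply_rows = df.apply(diff_custom, axis=0)
by_shift_rows = product(df.shift(1), df)

# Across the columns (apply over rows) vs. shifting columns
by_apply_cols = df.apply(diff_custom, axis=1)
by_shift_cols = product(df.shift(1, axis=1), df)

print(np.allclose(by_apply_rows.to_numpy(), by_shift_rows.to_numpy(), equal_nan=True))
print(np.allclose(by_apply_cols.to_numpy(), by_shift_cols.to_numpy(), equal_nan=True))
```

The shift version runs as a few vectorized operations. The apply version calls Python code once per row or column, which mainly matters for large frames.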