Pandas' pandas.DataFrame.diff does almost exactly what I want.
>>> df = pd.DataFrame({'a': [1, 2, 3, 4, 5, 6],
...                    'b': [1, 1, 2, 3, 5, 8],
...                    'c': [1, 4, 9, 16, 25, 36]})
>>> df
   a  b   c
0  1  1   1
1  2  1   4
2  3  2   9
3  4  3  16
4  5  5  25
5  6  8  36
df.diff(axis=0) and df.diff(axis=1) produce, respectively:
>>> df.diff()
     a    b     c
0  NaN  NaN   NaN
1  1.0  0.0   3.0
2  1.0  1.0   5.0
3  1.0  1.0   7.0
4  1.0  2.0   9.0
5  1.0  3.0  11.0
>>> df.diff(axis=1)
    a    b     c
0 NaN  0.0   0.0
1 NaN -1.0   3.0
2 NaN -1.0   7.0
3 NaN -1.0  13.0
4 NaN  0.0  20.0
5 NaN  2.0  28.0
What df.diff does is essentially apply this function:
def diff_func(columns):
    # Pseudocode: positional difference of neighbours, as with NumPy arrays
    return columns[1:] - columns[0:-1]
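I want to define my own function to use in place of diff_func.
What I want is to "apply" my own function (possibly non-linear) to consecutive (periods=1) columns/rows. For example, func(x, y) = sin(x)*cos(y), where x, y are periods=n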
Answer 0 (score: 3)
You should consider shift:
df - df.shift(1)
     a    b     c
0  NaN  NaN   NaN
1  1.0  0.0   3.0
2  1.0  1.0   5.0
3  1.0  1.0   7.0
4  1.0  2.0   9.0
5  1.0  3.0  11.0
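The same shift idea extends to a function other than subtraction. The following is only a sketch: pairwise is a hypothetical helper name, and it assumes x is the earlier value and y the current one, which the question does not state explicitly. It also assumes the function is vectorized, as NumPy ufuncs are.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3, 4, 5, 6],
                   'b': [1, 1, 2, 3, 5, 8],
                   'c': [1, 4, 9, 16, 25, 36]})

def pairwise(frame, func, periods=1, axis=0):
    """Hypothetical helper: call func(previous, current), where `previous`
    is `frame` shifted by `periods` along `axis`. func must be vectorized."""
    return func(frame.shift(periods, axis=axis), frame)

# With plain subtraction this gives the same values as df.diff().
diff_like = pairwise(df, lambda prev, cur: cur - prev)

# The non-linear example from the question, assuming x = previous, y = current.
nonlinear = pairwise(df, lambda prev, cur: np.sin(prev) * np.cos(cur))
```

Because shift fills the first periods rows (or columns) with NaN, the result has the same NaN padding as diff, as long as func propagates NaN.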
Answer 1 (score: 1)
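Here is an approach that worked for me, using the pandas DataFrame built-in apply method.
I wrapped a custom function, product, which takes two arguments and returns a single value, inside diff_custom, which takes a vector argument and returns a vector of equal length, padded with NaN as appropriate. The DataFrame's built-in apply method then runs diff_custom along either axis.
import pandas
import numpy
df = pandas.DataFrame([[4, 9], [5, 10], [22, 44]], columns=['A', 'B'])

# Function that acts on neighboring row or column values, val1 and val2
def product(val1, val2):
    return val1 * val2

# Function that acts on a DataFrame row or column, x (passed in as a Series)
def diff_custom(x):
    # .iloc indexes by position, so this also works for rows labelled 'A', 'B'
    vals = [product(x.iloc[n], x.iloc[n + 1]) for n in range(x.shape[0] - 1)]
    ret = [numpy.nan] + vals  # pad return vector however you need to
    # Returning a Series with x's index keeps the result a DataFrame on both axes
    return pandas.Series(ret, index=x.index)

# Use DataFrame built-in 'apply' method
df.apply(diff_custom, axis=1)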
    A      B
0 NaN   36.0
1 NaN   50.0
2 NaN  968.0
df.apply(diff_custom, axis=0)
       A      B
0    NaN    NaN
1   20.0   90.0
2  110.0  440.0
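Since the question also mentions periods=n, here is a rough sketch of how the same apply-based idea might be generalized to values that are periods positions apart. pair_apply and its parameters are hypothetical names, not part of pandas, and the sketch assumes periods is a positive integer and func takes two scalars.

```python
import numpy as np
import pandas as pd

def pair_apply(frame, func, periods=1, axis=0):
    """Hypothetical helper: apply a scalar func(earlier, later) to values
    that are `periods` positions apart along `axis`, NaN-padded in front."""
    def along(x):
        pad = min(periods, len(x))  # never pad beyond the vector length
        vals = [func(x.iloc[n], x.iloc[n + periods])
                for n in range(len(x) - pad)]
        # Reuse x's index so the result keeps the original labels
        return pd.Series([np.nan] * pad + vals, index=x.index)
    return frame.apply(along, axis=axis)

df = pd.DataFrame([[4, 9], [5, 10], [22, 44]], columns=['A', 'B'])

# Products of values two rows apart, e.g. 4 * 22 in column A
two_apart = pair_apply(df, lambda a, b: a * b, periods=2, axis=0)

# Products of neighbouring columns within each row
across = pair_apply(df, lambda a, b: a * b, periods=1, axis=1)
```

This loops in Python for every pair, so on large frames a shift-based version with a vectorized function is likely to be faster.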