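I am trying to reduce the data in a pandas DataFrame using different kinds of functions and argument values. However, I have not managed to change the default arguments of the aggregation function. Here is an example: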
>>> df = pd.DataFrame({'x': [1,np.nan,2,1],
... 'y': ['a','a','b','b']})
>>> df
     x  y
0  1.0  a
1  NaN  a
2  2.0  b
3  1.0  b
Here is an aggregation function for which I would like to test different values of b:
>>> def translate_mean(x, b=10):
...     y = [elem + b for elem in x]
...     return np.mean(y)
In the code below I can use this function with the default value of b, but I would like to pass other values:
>>> df.groupby('y').agg(translate_mean)
      x
y
a   NaN
b  11.5
Any ideas?
Answer 0: (score: 5)
In this case, you can try using apply:
df.groupby('y').apply(lambda x: translate_mean(x['x'], 20))
With b=20, the result is now NaN for group a and 21.5 for group b.
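Since the question is about testing several values of b, the apply approach above can be wrapped in a loop. This is only a minimal sketch: selecting the column with ['x'] before calling apply, the particular b values, and the names results and summary are assumptions made for illustration, not part of the original answer.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'x': [1, np.nan, 2, 1],
                   'y': ['a', 'a', 'b', 'b']})

def translate_mean(x, b=10):
    # Shift every element by b, then average (NaN propagates).
    y = [elem + b for elem in x]
    return np.mean(y)

# Hypothetical set of b values to try; one grouped result per value.
results = {
    b: df.groupby('y')['x'].apply(lambda s, b=b: translate_mean(s, b))
    for b in (10, 20, 30)
}

# One column per b value, one row per group.
summary = pd.DataFrame(results)
print(summary)
```

Binding b as a default argument of the lambda (b=b) keeps each lambda tied to its own value, which matters if the lambdas are ever stored and called later.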
Answer 1: (score: 5)
Just pass the arguments to agg (this also works with apply).
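The code that accompanied this answer is not preserved here, so the following is only a sketch of the idea, under one assumption: the column x is selected first, so that translate_mean receives one Series per group. How extra arguments are forwarded when agg is called on the whole grouped DataFrame can differ between pandas versions, which is why the column is selected explicitly. The value b=4 is chosen purely for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'x': [1, np.nan, 2, 1],
                   'y': ['a', 'a', 'b', 'b']})

def translate_mean(x, b=10):
    # Shift every element by b, then average (NaN propagates).
    y = [elem + b for elem in x]
    return np.mean(y)

# Extra keyword arguments after the function are forwarded to it.
# Assumption: select column 'x' so each call receives a Series.
result_agg = df.groupby('y')['x'].agg(translate_mean, b=4)

# The same forwarding works with apply.
result_apply = df.groupby('y')['x'].apply(translate_mean, b=4)

print(result_agg)
print(result_apply)
```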
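Answer 2: (score: 0)
Whenever you have multiple columns and want to apply a different function with different arguments to each one, you can use lambda functions together with agg. For example: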
>>> df = pd.DataFrame({'x': [1,np.nan,2,1],
... 'y': ['a','a','b','b'],
... 'z': [0.1,0.2,0.3,0.4]})
>>> df
     x  y    z
0  1.0  a  0.1
1  NaN  a  0.2
2  2.0  b  0.3
3  1.0  b  0.4
>>> def translate_mean(x, b=10):
...     y = [elem + b for elem in x]
...     return np.mean(y)
To group by column "y" and apply translate_mean with b = 10 to column "x" and with b = 25 to column "z", you can try the following:
df_res = df.groupby(by='y').agg({
    'x': lambda x: translate_mean(x, 10),
    'z': lambda x: translate_mean(x, 25)})
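A possible variation on the same idea is pandas named aggregation, which lets each output column carry a label recording the b value used. This is a sketch rather than part of the original answer: the output names x_b10 and z_b25 are made up for illustration, and z is assumed to hold numeric values so that adding b works.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'x': [1, np.nan, 2, 1],
                   'y': ['a', 'a', 'b', 'b'],
                   'z': [0.1, 0.2, 0.3, 0.4]})  # assumed numeric

def translate_mean(x, b=10):
    # Shift every element by b, then average (NaN propagates).
    y = [elem + b for elem in x]
    return np.mean(y)

# Named aggregation: output_name=(input_column, function).
# The output names are hypothetical labels.
df_named = df.groupby('y').agg(
    x_b10=('x', lambda s: translate_mean(s, 10)),
    z_b25=('z', lambda s: translate_mean(s, 25)),
)
print(df_named)
```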
Hope it helps.