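Edit: clarified the question.
I want to aggregate a pd.DataFrame named df by "Identifier" and sum the "Cost" columns. For the category columns, I want to apply an aggregation function that can be stated in words as "aggregate and take the most frequent value (the mode) of the column, but if the mode is blank, take the second most frequent value". In other words, I want the mode of each category (after aggregation), but the mode must not be blank.
The result should be the pd.DataFrame new_df.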
df
  Identifier  Cost  Cost2 Category1 Category2 Category3
0          A    10     10       one                 aaa
1          A    20     10                blue       aaa
2          B    10     20       two                 bbb
3          B    10     30               green       bbb
4          B    30     40                           bbb
5          C    20     50     three       red       ccc
--- aggregation --->
new_df
  Identifier  Cost  Cost2 Category1 Category2 Category3
0          A    30     20       one      blue       aaa
1          B    50     90       two     green       bbb
2          C    20     50     three       red       ccc
Code to reproduce the example:
import pandas as pd
data_df = {
'Identifier': ['A', 'A', 'B', 'B', 'B', 'C'],
'Cost': [10, 20, 10, 10, 30, 20],
'Cost2':[10,10,20,30,40,50],
'Category1' : ['one', '', 'two', '', '', 'three'],
'Category2' : ['', 'blue', '', 'green', '', 'red'],
'Category3' : ['aaa', 'aaa', 'bbb', 'bbb', 'bbb', 'ccc']
}
df = pd.DataFrame(data_df)
data_new_df = {
'Identifier': ['A', 'B', 'C'],
'Cost': [30, 50, 20],
'Cost2' : [20,90,50],
'Category1' : ['one', 'two', 'three'],
'Category2' : ['blue', 'green', 'red'],
'Category3' : ['aaa', 'bbb', 'ccc']
}
new_df = pd.DataFrame(data_new_df)
Answer 0 (score: 1)
Maybe you can try the following, using groupby together with sum:
new_df = df.groupby('Identifier').apply(sum).drop('Identifier', axis=1).reset_index()
Result:
  Identifier  Cost Category1 Category2
0          A    30       one      blue
1          B    50       two     green
2          C    20     three       red
Answer 1 (score: 0)
You can try:
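# Sum the columns per Identifier.
new_df = df.groupby('Identifier').sum().reset_index()
# Fill each category column with its non-blank values, taken in row order.
# This relies on every group having exactly one non-blank value, as in the sample data.
new_df['Category1'] = df.loc[df.Category1 != '', 'Category1'].reset_index(drop=True)
new_df['Category2'] = df.loc[df.Category2 != '', 'Category2'].reset_index(drop=True)
new_df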
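For the rule as stated in the question (the mode per group, skipping blanks), a minimal sketch could look like the one below. The helper name non_blank_mode is made up for illustration. It assumes that a blank is an empty string, that ties between equally frequent values can be broken by taking the first value Series.mode returns (it returns the modes sorted), and that a group containing only blanks should yield an empty string.

```python
import pandas as pd

df = pd.DataFrame({
    'Identifier': ['A', 'A', 'B', 'B', 'B', 'C'],
    'Cost': [10, 20, 10, 10, 30, 20],
    'Cost2': [10, 10, 20, 30, 40, 50],
    'Category1': ['one', '', 'two', '', '', 'three'],
    'Category2': ['', 'blue', '', 'green', '', 'red'],
    'Category3': ['aaa', 'aaa', 'bbb', 'bbb', 'bbb', 'ccc'],
})


def non_blank_mode(values):
    """Return the most frequent non-blank value of a Series (hypothetical helper).

    Assumption: a group with no non-blank value yields ''.
    """
    modes = values[values != ''].mode()  # mode() ignores NaN and sorts ties
    return modes.iloc[0] if not modes.empty else ''


new_df = (
    df.groupby('Identifier')
      .agg({
          'Cost': 'sum',
          'Cost2': 'sum',
          'Category1': non_blank_mode,
          'Category2': non_blank_mode,
          'Category3': non_blank_mode,
      })
      .reset_index()
)
```

Unlike picking the non-blank values by position, this still works when a group has several non-blank values or none at all.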