My test DataFrame:
import pandas as pd
from IPython.display import display  # already available in Jupyter notebooks

print("Create List", end='\n')
Test_Data = [('display_name', ['A', 'B', 'B','C','C','C','C','C',]),
('security_type1', ['GOVT', 'CORP','CORP','CORP','CORP','CORP','CORP','CORP']),
('currency_str', ['USD', 'NZD','USD','EUR','EUR','GBP','GBP','USD']),
('state', ['Done','Passed','Done','Done','Traded Away','Done','Done','Done']),
('rfq_qty_CAD_Equiv', [100000, 100000, 100000,100000,100000,100000,100000,100000]),
]
# DataFrame.from_items was removed in pandas 1.0; a dict keeps the column order
dfTest_Data = pd.DataFrame(dict(Test_Data))
display(dfTest_Data)
display_name  security_type1  currency_str  state        rfq_qty_CAD_Equiv
A             GOVT            USD           Done         100000
B             CORP            NZD           Passed       100000
B             CORP            USD           Done         100000
C             CORP            EUR           Done         100000
C             CORP            EUR           Traded Away  100000
C             CORP            GBP           Done         100000
C             CORP            GBP           Done         100000
C             CORP            USD           Done         100000
Below is the output I want. The rows are grouped by display_name, security_type1 and currency_str. Total_RFQ and Total_RFQ_Volume are computed with respect to display_name only.
display_name  security_type1  currency_str  Done_RFQ  Not_Done_RFQ  Total_RFQ
A             GOVT            USD           1         0             1
B             CORP            USD           1         1             2
C             CORP            EUR           1         1             5
C             CORP            GBP           2         0             5
C             CORP            USD           1         0             5

Hit_Rate  Done_RFQ_Volume  Not_Done_RFQ_Volume  Total_RFQ_Volume
1.00      100000           0                    100000
0.50      100000           100000               200000
0.20      100000           100000               500000
0.40      200000           0                    500000
0.20      100000           0                    500000

Volume_per_Done_RFQ  Volume_per_Not_Done_RFQ  Volume_per_Total_RFQ
100000               0                        100000
100000               100000                   100000
100000               100000                   100000
100000               0                        100000
100000               0                        100000
Hit_Rate = Done_RFQ / Total_RFQ
Volume_per_Done_RFQ = Done_RFQ_Volume / Done_RFQ
Volume_per_Not_Done_RFQ = Not_Done_RFQ_Volume / Not_Done_RFQ
Volume_per_Total_RFQ = Total_RFQ_Volume / Total_RFQ
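The four ratios above can be sketched on a tiny summary frame. This is only an illustration: the `ratio` helper is a hypothetical name, the two rows are copied from the desired output (A/GOVT/USD and C/CORP/EUR), and returning 0 when the denominator is 0 is an assumed convention chosen to match the zeros shown in the desired output.

```python
import pandas as pd

# Two rows copied from the desired output: A/GOVT/USD and C/CORP/EUR
summary = pd.DataFrame({
    'Done_RFQ': [1, 1],
    'Not_Done_RFQ': [0, 1],
    'Total_RFQ': [1, 5],
    'Done_RFQ_Volume': [100000, 100000],
    'Not_Done_RFQ_Volume': [0, 100000],
    'Total_RFQ_Volume': [100000, 500000],
})

def ratio(num, den):
    """Element-wise num / den, giving 0 where den is 0 (assumed convention)."""
    # Zero denominators are masked to NaN so the division yields NaN, then 0
    return num.div(den.where(den != 0)).fillna(0)

summary['Hit_Rate'] = ratio(summary['Done_RFQ'], summary['Total_RFQ'])
summary['Volume_per_Done_RFQ'] = ratio(summary['Done_RFQ_Volume'], summary['Done_RFQ'])
summary['Volume_per_Not_Done_RFQ'] = ratio(summary['Not_Done_RFQ_Volume'], summary['Not_Done_RFQ'])
summary['Volume_per_Total_RFQ'] = ratio(summary['Total_RFQ_Volume'], summary['Total_RFQ'])

print(summary)
```

Masking the zero denominators first avoids both NaN (from 0 / 0) and inf (from x / 0) in the result.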
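Most of the work is done. I am only having trouble joining the third DataFrame and showing the Not_Done rows, which need zero entries where a group has no such trades.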
print("All Trades", end='\n')
d = [
('Total_RFQ_Volume', 'sum'),
('Total_RFQ', 'size'),
]
df1 = dfTest_Data.groupby(['display_name'])['rfq_qty_CAD_Equiv'].agg(d)
display (df1)
print("Done Trades", end='\n')
d = [
('Done_RFQ_Volume', 'sum'),
('Done_RFQ', 'size'),
]
mask = dfTest_Data['state'].str.contains('Done')
df2 = dfTest_Data[mask].groupby(['display_name','security_type1','currency_str'])['rfq_qty_CAD_Equiv'].agg(d).reset_index()
display (df2)
print("Not Done Trades", end='\n')
d = [
('Not_Done_RFQ_Volume', 'sum'),
('Not_Done_RFQ', 'size'),
]
mask = ~dfTest_Data['state'].str.contains('Done')
df3 = dfTest_Data[mask].groupby(['display_name','security_type1','currency_str'])['rfq_qty_CAD_Equiv'] .agg(d) .reset_index()
display (df3)
print("Join Done trades on All Trades", end='\n')
df_Done_Client_Hit_Rate_Volume = df2.join(df1, on='display_name').join(df3, on='display_name')
# Create additional calculated columns
df_Done_Client_Hit_Rate_Volume['Hit_Rate'] = df_Done_Client_Hit_Rate_Volume['Done_RFQ'] / df_Done_Client_Hit_Rate_Volume['Total_RFQ']
df_Done_Client_Hit_Rate_Volume['Volume_per_Done_RFQ'] = df_Done_Client_Hit_Rate_Volume['Done_RFQ_Volume'] / df_Done_Client_Hit_Rate_Volume['Done_RFQ']
df_Done_Client_Hit_Rate_Volume['Volume_per_Not_Done_RFQ'] = df_Done_Client_Hit_Rate_Volume['Not_Done_RFQ_Volume'] / df_Done_Client_Hit_Rate_Volume['Not_Done_RFQ']
df_Done_Client_Hit_Rate_Volume['Volume_per_Total_RFQ'] = df_Done_Client_Hit_Rate_Volume['Total_RFQ_Volume'] / df_Done_Client_Hit_Rate_Volume['Total_RFQ']
# Reorder columns
df_Done_Client_Hit_Rate_Volume = df_Done_Client_Hit_Rate_Volume[['display_name',
'security_type1',
'currency_str',
'Done_RFQ',
'Not_Done_RFQ',
'Total_RFQ',
'Hit_Rate',
'Done_RFQ_Volume',
'Not_Done_RFQ_Volume',
'Volume_per_Done_RFQ',
'Volume_per_Not_Done_RFQ',
'Total_RFQ_Volume',
'Volume_per_Total_RFQ'
]]
display (df_Done_Client_Hit_Rate_Volume)
Answer 0 (score: 2)
I think you first need to remove .reset_index from df2 and df3:
d = [
('Done_RFQ_Volume', 'sum'),
('Done_RFQ', 'size'),
]
mask = dfTest_Data['state'].str.contains('Done')
df2 = dfTest_Data[mask].groupby(['display_name','security_type1','currency_str'])['rfq_qty_CAD_Equiv'].agg(d)
#print (df2)
print("Not Done Trades", end='\n')
d = [
('Not_Done_RFQ_Volume', 'sum'),
('Not_Done_RFQ', 'size'),
]
mask = ~dfTest_Data['state'].str.contains('Done')
df3 = dfTest_Data[mask].groupby(['display_name','security_type1','currency_str'])['rfq_qty_CAD_Equiv'].agg(d)
Then combine df2 and df3 with concat, and attach df1 with DataFrame.join:
df = pd.concat([df2, df3],axis=1).reset_index()
df_Done_Client_Hit_Rate_Volume = df.join(df1, on='display_name')
Finally, replace the missing values and add the calculated columns:
df_Done_Client_Hit_Rate_Volume['Hit_Rate'] = df_Done_Client_Hit_Rate_Volume['Done_RFQ'] / df_Done_Client_Hit_Rate_Volume['Total_RFQ']
df_Done_Client_Hit_Rate_Volume['Volume_per_Done_RFQ'] = df_Done_Client_Hit_Rate_Volume['Done_RFQ_Volume'] / df_Done_Client_Hit_Rate_Volume['Done_RFQ']
df_Done_Client_Hit_Rate_Volume['Volume_per_Not_Done_RFQ'] = df_Done_Client_Hit_Rate_Volume['Not_Done_RFQ_Volume'] / df_Done_Client_Hit_Rate_Volume['Not_Done_RFQ']
df_Done_Client_Hit_Rate_Volume['Volume_per_Total_RFQ'] = df_Done_Client_Hit_Rate_Volume['Total_RFQ_Volume'] / df_Done_Client_Hit_Rate_Volume['Total_RFQ']
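Putting the steps together, a minimal end-to-end sketch could look like the following. Several things here are assumptions rather than part of the answer as written: missing values are replaced with `fillna(0)`, the aggregations use keyword-style named aggregation (pandas 0.25 or later) instead of the list of tuples, and the `keys`, `done`, `qty` and `out` names are introduced only for brevity.

```python
import pandas as pd

# Same test data as in the question
dfTest_Data = pd.DataFrame({
    'display_name': ['A', 'B', 'B', 'C', 'C', 'C', 'C', 'C'],
    'security_type1': ['GOVT', 'CORP', 'CORP', 'CORP', 'CORP', 'CORP', 'CORP', 'CORP'],
    'currency_str': ['USD', 'NZD', 'USD', 'EUR', 'EUR', 'GBP', 'GBP', 'USD'],
    'state': ['Done', 'Passed', 'Done', 'Done', 'Traded Away', 'Done', 'Done', 'Done'],
    'rfq_qty_CAD_Equiv': [100000] * 8,
})

keys = ['display_name', 'security_type1', 'currency_str']
done = dfTest_Data['state'].str.contains('Done')
qty = 'rfq_qty_CAD_Equiv'

# Totals per display_name only (indexed by display_name)
df1 = dfTest_Data.groupby('display_name')[qty].agg(Total_RFQ_Volume='sum', Total_RFQ='size')
# Done / not-done aggregates keep the group keys as a MultiIndex (no reset_index yet)
df2 = dfTest_Data[done].groupby(keys)[qty].agg(Done_RFQ_Volume='sum', Done_RFQ='size')
df3 = dfTest_Data[~done].groupby(keys)[qty].agg(Not_Done_RFQ_Volume='sum', Not_Done_RFQ='size')

# Align on the shared index; groups missing on one side become NaN, then 0 (assumption)
df = pd.concat([df2, df3], axis=1).fillna(0).reset_index()
out = df.join(df1, on='display_name')

# Calculated columns; 0 / 0 yields NaN, which is again replaced by 0 (assumption)
out['Hit_Rate'] = out['Done_RFQ'] / out['Total_RFQ']
out['Volume_per_Done_RFQ'] = (out['Done_RFQ_Volume'] / out['Done_RFQ']).fillna(0)
out['Volume_per_Not_Done_RFQ'] = (out['Not_Done_RFQ_Volume'] / out['Not_Done_RFQ']).fillna(0)
out['Volume_per_Total_RFQ'] = out['Total_RFQ_Volume'] / out['Total_RFQ']

print(out)
```

Because the alignment keeps every group from either side, this produces a separate row for B/CORP/NZD (the Passed trade) with zero Done counts, so the result has six rows.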