Aggregate over three columns, then aggregate under different conditions

Posted: 2018-07-16 06:22:58

Tags: python pandas dataframe pandas-groupby

My test DataFrame:

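import pandas as pd
from IPython.display import display  # built in inside Jupyter; needed when run as a plain script

print("Create List",  end='\n') 
Test_Data = [('display_name', ['A', 'B', 'B','C','C','C','C','C',]),
         ('security_type1', ['GOVT', 'CORP','CORP','CORP','CORP','CORP','CORP','CORP']),
         ('currency_str', ['USD', 'NZD','USD','EUR','EUR','GBP','GBP','USD']),
         ('state', ['Done','Passed','Done','Done','Traded Away','Done','Done','Done']),
         ('rfq_qty_CAD_Equiv', [100000, 100000, 100000,100000,100000,100000,100000,100000]),
         ]
# DataFrame.from_items was deprecated in pandas 0.23 and removed in 1.0;
# a dict built from the (name, values) pairs keeps the same column order
dfTest_Data = pd.DataFrame(dict(Test_Data))
display(dfTest_Data)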

display_name    security_type1  currency_str    state   rfq_qty_CAD_Equiv
A                     GOVT          USD          Done         100000
B                     CORP          NZD          Passed       100000
B                     CORP          USD          Done         100000
C                     CORP          EUR          Done         100000
C                     CORP          EUR          Traded Away  100000
C                     CORP          GBP          Done         100000
C                     CORP          GBP          Done         100000
C                     CORP          USD          Done         100000

Below is the output I want. The grouping here is driven by display_name, security_type1 and currency_str, but Total_RFQ and Total_RFQ_Volume are totals per display_name only.
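To make "totals per display_name inside a three-column grouping" concrete, here is a minimal sketch, assuming pandas 0.25 or later for named aggregation. It broadcasts the per-display_name totals onto every row with groupby().transform and then collapses to one row per three-column group. The names df, keys, by_name and out are illustrative only.

```python
import pandas as pd

# Test data from the question
df = pd.DataFrame({
    'display_name': ['A', 'B', 'B', 'C', 'C', 'C', 'C', 'C'],
    'security_type1': ['GOVT', 'CORP', 'CORP', 'CORP', 'CORP', 'CORP', 'CORP', 'CORP'],
    'currency_str': ['USD', 'NZD', 'USD', 'EUR', 'EUR', 'GBP', 'GBP', 'USD'],
    'state': ['Done', 'Passed', 'Done', 'Done', 'Traded Away', 'Done', 'Done', 'Done'],
    'rfq_qty_CAD_Equiv': [100000] * 8,
})

keys = ['display_name', 'security_type1', 'currency_str']
by_name = df.groupby('display_name')['rfq_qty_CAD_Equiv']

# transform() writes each display_name total back onto every row of that name;
# 'count' counts non-null values, which equals the row count here (no NaNs)
df['Total_RFQ'] = by_name.transform('count')
df['Total_RFQ_Volume'] = by_name.transform('sum')

# One row per three-column group; the totals are constant within a display_name,
# so 'first' simply carries them through
out = df.groupby(keys, as_index=False).agg(
    Total_RFQ=('Total_RFQ', 'first'),
    Total_RFQ_Volume=('Total_RFQ_Volume', 'first'),
)
print(out)
```

Every C row then carries Total_RFQ 5, matching the desired output, because the total is computed before the finer grouping.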

display_name    security_type1  currency_str    Done_RFQ    Not_Done_RFQ    Total_RFQ
      A               GOVT           USD            1             0          1  
      B               CORP           USD            1             1          2
      C               CORP           EUR            1             1          5
      C               CORP           GBP            2             0          5
      C               CORP           USD            1             0          5

Hit_Rate    Done_RFQ_Volume     Not_Done_RFQ_Volume     Total_RFQ_Volume
1.00             100000                 0                    100000              
0.50             100000               100000                 200000 
0.20             100000               100000                 500000 
0.40             200000                 0                    500000 
0.20             100000                 0                    500000 


Volume_per_Done_RFQ           Volume_per_Not_Done_RFQ   Volume_per_Total_RFQ
    100000                              0                  100000
    100000                           100000                100000
    100000                           100000                100000
    100000                              0                  100000
    100000                              0                  100000

Hit_Rate = Done_RFQ / Total_RFQ

Volume_per_Done_RFQ = Done_RFQ_Volume / Done_RFQ

Volume_per_Not_Done_RFQ = Not_Done_RFQ_Volume / Not_Done_RFQ

Volume_per_Total_RFQ = Total_RFQ_Volume / Total_RFQ
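The four formulas can be checked on two rows copied from the desired output above (C/CORP/GBP and C/CORP/EUR). This is a sketch, not the asker's code: it shows that pandas returns NaN for 0 / 0, which is why Volume_per_Not_Done_RFQ needs fillna(0) to show the 0 the desired output expects. The frame name metrics is hypothetical.

```python
import pandas as pd

# Two rows copied from the desired output: C/CORP/GBP and C/CORP/EUR
metrics = pd.DataFrame({
    'Done_RFQ': [2, 1],
    'Not_Done_RFQ': [0, 1],
    'Total_RFQ': [5, 5],
    'Done_RFQ_Volume': [200000, 100000],
    'Not_Done_RFQ_Volume': [0, 100000],
    'Total_RFQ_Volume': [500000, 500000],
})

out = metrics.assign(
    Hit_Rate=metrics['Done_RFQ'] / metrics['Total_RFQ'],
    Volume_per_Done_RFQ=metrics['Done_RFQ_Volume'] / metrics['Done_RFQ'],
    # 0 / 0 evaluates to NaN in pandas, so the GBP row is NaN here ...
    Volume_per_Not_Done_RFQ=metrics['Not_Done_RFQ_Volume'] / metrics['Not_Done_RFQ'],
    Volume_per_Total_RFQ=metrics['Total_RFQ_Volume'] / metrics['Total_RFQ'],
).fillna(0)  # ... and fillna(0) turns it into the 0 shown in the desired output
print(out)
```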

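Most of the work is done; I'm just having trouble joining the third DataFrame and displaying the Not_Done items, which need zeros filled in where there is no entry.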

print("All Trades",  end='\n') 
d = [
     ('Total_RFQ_Volume', 'sum'), 
     ('Total_RFQ', 'size'), 
    ]
df1 = dfTest_Data.groupby(['display_name'])['rfq_qty_CAD_Equiv'].agg(d)
display (df1)

print("Done Trades",  end='\n') 
d = [
     ('Done_RFQ_Volume', 'sum'), 
     ('Done_RFQ', 'size'), 
    ]
mask = dfTest_Data['state'].str.contains('Done')
df2 = dfTest_Data[mask].groupby(['display_name','security_type1','currency_str'])['rfq_qty_CAD_Equiv'].agg(d).reset_index()
display (df2)

print("Not Done Trades",  end='\n') 
d = [ 
    ('Not_Done_RFQ_Volume', 'sum'), 
    ('Not_Done_RFQ', 'size'), 
] 
mask = ~dfTest_Data['state'].str.contains('Done') 
df3 = dfTest_Data[mask].groupby(['display_name','security_type1','currency_str'])['rfq_qty_CAD_Equiv'].agg(d).reset_index()
display (df3)

print("Join Done trades on All Trades",  end='\n') 
df_Done_Client_Hit_Rate_Volume = df2.join(df1, on='display_name').join(df3, on='display_name')
# Create additional calculated columns
df_Done_Client_Hit_Rate_Volume['Hit_Rate'] = df_Done_Client_Hit_Rate_Volume['Done_RFQ'] / df_Done_Client_Hit_Rate_Volume['Total_RFQ'] 
df_Done_Client_Hit_Rate_Volume['Volume_per_Done_RFQ'] = df_Done_Client_Hit_Rate_Volume['Done_RFQ_Volume'] / df_Done_Client_Hit_Rate_Volume['Done_RFQ'] 
df_Done_Client_Hit_Rate_Volume['Volume_per_Not_Done_RFQ'] = df_Done_Client_Hit_Rate_Volume['Not_Done_RFQ_Volume'] / df_Done_Client_Hit_Rate_Volume['Not_Done_RFQ'] 
df_Done_Client_Hit_Rate_Volume['Volume_per_Total_RFQ'] = df_Done_Client_Hit_Rate_Volume['Total_RFQ_Volume'] / df_Done_Client_Hit_Rate_Volume['Total_RFQ'] 
# Reorder columns
df_Done_Client_Hit_Rate_Volume = df_Done_Client_Hit_Rate_Volume[['display_name', 
                                                                 'security_type1',
                                                                 'currency_str', 
                                                                 'Done_RFQ',
                                                                 'Not_Done_RFQ',
                                                                 'Total_RFQ',
                                                                 'Hit_Rate',
                                                                 'Done_RFQ_Volume',
                                                                 'Not_Done_RFQ_Volume',
                                                                 'Volume_per_Done_RFQ',
                                                                 'Volume_per_Not_Done_RFQ',
                                                                 'Total_RFQ_Volume',
                                                                 'Volume_per_Total_RFQ'
                                                                ]]
display (df_Done_Client_Hit_Rate_Volume)
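The join(df3, on='display_name') step above matches df2's display_name column against df3's RangeIndex (df3 was reset_index()-ed), so it cannot line up the groups. One alternative, sketched here as a swapped-in technique rather than the approach from the answer: merge df2 and df3 on all three key columns with how='outer', so a group that has only Done or only Not_Done trades is kept, then attach the per-display_name totals. It assumes pandas 0.25+ named aggregation; keys, qty and merged are illustrative names. Note that B/CORP/NZD comes out as its own row, since its only trade is not done.

```python
import pandas as pd

dfTest_Data = pd.DataFrame({
    'display_name': ['A', 'B', 'B', 'C', 'C', 'C', 'C', 'C'],
    'security_type1': ['GOVT', 'CORP', 'CORP', 'CORP', 'CORP', 'CORP', 'CORP', 'CORP'],
    'currency_str': ['USD', 'NZD', 'USD', 'EUR', 'EUR', 'GBP', 'GBP', 'USD'],
    'state': ['Done', 'Passed', 'Done', 'Done', 'Traded Away', 'Done', 'Done', 'Done'],
    'rfq_qty_CAD_Equiv': [100000] * 8,
})
keys = ['display_name', 'security_type1', 'currency_str']
qty = 'rfq_qty_CAD_Equiv'

# Totals per display_name (index = display_name)
df1 = dfTest_Data.groupby('display_name')[qty].agg(Total_RFQ_Volume='sum', Total_RFQ='size')

mask = dfTest_Data['state'].str.contains('Done')
df2 = dfTest_Data[mask].groupby(keys)[qty].agg(Done_RFQ_Volume='sum', Done_RFQ='size').reset_index()
df3 = dfTest_Data[~mask].groupby(keys)[qty].agg(Not_Done_RFQ_Volume='sum', Not_Done_RFQ='size').reset_index()

# Outer merge on all three keys keeps groups present in only df2 or only df3;
# their missing counts/volumes come back as NaN and are filled with 0
merged = (
    df2.merge(df3, on=keys, how='outer')
       .merge(df1, left_on='display_name', right_index=True)
       .fillna(0)
)
print(merged)
```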

1 Answer:

Answer (score: 2)

I think you first need to remove .reset_index() from df2 and df3, so that both keep display_name, security_type1 and currency_str as their index:

d = [
     ('Done_RFQ_Volume', 'sum'),
     ('Done_RFQ', 'size'),
    ]
mask = dfTest_Data['state'].str.contains('Done')
df2 = dfTest_Data[mask].groupby(['display_name','security_type1','currency_str'])['rfq_qty_CAD_Equiv'].agg(d)
#print (df2)

print("Not Done Trades",  end='\n')
d = [
    ('Not_Done_RFQ_Volume', 'sum'),
    ('Not_Done_RFQ', 'size'),
]
mask = ~dfTest_Data['state'].str.contains('Done')
df3 = dfTest_Data[mask].groupby(['display_name','security_type1','currency_str'])['rfq_qty_CAD_Equiv'].agg(d)

Then combine them with concat and DataFrame.join:

df = pd.concat([df2, df3], axis=1).reset_index()
df_Done_Client_Hit_Rate_Volume = df.join(df1, on='display_name')

Finally, add the calculated columns and replace the missing values with fillna:

df_Done_Client_Hit_Rate_Volume['Hit_Rate'] = df_Done_Client_Hit_Rate_Volume['Done_RFQ'] / df_Done_Client_Hit_Rate_Volume['Total_RFQ'] 
df_Done_Client_Hit_Rate_Volume['Volume_per_Done_RFQ'] = df_Done_Client_Hit_Rate_Volume['Done_RFQ_Volume'] / df_Done_Client_Hit_Rate_Volume['Done_RFQ'] 
df_Done_Client_Hit_Rate_Volume['Volume_per_Not_Done_RFQ'] = df_Done_Client_Hit_Rate_Volume['Not_Done_RFQ_Volume'] / df_Done_Client_Hit_Rate_Volume['Not_Done_RFQ'] 
df_Done_Client_Hit_Rate_Volume['Volume_per_Total_RFQ'] = df_Done_Client_Hit_Rate_Volume['Total_RFQ_Volume'] / df_Done_Client_Hit_Rate_Volume['Total_RFQ']
# Groups with no Done or no Not_Done trades are NaN after concat, and 0/0 divisions give NaN too
df_Done_Client_Hit_Rate_Volume = df_Done_Client_Hit_Rate_Volume.fillna(0)
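Putting the answer's steps together, here is a sketch that runs as one script outside Jupyter. It assumes pandas 0.25+ and swaps the list-of-tuples .agg(d) form for named aggregation; the final fillna(0) is the missing-value replacement described in the answer. The names keys, qty and res are illustrative.

```python
import pandas as pd

dfTest_Data = pd.DataFrame({
    'display_name': ['A', 'B', 'B', 'C', 'C', 'C', 'C', 'C'],
    'security_type1': ['GOVT', 'CORP', 'CORP', 'CORP', 'CORP', 'CORP', 'CORP', 'CORP'],
    'currency_str': ['USD', 'NZD', 'USD', 'EUR', 'EUR', 'GBP', 'GBP', 'USD'],
    'state': ['Done', 'Passed', 'Done', 'Done', 'Traded Away', 'Done', 'Done', 'Done'],
    'rfq_qty_CAD_Equiv': [100000] * 8,
})
keys = ['display_name', 'security_type1', 'currency_str']
qty = 'rfq_qty_CAD_Equiv'

# Totals per display_name only
df1 = dfTest_Data.groupby('display_name')[qty].agg(Total_RFQ_Volume='sum', Total_RFQ='size')

# Done / Not Done per three-column group; no reset_index(), so the keys stay in the index
mask = dfTest_Data['state'].str.contains('Done')
df2 = dfTest_Data[mask].groupby(keys)[qty].agg(Done_RFQ_Volume='sum', Done_RFQ='size')
df3 = dfTest_Data[~mask].groupby(keys)[qty].agg(Not_Done_RFQ_Volume='sum', Not_Done_RFQ='size')

# concat(axis=1) aligns on the shared MultiIndex; a group absent from one side gets NaN there
df = pd.concat([df2, df3], axis=1).reset_index()
res = df.join(df1, on='display_name')

res['Hit_Rate'] = res['Done_RFQ'] / res['Total_RFQ']
res['Volume_per_Done_RFQ'] = res['Done_RFQ_Volume'] / res['Done_RFQ']
res['Volume_per_Not_Done_RFQ'] = res['Not_Done_RFQ_Volume'] / res['Not_Done_RFQ']
res['Volume_per_Total_RFQ'] = res['Total_RFQ_Volume'] / res['Total_RFQ']

# Replace NaN from missing groups and from 0/0 divisions with 0
res = res.fillna(0)
print(res)
```

Because the grouping includes currency_str, this produces a separate B/CORP/NZD row (Done_RFQ 0, Not_Done_RFQ 1) rather than folding the Passed trade into B/CORP/USD as the desired output table does.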