awk code with associative arrays - the arrays do not seem to be populated, but there is no error

Time: 2014-09-28 16:22:01

Tags: awk associative-array

  1. Question: Why do date_list[d] and isin_list[i] not seem to be populated in the code snippet below?

  2. AWK code (GNU awk on a Windows 7 machine)

    BEGIN { FS = "," } # This SEBI  data set has comma-separated fields (NSE snapshots are pipe-separated)
    
    # UPDATE the lists for DATE ($10), firm_ISIN ($9), EXCHANGE ($12), and FII_ID ($5).
    ( $17~/_EQ\>/ )    {
        if (date[$10]++ == 0) date_list[d++] = $10;   # Dates appear in order in raw data
        if (isin[$9]++ == 0) isin_list[i++] = $9;     # ISINs appear out of order in raw data
        print $10, date[$10], $9, isin[$9], date_list[d], d, isin_list[i], i 
    }
    
  3. Input data

    49290,C198962542782200306,6/30/2003,433581,F5811773991200306,S5405611832200306,B5086397478200306,NESTLE INDIA LTD.,INE239A01016,6/27/2003,1,E9035083824200306,REG_DL_STLD_02,591.13,5655,3342840.15,REG_DL_INSTR_EQ,REG_DL_DLAY_P,DL_RPT_TYPE_N,DL_AMDMNT_DEL_00
    49291,C198962542782200306,6/30/2003,433563,F6292896459200306,S6344227311200306,B6110521493200306,GRASIM INDUSTRIES LTD.,INE047A01013,6/27/2003,1,E9035083824200306,REG_DL_STLD_02,495.33,3700,1832721,REG_DL_INSTR_EQ,REG_DL_DLAY_P,DL_RPT_TYPE_N,DL_AMDMNT_DEL_00
    49292,C198962542782200306,6/30/2003,433681,F6513202607200306,S1724027402200306,B6372023178200306,HDFC BANK LTD,INE040A01018,6/26/2003,1,E745964372424200306,REG_DL_STLD_02,242,2600,629200,REG_DL_INSTR_EQ,REG_DL_DLAY_D,DL_RPT_TYPE_N,DL_AMDMNT_DEL_00
    49293,C7885768925200306,6/30/2003,48128,F4406661052200306,S7376401565200306,B4576522576200306,Maruti Udyog Limited,INE585B01010,6/28/2003,3,E912851176274200306,REG_DL_STLD_04,125,44600,5575000,REG_DL_INSTR_EQ,REG_DL_DLAY_P,DL_RPT_TYPE_N,DL_AMDMNT_DEL_00
    49294,C7885768925200306,6/30/2003,48129,F4500260787200306,S1312094035200306,B4576522576200306,Maruti Udyog Limited,INE585B01010,6/28/2003,4,E912851176274200306,REG_DL_STLD_04,125,445600,55700000,REG_DL_INSTR_EQ,REG_DL_DLAY_P,DL_RPT_TYPE_N,DL_AMDMNT_DEL_00
    49295,C7885768925200306,6/30/2003,48130,F6425024637200306,S2872499118200306,B4576522576200306,Maruti Udyog Limited,INE585B01010,6/28/2003,3,E912851176274200306,REG_DL_STLD_04,125,48000,6000000,REG_DL_INSTR_EU,REG_DL_DLAY_P,DL_RPT_TYPE_N,DL_AMDMNT_DEL_00
    
  4. Output I get

    6/27/2003 1 INE239A01016 1  1  1
    6/27/2003 2 INE047A01013 1  1  2
    6/26/2003 1 INE040A01018 1  2  3
    6/28/2003 1 INE585B01010 1  3  4
    6/28/2003 2 INE585B01010 2  3  4
    
  5. Expected output

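    As far as I can tell, the print statement correctly prints (i) $10 (the date), (ii) date[$10], the count for each date, (iii) $9 (the firm ID, called the ISIN), (iv) isin[$9], the count for each ISIN, (v) d (the index into date_list, i.e. the number of unique dates), and (vi) i (the index into isin_list, i.e. the number of unique ISINs). I should also get two more columns, columns 5 and 7, for date_list[d] and isin_list[i], with values equal to $10 and $9.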

    6/27/2003 1 INE239A01016 1  6/27/2003 1 INE239A01016  1
    6/27/2003 2 INE047A01013 1  6/27/2003 1 INE047A01013  2
    6/26/2003 1 INE040A01018 1  6/26/2003 2 INE040A01018  3
    6/28/2003 1 INE585B01010 1  6/28/2003 3 INE585B01010  4
    6/28/2003 2 INE585B01010 2  6/28/2003 3 INE585B01010  4
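
One likely explanation, offered as a sketch rather than a confirmed diagnosis: `date_list[d++] = $10` stores the value under the old index and only then increments `d`, so by the time `print` runs, `date_list[d]` refers to the next, still-empty slot. Reading `date_list[d-1]` and `isin_list[i-1]` returns the most recently stored entries instead. The example assumes a POSIX shell and a trimmed, hypothetical three-field input (date, ISIN, instrument type) rather than the 20-field rows above, and it uses the portable pattern `/_EQ$/` in place of the GNU-specific `\>`.

```shell
# Hypothetical trimmed input: date, ISIN, instrument type.
out=$(printf '%s\n' \
  '6/27/2003,INE239A01016,REG_DL_INSTR_EQ' \
  '6/27/2003,INE047A01013,REG_DL_INSTR_EQ' \
  '6/26/2003,INE040A01018,REG_DL_INSTR_EQ' \
  '6/28/2003,INE585B01010,REG_DL_INSTR_EU' |
awk -F, '
$3 ~ /_EQ$/ {
    # d++ yields the old value of d as the subscript, then increments d.
    if (date[$1]++ == 0) date_list[d++] = $1
    if (isin[$2]++ == 0) isin_list[i++] = $2
    # The newest entries therefore live at d-1 and i-1, not at d and i.
    print $1, date[$1], $2, isin[$2], date_list[d-1], d, isin_list[i-1], i
}')
printf '%s\n' "$out"
```

Note that `date_list[d-1]` is the last unique date added, which equals `$10` of the current row only when repeated dates arrive consecutively, as they do in the sample data.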
    

1 answer:

Answer 0: (score: 0)

The actual code I am using now is:

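    BEGIN { FS = "," }
    # Record each distinct date and ISIN in first-seen order.
    {
        if (date[$10]++ == 0) date_list[d++] = $10
        if (isin[$9]++ == 0)  isin_list[i++] = $9
    }
    # Count rows whose $11 matches, keyed by (date, ISIN) so the END lookup finds them.
    # (The original keyed on $10,$9,$12,$5, which a two-subscript lookup never matches.)
    $11 ~ /1|2|3|5|9|1[24]/ { ++BNR[$10, $9] }
    END {
        for (u = 0; u < d; u++)
            for (v = 0; v < i; v++)
                if ((date_list[u], isin_list[v]) in BNR)
                    print date_list[u], isin_list[v], BNR[date_list[u], isin_list[v]]
    }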
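
A note on the `BNR` subscripts, as a hedged sketch: awk joins a comma-separated subscript list into a single string key using `SUBSEP`, so a count stored under four subscripts is not visible to a two-subscript lookup. The example below assumes a POSIX shell, and the exchange and FII identifiers `EX1` and `F1` are hypothetical placeholders.

```shell
out=$(awk 'BEGIN {
    # Four subscripts are joined with SUBSEP into a single key.
    four["6/27/2003", "INE239A01016", "EX1", "F1"]++
    # Two subscripts form a different key.
    two["6/27/2003", "INE239A01016"]++
    # Membership tests with "in" do not create elements as a side effect.
    print ((("6/27/2003", "INE239A01016") in four) ? "found" : "missing")
    print ((("6/27/2003", "INE239A01016") in two) ? "found" : "missing")
    # The parts of a compound key can be recovered with split().
    for (k in four) { n = split(k, part, SUBSEP); print n, part[1], part[2] }
}')
printf '%s\n' "$out"
```

If per-exchange or per-FII counts are needed, one option is to keep the four-part key and iterate with `for (k in BNR)` plus `split(k, part, SUBSEP)`, at the cost of losing the first-seen ordering that the nested loops over `date_list` and `isin_list` provide.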

Thank you all very much.