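Pandas or NumPy: searching, editing, and extracting data

Date: 2016-08-17 13:59:39

Tags: python pandas numpy scipy full-text-search

I started learning Python a few weeks ago in order to write programs,
and I am currently trying to extract some information from a file that looks like this: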

MAEFILE: benopga1.01.mae
&gen 
igeopt=1 
iaccg=5 
ip160=4 
ifreq=1 
ip24=2 
nogas=2 
iacc=1 
&
&zmat
 C1             18.7209160000            18.5628090000            15.2914270000
 C2             17.4114700000            18.8454610000            15.7184360000
 C3             17.0539910000            18.6449570000            17.0518080000
 C4             18.0032320000            18.1601840000            17.9688000000
 C5             19.3024250000            17.8798070000            17.5452620000
 C6             19.6626680000            18.0820920000            16.2013130000
 H7             18.9985910000            18.7195810000            14.2522760000
 H8             16.6780090000            19.2198900000            15.0089150000
 H9             16.0417370000            18.8626870000            17.3826630000
 H10            17.7239350000            18.0037410000            19.0076080000
 H11            20.0369730000            17.5051670000            18.2535010000
 H12            20.6751990000            17.8633650000            15.8718730000
&
&guess  basgss=6-311g**+ numd=5
    1 Orbital Energy   -11.241317 Occupation  1.000000
 -0.237515017757232 -0.196647289792153 -0.000012299229451  0.000012283693667
 -0.000038832391585 -0.002068759436403  0.000019096186058  0.000035102037517
 -0.000178850142376  0.000386718554923  0.000028243495837 -0.000010189167830
  0.000009719189111 -0.000083379603107  0.000088703347921 -0.000005323744814
  0.000074406344067 -0.000034780134268  0.000042134744869 -0.000025018268285
  0.000005796046950  0.000032438198590 -0.000151512252705 -0.198843760947433
 -0.164626268788198 -0.000032620645254  0.000011142490285 -0.000007071762982
 -0.001751134173474 -0.000117771869441  0.000056795470271 -0.000099889624378
  0.000355125910896  0.000016162939390 -0.000000591018861 -0.000018504072314
 -0.000043258886307  0.000085387104771 -0.000042128218464  0.000032711718420
  0.000062742388095  0.000018426112961 -0.000001087454691 -0.000087393288567
  0.000036394574524 -0.000046738022296 -0.189838829586922 -0.157171079929277
 -0.000027575467355  0.000006987237905  0.000003880133365 -0.001671553419738
 -0.000145911512315  0.000031184688524  0.000049574180168  0.000356303208123
  0.000008457559748 -0.000002555314618  0.000000156292819  0.000001781216751
  0.000074959304234 -0.000076740520986  0.000034291328716 -0.000015539635327
  0.000048978495218  0.000007568809345 -0.000078160169234  0.000017433925123
  0.000021965260595 -0.206688115052246 -0.171120694083044 -0.000024613036749
  0.000000578071180  0.000028857510870 -0.001816148417123 -0.000060499311777
 -0.000017129060142  0.000151731576240  0.000366295601898  0.000027273496960
 -0.000008681030719  0.000003505182219 -0.000084713189897  0.000079444547592
  0.000005268642305  0.000061929650436 -0.000000945876945  0.000025397622782
 -0.000006331576449 -0.000059596946982 -0.000005577539402  0.000100798841928
 -0.249803952550950 -0.206823550176572  0.000003194386460 -0.000011257593014
  0.000044548734829 -0.002165674347528  0.000119312075990 -0.000063891648135
  0.000128577103429  0.000383381470817  0.000015141171971  0.000000035916246
 -0.000018789517620 -0.000044930372579  0.000110215751912 -0.000065285379334
  0.000039159765842  0.000078659142665  0.000027379751909 -0.000029883760840
  0.000089805971123 -0.000055471940289  0.000127658675297 -0.282967668103962
 -0.234291216881311  0.000022041761553 -0.000002823143502 -0.000015098289029
 -0.002411282844625  0.000189683107712 -0.000041945582722 -0.000057757033880
  0.000410957267636 -0.000018584696026  0.000002703022996  0.000011405600689
 -0.000005600032267  0.000119091011558 -0.000113490979291  0.000058095889213
 -0.000027269439738  0.000076776523871 -0.000039022163445  0.000173859295776
 -0.000036395429255 -0.000059944178961 -0.000180764515503  0.000007292236479
 -0.000211164361974  0.000002900643801  0.000006787640262 -0.000032874030369
 -0.000152282839545  0.000001347403058 -0.000192590475337 -0.000022257991946
  0.000010326038114 -0.000016900157341 -0.000145389069678  0.000002334090915
 -0.000184253055893 -0.000026241454432  0.000005721872839  0.000008503996611
 -0.000157846225852  0.000005719915333 -0.000197522897687 -0.000012294624010
 -0.000002547402448  0.000026305673479 -0.000190277091678  0.000012467113833
 -0.000218567465104  0.000019225415536 -0.000011030299816  0.000023682450623
 -0.000214624610593  0.000020667490110 -0.000229289099730  0.000031900177357
 -0.000006901099326 -0.000010189424947

-------
   31 Orbital Energy     0.152723 Occupation  0.000000
 -0.000001484697707 -0.000002021678521 -0.011269736987441 -0.038830417095759
 -0.008858615646339  0.000001192390365 -0.006923508623065 -0.023842815223962
 -0.005418954995446  0.000041590787919 -0.067391763573696 -0.232026052460321
 -0.052998414149352  0.001793328350150  0.000775609630928 -0.002568937981078
  0.003688605910249 -0.001068780674560 -0.006386930085737 -0.000062943108962
  0.199687105918283  0.687684976534009  0.156868067056184  0.000000096154362
  0.000000657329257 -0.011210137098576 -0.038624144508054 -0.008813875928149
 -0.000002083821188 -0.006889998263354 -0.023739788458030 -0.005413447465109
 -0.000018259645012 -0.066955664071598 -0.230792413990635 -0.052580672751758
 -0.003071293835289  0.004370602232574 -0.001299308397285 -0.005372985400140
 -0.002350939654665 -0.002706808510529 -0.000070970422528  0.198318280915318
  0.683514266154416  0.155894609381509  0.000000339923235  0.000001176339340
  0.000304981657499  0.001042477771998  0.000239760902136 -0.000010467863819
  0.000544040658142  0.001845662740944  0.000423422866543  0.000056642800415
  0.000984897727369  0.003286180141148  0.000798961685623  0.000314520592607
 -0.001550492509766  0.001235971917159  0.000362328854063  0.001051946495198
  0.002924677897963  0.000052446885436  0.007592738586630  0.027102621203117
  0.006457030833373 -0.000001674148274 -0.000002569702830  0.011036680737231
  0.038036534353212  0.008683871836959  0.000011681514030  0.006437710430541
  0.022161408213113  0.005042525696551 -0.000073456543616  0.066825871686029
  0.230182549919537  0.052602199705660  0.001619905567722  0.001073249176964
 -0.002693154744685  0.003395235925365 -0.001233638134137 -0.006656299269408
 -0.000031022957851 -0.208382824428751 -0.718837764254747 -0.164641719576342
  0.000002139554361  0.000004317465847  0.010966267617752  0.037792975401876
  0.008624771875724 -0.000011263540804  0.006373597174822  0.021999897758008
  0.005010930775777 -0.000012041105775  0.066365092028384  0.228809219195351
  0.052127667273958 -0.003005259795779  0.004486267413809 -0.001481007618030
 -0.005217612687906 -0.002459044682008 -0.003155753438819  0.000122889172923
 -0.207289443420419 -0.714788889584271 -0.162678954342172 -0.000001113015636
 -0.000001539233415  0.000173647770863  0.000591026047632  0.000138309282669
  0.000001767411845  0.000457220121974  0.001573303489084  0.000360204439751
  0.000025047915999  0.000170009010859  0.000615396416029  0.000093891319707
  0.000532222151981 -0.001708952612925  0.001176730460945  0.000768838272606
  0.001104781032426  0.002758844326405 -0.000048320919224  0.010021022901790
  0.035244457521757  0.007959417545553 -0.000000432884305  0.000043865349832
 -0.000082946240504  0.001226683058023  0.004222389372921  0.000970314215579
 -0.000002541300496  0.000035298825687  0.000101681117207  0.001224755470991
  0.004192362461791  0.000964887073715 -0.000004352857236  0.000035780410422
 -0.000112972423150  0.000059020531435  0.000208544008111  0.000046874782483
  0.000000639601687  0.000054054494787  0.000026791342731 -0.001291902546627
 -0.004454899976684 -0.001026504045236 -0.000006323623792  0.000071261955447
 -0.000128521753658 -0.001293454307352 -0.004429301733713 -0.001017753159295
 -0.000001046054621  0.000011841859973  0.000073751920645  0.000075884640401
  0.000259775657980  0.000060453277266
&
&hess
   1
   1  5.357113E-01
   2 -1.203890E-01  2.050867E-01
   3 -5.072856E-02 -1.224648E-01  7.247686E-01
   4 -2.738767E-01  5.421034E-02  8.516300E-02  6.687076E-01
   5  5.862978E-02 -7.841398E-02 -6.986702E-03 -1.530589E-01  2.102981E-01
   6 -4.882459E-02  2.176720E-02 -1.508757E-01  3.490960E-02 -1.293015E-01
   7  1.397183E-02 -6.213820E-03  3.526196E-02 -1.359815E-01  3.130492E-03
   8 -2.648252E-02  1.165820E-02 -6.485242E-03  1.013400E-02 -7.653658E-02
   9  1.679309E-01 -3.235383E-02  4.798621E-03  6.244518E-03  5.348205E-02
  10 -1.499543E-01  3.053092E-02 -2.007584E-02  7.869050E-02 -3.394869E-03
  11  3.073586E-02 -1.093993E-02  1.498358E-03 -6.620322E-03  4.623322E-03
  12 -2.113264E-02  1.886220E-03 -3.178300E-03 -1.128905E-02  1.402985E-02
  13  1.358396E-01 -2.272879E-02 -9.113526E-03 -6.181379E-02  2.549849E-04
  14 -3.477661E-03  4.127380E-03  1.454523E-02 -3.514229E-04 -6.286993E-03
  15 -7.888047E-02  2.021552E-02 -3.514837E-02  1.840733E-02 -2.058056E-03
  16 -2.177345E-01  7.626153E-02 -1.207416E-01 -3.647403E-02  1.587059E-02
  17  5.247322E-02 -1.002916E-01  7.246352E-02  1.945406E-02  2.204342E-03
  18  2.208958E-02  4.326027E-02 -2.152098E-01 -7.205723E-02  3.178738E-03
  19 -7.725080E-02 -4.733359E-03  7.169356E-02 -1.952258E-03 -5.265493E-03
  20 -4.463521E-03 -4.582251E-02  4.403138E-02  1.752013E-03  3.237253E-03
  21  7.086791E-02  4.407103E-02 -3.203331E-01 -1.998051E-03  2.097618E-03
  22 -1.287577E-02  9.327437E-03 -2.157959E-02 -1.881858E-01  7.230259E-02
  23  2.190754E-03  3.617602E-04  4.639266E-03  7.195528E-02 -7.350281E-02
  24  8.569006E-03 -4.185411E-03  9.161599E-03 -1.241089E-01  6.861581E-02
  25  2.378996E-03  8.676100E-04  2.416760E-03 -4.030590E-03  2.507893E-03
  26  8.991614E-04  6.328556E-03  2.086578E-03 -4.549598E-03  4.186961E-03
  27  1.667829E-03  2.414920E-03 -5.566737E-03  2.914902E-02 -6.879962E-03
  28 -1.480521E-03  2.999713E-04 -7.027812E-04 -1.782413E-03  3.221656E-03
  29  4.484685E-04 -7.341012E-04 -1.860347E-05  3.571517E-03  5.682207E-03
  30 -7.047879E-04 -1.896539E-04  1.144798E-03 -3.791107E-03  2.806446E-03
  31  1.967935E-03  1.052172E-03  2.249519E-03 -1.359768E-04 -4.909666E-04
  32  9.264046E-04  6.619953E-03  2.006525E-03 -5.334520E-04 -4.168043E-04
  33  2.639645E-03  1.762322E-03 -4.896901E-03  1.372681E-03 -4.518228E-04
  34 -1.417455E-02  2.727599E-03  1.044303E-02 -1.172401E-03  3.258573E-03
  35  9.736697E-03  1.617405E-03 -4.499427E-03  3.110862E-03  5.445966E-03
  36 -2.049141E-02  4.551847E-03  9.548502E-03 -3.059820E-03  2.777782E-03
   6
   6  6.148960E-01
   7  9.694602E-02  6.987643E-01
   8  2.440824E-02 -1.503817E-01  2.054470E-01
   9 -2.512768E-01 -7.243872E-02 -8.990523E-02  5.654096E-01
  10 -1.333717E-01 -2.054959E-01  4.717798E-02  3.344556E-02  5.490687E-01
  11  3.904594E-02  7.187216E-02 -9.645819E-02  3.823231E-02 -1.192375E-01
  12 -5.054405E-02 -1.138194E-01  6.829035E-02 -2.021683E-01 -6.795284E-02
  13  1.102332E-01 -5.969411E-02  3.723179E-02 -1.236615E-01 -2.349015E-01
  14 -3.853905E-03  1.553892E-02  2.601743E-03  3.184075E-03  6.357285E-02
  15 -6.782885E-02 -6.726040E-03 -5.407333E-03  8.467873E-02  1.300986E-02
  16  3.943401E-03 -4.511242E-03  1.382113E-03 -2.390661E-02  1.119908E-02
  17 -2.094158E-02  1.540967E-03 -9.168102E-03  2.639755E-02 -2.632153E-02
  18  1.416749E-01 -2.532995E-02  2.683847E-02 -1.501283E-01  1.674068E-01
  19  2.808227E-02 -4.818105E-03  3.094910E-03  2.058231E-03 -1.745043E-03
  20 -6.667732E-03  2.870215E-03  6.262360E-03  5.683060E-04  4.427648E-04
  21 -5.175332E-03  2.554751E-03  4.579300E-04  1.979747E-03 -7.062836E-04
  22 -1.247529E-01  8.258119E-03 -3.432272E-03  8.252649E-03  2.144433E-03
  23  6.881531E-02  3.750150E-03  1.833843E-03  1.047128E-03  4.236422E-04
  24 -1.804521E-01 -2.291255E-02  1.036880E-02 -1.292029E-02  3.004725E-03
  25 -1.907067E-03 -3.063512E-01  5.888514E-02  8.248172E-02 -1.394043E-02
  26  2.208861E-03  5.925325E-02 -5.117732E-02 -1.285134E-02  9.154041E-03
  27 -3.215292E-03  8.194643E-02 -1.255761E-02 -8.551740E-02 -1.995845E-02
  28 -3.296838E-03  9.060765E-03  2.748152E-03 -2.084458E-02 -7.728528E-02
  29  2.687659E-03 -4.066922E-03  1.111791E-03  9.587301E-03 -4.752476E-03
  30 -7.099807E-04  8.960116E-03  1.031423E-03 -1.388900E-02  7.175752E-02
  31  1.691020E-03 -4.691846E-03  2.565325E-03  3.394495E-03 -1.317085E-02
  32 -4.306708E-04  2.862186E-03  6.076609E-03  2.204974E-04  2.751124E-03
  33 -5.486880E-04  2.398883E-03  4.941632E-04  1.520535E-03  8.765089E-03
  34 -3.900191E-03  1.185306E-03 -4.557608E-04 -8.504399E-04  2.324262E-03
  35  3.027231E-03 -4.660645E-04 -6.058911E-04  5.406380E-04  1.162152E-03
  36 -1.076955E-03 -6.065414E-04  3.749918E-04 -1.651439E-03  1.559582E-03
  11
  11  2.036463E-01
  12 -1.172782E-01  7.186272E-01
  13  3.836605E-02  9.833300E-02  6.189527E-01
  14 -8.124822E-02 -7.237159E-03 -1.586210E-01  2.107532E-01
  15  2.604649E-03 -1.440396E-01  4.110175E-02 -1.227296E-01  6.810652E-01
  16 -4.998799E-03  3.558635E-02 -1.517975E-01  2.806294E-03  7.703362E-02
  17  1.156440E-02 -7.570831E-03  2.908585E-02 -7.398307E-02  3.881048E-02
  18 -3.234759E-02  6.717726E-03 -5.807469E-02  4.951787E-02 -2.832498E-01
  19  3.589195E-04 -4.708646E-04 -1.402206E-03  3.220279E-03 -3.065811E-03
  20 -6.839957E-04 -2.250234E-04  3.431037E-03  5.450787E-03  2.657729E-03
  21 -2.501729E-04  1.105759E-03 -3.591744E-03  2.836891E-03 -9.231576E-04
  22  9.591988E-04  2.154412E-03 -1.186699E-03 -4.080684E-04  1.491628E-03
  23  6.348418E-03  2.082719E-03 -5.469945E-05 -4.309774E-04 -5.945081E-04
  24  1.834672E-03 -5.234314E-03  1.275562E-03 -5.273805E-04 -1.623391E-04
  25  2.339952E-03  1.011159E-02 -6.364443E-04  3.209197E-03 -3.830418E-03
  26  5.936686E-04 -4.692742E-03  3.085246E-03  5.692126E-03  3.099157E-03
  27  4.112008E-03  9.641439E-03 -3.333572E-03  2.835538E-03 -1.490121E-03
  28 -4.639572E-03  7.112198E-02 -2.059962E-03 -5.057039E-03  2.904566E-02
  29 -4.581722E-02  4.419494E-02  2.063191E-03  4.243199E-03 -6.499605E-03
  30  4.385196E-02 -3.204086E-01 -1.279372E-03  2.454236E-03 -5.470362E-03
  31  9.879408E-03 -2.202778E-02 -1.887970E-01  7.219524E-02 -1.241630E-01
  32  1.443492E-03  5.036378E-03  7.235257E-02 -7.357053E-02  6.847109E-02
  33 -4.034273E-03  8.875759E-03 -1.249539E-01  6.886918E-02 -1.800114E-01
  34  9.205108E-04  2.151887E-03 -3.251011E-03  2.100621E-03 -1.955914E-03
  35  6.472624E-03  2.166082E-03 -4.961588E-03  3.187735E-03  1.801821E-03
  36  2.475965E-03 -5.392252E-03  2.780805E-02 -7.027603E-03 -3.239105E-03
  16
  16  7.058893E-01
  17 -1.554764E-01  2.072989E-01
  18 -5.616458E-02 -9.217960E-02  5.520851E-01
  19  8.981009E-03  3.289528E-03 -2.141664E-02  7.626286E-02
  20 -3.842831E-03  2.156460E-03  1.000977E-02  2.846781E-03  3.642322E-02
  21  9.319906E-03  1.309879E-03 -1.410448E-02 -7.646282E-02 -4.860829E-02
  22 -4.402991E-03  2.531367E-03  3.289913E-03  9.021696E-04 -1.251841E-03
  23  2.725604E-03  6.274671E-03  1.628016E-04 -1.156805E-03 -3.268124E-03
  24  2.537554E-03  5.855487E-04  1.563713E-03 -7.394921E-04 -6.175566E-04
  25  1.026188E-03 -3.997980E-04 -7.437612E-04 -2.657211E-04  1.335569E-04
  26 -2.275656E-04 -6.403373E-04  4.661607E-04 -2.089260E-04  2.334557E-04
  27 -8.333180E-04  4.056932E-04 -1.539848E-03  1.552873E-03 -2.872954E-04
  28 -4.991017E-03  2.960965E-03  2.222760E-03 -2.021511E-05 -3.472167E-04
  29  2.739691E-03  6.103553E-03  3.149083E-04 -3.757001E-04 -1.056348E-03
  30  2.773596E-03  4.162969E-04  2.059696E-03  1.871311E-05 -1.472036E-04
  31  8.389904E-03 -3.707522E-03  7.624896E-03  3.256189E-04 -1.070285E-04
  32  3.342920E-03  8.529872E-04  1.309502E-03  2.234428E-04  9.689788E-05
  33 -2.246703E-02  9.836860E-03 -1.293489E-02 -1.083503E-03  6.089104E-04
  34 -3.062399E-01  5.922308E-02  8.165886E-02  5.042012E-04 -1.375289E-03
  35  5.927026E-02 -5.126865E-02 -1.257008E-02 -1.443801E-03 -3.563906E-03
  36  8.262465E-02 -1.283576E-02 -8.543201E-02  3.426350E-04 -1.115803E-03
  21
  21  3.374350E-01
  22 -2.145991E-04  1.952296E-01
  23 -7.899096E-04 -7.886814E-02  6.658415E-02
  24  2.434416E-04  1.327314E-01 -7.490323E-02  1.874819E-01
  25 -1.504871E-04  1.714452E-04 -1.179780E-03 -2.325636E-04  3.220066E-01
  26  1.568302E-04 -1.102295E-03 -3.573059E-03 -1.099444E-03 -6.529751E-02
  27 -3.836369E-04 -5.860705E-04 -8.976903E-04  1.021072E-03 -8.867881E-02
  28  1.218173E-05  3.024906E-04  2.516389E-04 -1.141836E-03  4.239541E-04
  29 -1.441930E-04 -1.026389E-04  8.840927E-05  6.332547E-04 -1.291296E-03
  30 -1.374519E-04  3.812079E-04  2.389330E-04 -8.747765E-04 -1.502824E-04
  31  3.851771E-04 -1.639111E-04 -1.356532E-04 -3.111450E-04 -7.902321E-04
  32  1.182647E-04 -2.347572E-04 -1.084055E-03 -1.576438E-04  2.115227E-04
  33 -7.593839E-04 -1.667396E-04 -1.008961E-04 -1.123286E-05  5.825404E-04
  34 -2.035376E-04 -8.011347E-04  3.153121E-04  4.611029E-04 -1.316073E-04
  35 -9.809362E-04  5.440663E-04  6.513052E-05 -1.662642E-04 -2.997207E-04
  36  5.655858E-04 -1.044500E-03  2.754315E-04  3.638921E-04  5.800512E-05
  26
  26  4.236930E-02
  27  1.211680E-02  8.569162E-02
  28 -1.245167E-03  5.853340E-04  7.688814E-02
  29 -3.211526E-03 -1.073037E-03  3.035872E-03  3.637004E-02
  30 -9.073261E-04  5.569719E-04 -7.721346E-02 -4.840140E-02  3.370049E-01
  31  5.516888E-04 -1.052577E-03  7.862359E-04 -1.339463E-03 -2.253878E-04
  32  8.382047E-05  2.822024E-04 -1.251478E-03 -3.643525E-03 -9.503755E-04
  33 -1.807402E-04  3.462591E-04 -6.238417E-04 -7.053561E-04  3.763429E-04
  34 -3.067637E-04  5.290655E-05 -2.494180E-04  1.374706E-04 -1.399704E-04
  35 -1.022331E-03 -2.811845E-04 -2.047648E-04  2.343184E-04  1.575352E-04
  36 -2.816129E-04  2.140399E-05  1.520584E-03 -2.731030E-04 -3.847069E-04
  31
  31  1.959794E-01
  32 -7.942207E-02  6.628383E-02
  33  1.329428E-01 -7.483048E-02  1.873301E-01
  34  1.308904E-04 -1.106336E-03 -1.028018E-04  3.222324E-01
  35 -9.751806E-04 -3.283255E-03 -1.006349E-03 -6.565566E-02  4.251950E-02
  36 -6.224509E-04 -9.048575E-04  1.125719E-03 -8.805915E-02  1.207815E-02
  36
  36  8.527950E-02
&

I tried using pandas to read the file and extract, into a separate file, what comes below &zmat, that is, the lines C1 through H12.

import os
import pandas
import numpy

os.chdir(r"D:\Ubuntu\downloads")
dt = pandas.read_fwf("ga1.01.in", header=None, sep=r"\s+")

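Unfortunately, pandas only sees two columns and I cannot get any further. I plan to do the same for the lines below &hess, which hold the lower triangle of a symmetric matrix.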

http://i.stack.imgur.com/u05ZE.png

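In this case the matrix is 36x36, but it is stored in blocks of 5 columns, with the rows numbered. I am still working on a solution for the step after extracting that part, and I think pandas is a good choice for this problem. If anyone can give me an idea of where to start, I would greatly appreciate your help.

Thanks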
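The &hess layout described above (lower triangle, blocks of up to five columns, each block introduced by a line holding only its first column number, each data row starting with its row number) could be unpacked roughly as in the sketch below. The helper names `section_lines` and `parse_hess` are hypothetical, and `SAMPLE` is a made-up 6x6 example in the same layout, not data from the real file. It also assumes every section ends with a line containing only `&`, as in the listing.

```python
import numpy as np

# Synthetic 6x6 example in the same layout as the &hess section (not real data).
SAMPLE = """\
&hess
   1
   1  1.0E+00
   2  2.0E+00  3.0E+00
   3  4.0E+00  5.0E+00  6.0E+00
   4  7.0E+00  8.0E+00  9.0E+00  1.0E+01
   5  1.1E+01  1.2E+01  1.3E+01  1.4E+01  1.5E+01
   6  1.6E+01  1.7E+01  1.8E+01  1.9E+01  2.0E+01
   6
   6  2.1E+01
&
"""


def section_lines(text, name):
    """Return the lines between '&<name>' and the next line that is just '&'."""
    out = []
    inside = False
    for line in text.splitlines():
        stripped = line.strip()
        if not inside:
            # The section header may carry extra tokens (e.g. '&guess basgss=...').
            if stripped.split()[:1] == ["&" + name]:
                inside = True
            continue
        if stripped == "&":
            break
        out.append(line)
    return out


def parse_hess(lines):
    """Rebuild the full symmetric matrix from the blocked lower triangle."""
    entries = {}
    first_col = None
    size = 0
    for line in lines:
        tokens = line.split()
        if not tokens:
            continue
        if len(tokens) == 1:
            # Block header: number of the first column in this block.
            first_col = int(tokens[0])
            continue
        row = int(tokens[0])
        for offset, token in enumerate(tokens[1:]):
            entries[(row, first_col + offset)] = float(token)
        size = max(size, row)
    hess = np.zeros((size, size))
    for (i, j), value in entries.items():
        hess[i - 1, j - 1] = value  # lower triangle (file indices are 1-based)
        hess[j - 1, i - 1] = value  # mirror into the upper triangle
    return hess


hess = parse_hess(section_lines(SAMPLE, "hess"))
print(hess.shape)
```

For the real file, the text would come from something like `open("ga1.01.in").read()`, and the resulting array can be wrapped in `pandas.DataFrame(hess)` if a DataFrame is preferred.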

1 answer:

Answer 0 (score: 0)

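Try the following line to read rows C1 through H12: it skips the first 11 lines, reads 12 rows, and specifies the fixed-width field range for each column.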

dt = pandas.read_fwf("ga1.01.in", header=None, skiprows=11, nrows=12, colspecs=[(0, 5), (12, 35), (36, 61), (62, 91)])
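The hard-coded `skiprows=11` and `nrows=12` only work when the header length and the number of atoms match this particular file. As an alternative sketch (not part of the original answer), the block could be located by its `&zmat` marker and the closing `&` line, then split on whitespace. The function name `read_zmat` and the column names `atom`, `x`, `y`, `z` are assumptions made for illustration; the sample rows are copied from the listing in the question.

```python
import io

import pandas as pd

# A shortened copy of the file from the question (three of the twelve atoms).
SAMPLE = """\
MAEFILE: benopga1.01.mae
&gen
igeopt=1
&
&zmat
 C1             18.7209160000            18.5628090000            15.2914270000
 C2             17.4114700000            18.8454610000            15.7184360000
 H7             18.9985910000            18.7195810000            14.2522760000
&
"""


def read_zmat(text):
    """Return the rows between '&zmat' and the next lone '&' as a DataFrame."""
    lines = text.splitlines()
    # First line after the '&zmat' marker.
    start = next(i for i, line in enumerate(lines) if line.strip() == "&zmat") + 1
    # First line at or after 'start' that contains only '&'.
    end = next(i for i in range(start, len(lines)) if lines[i].strip() == "&")
    block = "\n".join(line.strip() for line in lines[start:end])
    return pd.read_csv(
        io.StringIO(block),
        sep=r"\s+",                     # any run of whitespace separates fields
        header=None,
        names=["atom", "x", "y", "z"],  # hypothetical column names
    )


zmat = read_zmat(SAMPLE)
print(zmat)
```

With the real file, `read_zmat(open("ga1.01.in").read())` would be the equivalent call, and `zmat.to_csv(...)` can write the extracted rows to a separate file.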