Searching, printing, incrementing, and outputting from a data set

Time: 2015-06-12 23:13:58

Tags: python string file search

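I am trying to find an efficient way to search a file and then output a specific number of lines from it. Say I have a file named "mc_coordinates" that I want to extract information from. It lists the coordinates of the atoms at a given "step". The file is organized as follows: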

         235
     INITIAL BOX 1, STEP 1
        C        12.7908790847        2.8828150218        1.1868087958
        F        11.8993427046        1.8104266120        1.3121312895
        F        13.8944796514        2.3931204072        0.4205213241
        H        12.2211496090        3.7216596131        0.7243429292
        H        13.2020314728        3.1740947812        2.1515988338
        C        12.7759828577        3.6624296172       15.2649736115
        F        11.9718262161        4.3758674409       16.1755975367
        F        12.3319038697        2.3994507343       15.0709447687
        H        12.7017245825        4.2254002044       14.2724601980
        H        13.8483690007        3.6371660190       15.6480138479

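To clarify: each header line reads INITIAL BOX (STEP NUMBER). At each "step", we record the position of every atom. My problem is that I am only interested in steps that are multiples of 48, because those are the coordinates I want to look at. That means I have to write code that does the following: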

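1) Find 'INITIAL BOX 1'.

2) Take the number on the line before this INITIAL BOX STEP# line, 235 in this example, divide it by 5, and output the result.

3) Print every atom's coordinates, starting with 'C', until I reach the lone number on the line before 'INITIAL BOX 2'.

4) Then search the file again for "INITIAL BOX 49" and essentially repeat step 2: take the number, divide it by 5, output the result, and print all coordinates until I reach the lone number before "INITIAL BOX 50".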
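
The selection in steps 1 to 4 keeps boxes 1, 49, 97, and so on, that is, every 48th frame counting from the first. A minimal sketch of that rule follows; `is_wanted` and `stride` are hypothetical names used only for illustration, and the 1-based box numbering is an assumption taken from the example above.

```python
def is_wanted(box_number: int, stride: int = 48) -> bool:
    """Return True for boxes 1, 1 + stride, 1 + 2*stride, ... (1-based numbering)."""
    return box_number >= 1 and (box_number - 1) % stride == 0


# Box numbers that would be kept among the first 100 boxes
wanted = [n for n in range(1, 101) if is_wanted(n)]
```

With 0-based frame indices, the same rule is simply `index % 48 == 0`.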

This process repeats about 600 times. Here is what I have so far:

 fo = open("mc_coordinates")
 lines = fo.readlines()
 for line in lines:
        print lines.find(INITIAL)

But this gives me errors and doesn't even begin to do what I need. Any hints or help would be greatly appreciated. Thanks!

Edit: sample output

        47     # which is 235/5
        C        12.7908790847        2.8828150218        1.1868087958
        F        11.8993427046        1.8104266120        1.3121312895
        F        13.8944796514        2.3931204072        0.4205213241
        H        12.2211496090        3.7216596131        0.7243429292
        H        13.2020314728        3.1740947812        2.1515988338
        C        12.7759828577        3.6624296172       15.2649736115
        F        11.9718262161        4.3758674409       16.1755975367
        F        12.3319038697        2.3994507343       15.0709447687
        H        12.7017245825        4.2254002044       14.2724601980
        H        13.8483690007        3.6371660190       15.6480138479

The code from @Bharadwa produces:

  47.0
  C        12.7908790847        2.8828150218        1.1868087958
  F        11.8993427046        1.8104266120        1.3121312895
  F        13.8944796514        2.3931204072        0.4205213241
  H        12.2211496090        3.7216596131        0.7243429292
  H        13.2020314728        3.1740947812        2.1515988338
  C        12.7759828577        3.6624296172       15.2649736115
  F        11.9718262161        4.3758674409       16.1755975367
  F        12.3319038697        2.3994507343       15.0709447687
  H        12.7017245825        4.2254002044       14.2724601980
  H        13.8483690007        3.6371660190       15.6480138479
  BOX 1,  STEP 24
  C        12.4921773110        2.8286307659        1.1644437594
  F        11.6006113644        1.7562423561        1.2895557390
  F        13.5960941031        2.3384787776        0.3989045637
  H        11.9164523192        3.6676509395        0.7097873135
  H        12.9031018802        3.1199105253        2.1293308523
  C        12.4895445934        3.5818345553       14.9048344490
  F        11.6838031267        4.2949649016       15.8142975674
  F        12.0450334972        2.3193713672       14.7084532328
  H        12.4010184199        4.1124983940       13.8958289702
  H        13.5698640919        3.5405566993       15.2634094502

What I need:

  47.0
  C        12.7908790847        2.8828150218        1.1868087958
  F        11.8993427046        1.8104266120        1.3121312895
  F        13.8944796514        2.3931204072        0.4205213241
  H        12.2211496090        3.7216596131        0.7243429292
  H        13.2020314728        3.1740947812        2.1515988338
  C        12.7759828577        3.6624296172       15.2649736115
  F        11.9718262161        4.3758674409       16.1755975367
  F        12.3319038697        2.3994507343       15.0709447687
  H        12.7017245825        4.2254002044       14.2724601980
  H        13.8483690007        3.6371660190       15.6480138479
  48.0
  BOX 1,  STEP 48
  C        12.6660795715        3.6355249989       15.1210670811
  F        11.9116309909        4.3486553452       16.0735112076
  F        12.2114308249        2.3730618108       14.9494564347
  H        12.5221365040        4.1661888376       14.1184657979
  H        13.7645020294        3.5942471429       15.4196208523

1 Answer:

Answer 0: (score: 0)

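As I understand the question, there are two code snippets:
1. prints all atoms
2. prints only the atoms of frames whose index is a multiple of the STEP variable
I think snippet 2 is more useful for your task.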

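#[code1]

with open("mc_coordinates") as file:
    for line in file:
        line = line.strip()
        if not line or line.startswith('INITIAL'):
            continue  # skip blank lines and the INITIAL header
        # a lone number is the atom count; every other line is printed as-is
        print(float(line) / 5 if len(line.split()) == 1 else line)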
#[code2]

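STEP = 48
Arr = []        # lines of the frame currently being read
newlines = []   # every completed frame, in file order

with open("mc_coordinates") as file:
    for line in file:
        line = line.strip()
        if not line or line.startswith('INITIAL'):
            continue  # skip blank lines and the INITIAL header
        if len(line.split()) == 1:
            # a lone number (the atom count) starts a new frame
            if Arr:
                newlines.append(Arr)
            Arr = [str(float(line) / 5)]
        else:
            Arr.append(line)
if Arr:
    newlines.append(Arr)  # keep the last frame, which no count line follows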

[print(j) for i, line in enumerate(newlines) for j in line if not i%STEP]

Or replace the last line with the following code:

for i, line in enumerate(newlines):
    for j in line:
        if not i%STEP:
            print(j)
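
With about 600 selected frames in a long file, it may also be worth streaming the file instead of collecting every frame in a list first. The following is a minimal sketch of that idea, not a tested solution for the real data. The names `iter_frames`, `select_frames`, and `stride` are hypothetical, and the code assumes that every frame starts with a line holding a single integer and that header lines (and only header lines) contain the word BOX.

```python
from typing import Iterator, List, TextIO


def iter_frames(handle: TextIO) -> Iterator[List[str]]:
    """Yield one frame at a time as [count line, remaining lines...].

    Assumption: a frame starts at a line holding a single integer
    (the atom count), as in the sample file in the question.
    """
    frame: List[str] = []
    for raw in handle:
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if line.isdigit():
            # a lone integer opens a new frame; hand back the previous one
            if frame:
                yield frame
            frame = [line]
        elif frame:
            frame.append(line)
    if frame:
        yield frame  # the last frame is not followed by another count line


def select_frames(handle: TextIO, stride: int = 48) -> Iterator[str]:
    """Yield output lines for frames 0, stride, 2*stride, ... (0-based index)."""
    for index, frame in enumerate(iter_frames(handle)):
        if index % stride:
            continue  # not one of the frames of interest
        yield str(int(frame[0]) / 5)  # atom count divided by 5
        for line in frame[1:]:
            if "BOX" not in line:  # assumption: only header lines contain BOX
                yield line
```

To use it, open the file and print what the generator yields, for example `with open("mc_coordinates") as handle:` followed by `for out in select_frames(handle): print(out)`. Only one frame is held in memory at a time. If the header line should appear in the output, as in the "What I need" listing, remove the BOX check.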