Python conditional text search

Time: 2018-02-12 17:53:47

Tags: python regex string parsing text

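I'm new to Python programming and have been using it to try to improve some of the tedious tasks in my job. One such task is taking a report that one piece of software outputs in a particular format and converting it into a format that another piece of software can read for further processing.

So far I've done fairly well using what I could find by searching Stack Overflow and various other resources. But I've run into a problem that I haven't had much luck cracking, and I'd appreciate some advice or a pointer in the right direction.

My raw data looks like this: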

BR6.FLD T: Tue Nov 07 15:22:25 2017


// 
   // Generated by 12dField - Setout
   // 11.0C1m
   // Surveyor: gm
   Coordinate:  Name: CH9583R TT X: 414638.4070 Y: 827823.6220 Z: 88.0290
   Station:  Name: CH9583R TT Ht: 1.4240
   Target Height:  0.4000
   Target Height:  0.4000
   PPM Correction:  O: 0.00000000
   Measurement:  H:   20° 24' 28" V:   92° 44'  9" S: 115.9559
   Attribute Set: Attribute Set Start:  N:12D Field
   Attribute Set: Attribute Set Start:  N:Basic Pickup
   Attribute: Real Attribute for Vertex:  N:so_cs_raw_3d_ch V:0.0000000000000000
   Attribute Set: Attribute Set End:  N:Basic Pickup
   Attribute Set: Attribute Set Start:  N:Product Details
   Attribute: Integer Attribute for Vertex:  N:12d_product_version V:11
   Attribute: Integer Attribute for Vertex:  N:12d_major_version V:1
   Attribute: Integer Attribute for Vertex:  N:12d_minor_version V:13
   Attribute: Integer Attribute for Vertex:  N:12d_build_version V:6
   Attribute: Integer Attribute for Vertex:  N:version V:23
   Attribute Set: Attribute Set End:  N:Product Details
   Attribute Set: Attribute Set Start:  N:Inst Stat Setup
   Attribute: Real Attribute for Vertex:  N:is_x V:414638.4070000000100000
   Attribute: Real Attribute for Vertex:  N:is_y V:827823.6219999999700000
   Attribute: Real Attribute for Vertex:  N:is_z V:88.0289999999999960
   Attribute: Real Attribute for Vertex:  N:is_hi V:1.4239999999999999
   Attribute: Real Attribute for Vertex:  N:is_bearing_swing V:2.1483160800061616

...and so on for a long time, depending on the number of field observations.

Using a series of list comprehensions, I've processed this to output a friendlier file, like this:

Station:


CH9583R TT Ht: 1.4240

Measurement:
  H:   20-24-28 V:   92-44- 9 S: 115.9559
   Prism Constant:0.0175000000000000
   Target height:0.4000000000000000
   Name:CP1

Measurement:
  H:   17-49-10 V:   91- 8-14 S: 172.6005
   Prism Constant:0.0175000000000000
   Target height:0.4000000000000000
   Name:CP1

Measurement:
  H:   48-48-29 V:   91-10-11 S: 167.7516
   Prism Constant:0.0175000000000000
   Target height:0.4000000000000000
   Name:CP3

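The next step is to convert this into a JSON object so that I can access the attributes from code and output the final form.

What I can currently output is: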

{
"Stations":[

{ "Station":" CH9583R TT " , "Ht": 1.4240

,"Measurements": [ {
 "H":  "20-24-28"  ,"V":  "92-44-09"   ,"S":" 115.9559" 
  ,"Prism_Constant":"0.0175000000000000" 
  ,"Target_Height":"0.4000000000000000" 
  ,"Name":"CP1"} 

{
 "H":  "17-49-10"  ,"V":  "91-08-14"   ,"S":" 172.6005" 
  ,"Prism_Constant":"0.0175000000000000" 
  ,"Target_Height":"0.4000000000000000" 
  ,"Name":"CP1"} 

{
 "H":  "48-48-29"  ,"V":  "91-10-11"   ,"S":" 167.7516" 
  ,"Prism_Constant":"0.0175000000000000" 
  ,"Target_Height":"0.4000000000000000" 
  ,"Name":"CP3"} 

{ "Station":" CH9504L TT " , "Ht": 1.4110

,"Measurements": [ {
 "H":  "307-01-10"  ,"V":  "90-02-25"   ,"S":" 120.6765" 
  ,"Prism_Constant":"0.0175000000000000" 
  ,"Target_Height":"0.4000000000000000" 
  ,"Name":"CP1A"} 

{

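This doesn't read correctly as JSON. My main problem is that I'm not sure how to search a string for the points where something needs to be inserted. What I want to say is: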

if a_sequence_of_characters is_followed_by(another_sequence):
    insert(',',location)

and use that to finish formatting the data.
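One way to express "if this sequence is followed by that sequence, insert a comma" is a regular-expression lookahead. The following is only a minimal sketch: the helper name `insert_commas_between_objects` and the sample string are made up for illustration, and it handles just one case, a closing brace followed (ignoring whitespace) by an opening brace.

```python
import json
import re


def insert_commas_between_objects(text):
    """Add ',' after every '}' that is followed, ignoring whitespace, by '{'."""
    # (?=...) is a lookahead: it checks what comes next without consuming it,
    # so the whitespace and the '{' are left untouched in the result.
    return re.sub(r'\}(?=\s*\{)', '},', text)


# Hypothetical sample shaped like the output above: two objects, no comma.
sample = '[ {"Name":"CP1"} \n\n{"Name":"CP3"} ]'
fixed = insert_commas_between_objects(sample)

print(fixed)
print(json.loads(fixed))  # parses once the comma is in place
```

Comma insertion alone would not make the output above valid, because each "Measurements" list also needs its closing `]` and `}`. Building the structure as Python dicts and lists and serialising it with the standard `json` module avoids that class of problem.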

Sorry for the length of the post. Any suggestions are welcome, and thanks in advance for your help.

1 answer:

Answer 0: (score: 0)

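I may know just enough about your data to get by with some guesswork about how the .fld file is converted into the .dat file. If I'm anywhere close, I wonder whether it would be easier to go directly from the former to the latter.

Here's what I have so far.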

import re

# One pattern per kind of line of interest in the .fld report.
coords_line = re.compile(
    r'Coordinate:\s+Name:\s+(?P<name>[a-z0-9]+)[^X]+X:\s+(?P<X>[0-9.]+)[^Y]+Y:\s+(?P<Y>[0-9.]+)[^Z]+Z:\s+(?P<Z>[0-9.]+)', re.I)
# The '.' after each degrees group stands in for the degree sign.
measurement_line = re.compile(
    r'''Measurement:\s+H:\s+(?P<h_degrees>[0-9]+).\s+(?P<h_minutes>[0-9]+)'\s+(?P<h_seconds>[0-9]+)"\s+V:\s+(?P<v_degrees>[0-9]+).\s+(?P<v_minutes>[0-9]+)'\s+(?P<v_seconds>[0-9]+)"\s+S:\s+(?P<S>[0-9.]+)''')
attributes_line = re.compile(
    r'''Attribute: Text Attribute for Vertex:\s+N:store_pt_string_name\s+V:(?P<attribute>[a-z0-9]+)''', re.I)

with open('greg_out.txt', 'w') as greg_out:
    # First pass: write one 'C' record per distinct coordinate name.
    coords_info = []
    with open('greg_in.txt') as greg:
        for line in greg:
            m = coords_line.search(line)
            if m:
                if m.group('name') not in coords_info:
                    coords_info.append(m.group('name'))
                    print('C', m.group('name'), m.group('X'), m.group('Y'), m.group('Z'), file=greg_out)
        print(file=greg_out)

    # Second pass: write a DB ... DE block for each station setup.
    current_coords = None
    with open('greg_in.txt') as greg:
        for line in greg:
            m = coords_line.search(line)
            if m:
                # A new Coordinate line closes the previous block, if any.
                if current_coords:
                    print('DE\n', file=greg_out)
                current_coords = m.group('name')
                print('DB', m.group('name'), file=greg_out)
            m = measurement_line.search(line)
            if m:
                # Remember the measurement until its point name turns up.
                recent_horizontal = (m.group('h_degrees'), m.group('h_minutes'), m.group('h_seconds'), m.group('S'), m.group('v_degrees'), m.group('v_minutes'), m.group('v_seconds'))
            m = attributes_line.search(line)
            if m:
                attribute = m.group('attribute')
                # Names that start with a digit get a 'CH' prefix.
                if attribute[0] in '0123456789':
                    attribute = 'CH' + attribute
                print('DM', attribute, '{}-{}-{} {} {}-{}-{}'.format(*recent_horizontal), file=greg_out)

        # Close the final block.
        print('DE\n', file=greg_out)

This is what it produces.

C CH9583R 414638.4070 827823.6220 88.0290
C CH9504L 414775.1470 827859.5190 82.5870
C CH9360R 414672.4040 828056.2440 87.2310
C CP2 414691.2159 827987.9097 85.6298

DB CH9583R
DM CP1 20-24-28 115.9559 92-44-9
DM CP1 17-49-10 172.6005 91-8-14
DM CP3 48-48-29 167.7516 91-10-11
DE

DB CH9504L
DM CP1A 307-1-10 120.6765 90-2-25
DM CP2A 326-49-38 153.4059 89-7-51
DM CP3A 351-57-33 75.3264 88-27-17
DM BS2 255-17-27 141.4947 87-47-33
DM CP1B 307-1-13 120.6767 90-2-26
DM BS2B 255-17-27 141.4771 87-47-36
DM CP2B 326-49-43 153.4090 89-7-52
DM CP3B 351-57-34 75.3262 88-27-17
DM BS2 255-17-27 141.4769 87-47-34
DM CP1 307-1-15 120.6772 90-2-26
DM CP2 326-49-43 153.4065 89-7-50
DM CP3 351-57-35 75.3266 88-27-17
DM BS2 255-17-26 141.4769 87-47-35
DE

DB CH9583R
DM CP1 20-24-31 115.9544 92-44-5
DM BS 75-17-25 141.4892 92-9-14
DM CP2 17-49-11 172.5993 91-8-12
DM CP3 48-48-30 167.7499 91-10-7
DM BS1 75-17-25 141.4715 92-9-14
DM BS1 75-17-25 141.4716 92-9-13
DM CP1 20-24-30 115.9553 92-44-8
DM CP2 17-49-11 172.6006 91-8-11
DM CP3 48-48-31 167.7485 91-10-8
DM BS1 75-17-27 141.4711 92-9-13
DM CP1 20-24-32 115.9559 92-44-5
DM CP2 17-49-10 172.6002 91-8-13
DM CP3 48-48-34 167.7505 91-10-8
DM BS1 75-17-29 141.4711 92-9-13
DE

DB CH9360R
DM CP1 177-2-58 124.3272 92-14-30
DM BS2 188-18-50 235.0917 89-51-44
DM CP2 164-36-28 70.9176 91-58-36
DM BS2 188-18-44 235.0915 89-51-39
DM CP1 177-2-54 124.3264 92-14-30
DM CP2 164-36-23 70.9163 91-58-33
DM BS2 188-18-42 235.0917 89-51-40
DM CP1 177-2-54 124.3240 92-14-30
DM CP2 164-36-31 70.9200 91-58-38
DE

DB CP2
DM CH9360 344-36-29 70.8914 88-45-37
DM CH9583R 197-49-11 172.5688 89-36-21
DM CP1 192-33-48 57.1951 93-19-41
DM CH9504L 146-49-35 153.4204 91-9-37
DM CP3 126-15-34 91.0306 90-45-37
DM CH9360R 344-36-27 70.8914 88-45-37
DM CH9583R 197-49-10 172.5682 89-36-23
DM CH9504L 146-49-37 153.4203 91-9-38
DM C3 126-15-31 91.0292 90-45-37
DM CH9360R 344-36-27 70.8913 88-45-39
DM CH9583R 197-49-4 172.5685 89-36-22
DM CP1 192-33-45 57.1953 93-19-40
DM CH9504L 146-49-30 153.4206 91-9-37
DM CP3 126-15-26 91.0301 90-45-38
DE

Some items are obviously missing, mostly because I don't know how they are calculated and didn't want to put in the effort to find out.
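If JSON is still the goal, the same line-by-line matching can fill Python dicts and lists, with the standard `json` module writing all the commas and brackets. The sketch below is an assumption-laden illustration, not a finished converter. The names `STATION`, `MEASUREMENT`, `NAME` and `parse_report` are invented here. The `Station:` and `Measurement:` layouts are taken from the raw data in the question, while the `store_pt_string_name` attribute line is assumed from the regex above. Prism constant and target height are left out.

```python
import json
import re

# Hypothetical patterns, modelled on the sample lines in the question.
STATION = re.compile(r'Station:\s+Name:\s+(?P<name>.+?)\s+Ht:\s+(?P<ht>[0-9.]+)')
MEASUREMENT = re.compile(
    r'''Measurement:\s+H:\s+(\d+)\D\s+(\d+)'\s+(\d+)"\s+'''
    r'''V:\s+(\d+)\D\s+(\d+)'\s+(\d+)"\s+S:\s+([0-9.]+)''')
NAME = re.compile(r'N:store_pt_string_name\s+V:(?P<name>\w+)')


def parse_report(lines):
    """Group measurements under the station that precedes them."""
    stations = []
    pending = None  # most recent measurement, still waiting for its name
    for line in lines:
        m = STATION.search(line)
        if m:
            stations.append({'Station': m.group('name'),
                             'Ht': float(m.group('ht')),
                             'Measurements': []})
            pending = None
            continue
        m = MEASUREMENT.search(line)
        if m and stations:
            hd, hm, hs, vd, vm, vs, dist = m.groups()
            # Zero-pad minutes and seconds to two digits, e.g. 92-44-09.
            pending = {'H': f'{hd}-{hm:0>2}-{hs:0>2}',
                       'V': f'{vd}-{vm:0>2}-{vs:0>2}',
                       'S': float(dist)}
            stations[-1]['Measurements'].append(pending)
            continue
        m = NAME.search(line)
        if m and pending is not None:
            pending['Name'] = m.group('name')
    return stations


# Small sample; the attribute line is an assumed format.
sample = '''\
   Station:  Name: CH9583R TT Ht: 1.4240
   Measurement:  H:   20° 24' 28" V:   92° 44'  9" S: 115.9559
   Attribute: Text Attribute for Vertex:  N:store_pt_string_name V:CP1
'''.splitlines()

stations = parse_report(sample)
print(json.dumps({'Stations': stations}, indent=2))
```

Because the structure is serialised in one step, there is no need to search the text for places to insert commas afterwards.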