Writing the values of each column on one line

Time: 2015-09-12 13:59:15

Tags: python-2.7

I have a text file, for example:

blahhh blaahhh blahhh
 some thing write this long 23.78, lat 45.45
      g.m.  occ/yr  r(event)   g.m. occ/yr   r(event)
      0.125  0.254   12.587    0.258 2.568   1.369
      0.785  0.365   10.258    0.897 2.987   9.365
something note write here blahh blahhh blahhh

I want a string that looks like this:

long 23.78 lat 45.45 g.m. 0.125, 0.785 occ/yr 0.254, 0.365 r(event) 12.587,10.258 g.m 0.258, 0.897 occ/yr 2.568, 2.987 r(event) 1.369, 9.365 

Here is my code:

file = open('geotechnic.txt').readlines()
i =0
while i < len(file):
    for line in file:
        wordList = re.sub("[^\w\./()]", " ",  line).split()
        try:
            print wordList[i]
        except:
            pass
i+=1

1 answer:

Answer 0: (score: 0)

The following will have to be adapted to your use case:

parsegeo.py

import re

data = '''blahhh blaahhh blahhh
 some thing write this long 23.78, lat 45.45
      g.m.  occ/yr  r(event)   g.m. occ/yr   r(event)
      0.125  0.254   12.587    0.258 2.568   1.369
      0.785  0.365   10.258    0.897 2.987   9.365
something note write here blahh blahhh blahhh'''

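# Split the source string into individual lines.
lines = data.split('\n')

# The second line holds the coordinates; capture "long <float>" and "lat <float>".
matchobj = re.match(r'^.*(long \d+\.\d+),\s+(lat \d+\.\d+)', lines[1])
longval = matchobj.group(1)
latval = matchobj.group(2)

# Tokenize the header line and the two data lines on whitespace.
headers = lines[2].strip().split()
dataline1 = lines[3].strip().split()
dataline2 = lines[4].strip().split()

# Pair up the values column by column: [('0.125', '0.785'), ...]
zippeddata = zip(dataline1, dataline2)

# Emit "<header> <value1>, <value2>" for each column.
outputlist = [longval, latval]
for i in range(0, len(headers)):
    segment = '{header} {valtuple}'.format(header=headers[i], valtuple=', '.join(zippeddata[i]))
    outputlist.append(segment)

print " ".join(outputlist)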

Output:

(parsegeo)macbook:parsegeo user$ python parsegeo.py
long 23.78 lat 45.45 g.m. 0.125, 0.785 occ/yr 0.254, 0.365 r(event) 12.587, 10.258 g.m. 0.258, 0.897 occ/yr 2.568, 2.987 r(event) 1.369, 9.365

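What's happening:

You will have to adapt this to use readlines, since I only used a long string as the data source. I split the data source on newlines to get the individual lines and assign them to the lines list.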
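One way to make that adaptation is sketched below. The helper name read_lines is hypothetical, and a temporary file stands in for geotechnic.txt so the snippet runs on its own; with the real file you would pass its path instead.

```python
import os
import tempfile

# Sample contents from the question, standing in for geotechnic.txt.
SAMPLE = """blahhh blaahhh blahhh
 some thing write this long 23.78, lat 45.45
      g.m.  occ/yr  r(event)   g.m. occ/yr   r(event)
      0.125  0.254   12.587    0.258 2.568   1.369
      0.785  0.365   10.258    0.897 2.987   9.365
something note write here blahh blahhh blahhh
"""


def read_lines(path):
    """Return the file's lines with the trailing newline removed (hypothetical helper)."""
    with open(path) as f:
        return [line.rstrip('\n') for line in f.readlines()]


# Write the sample to a temporary file so the sketch is self-contained.
tmp = tempfile.NamedTemporaryFile(mode='w', suffix='.txt', delete=False)
tmp.write(SAMPLE)
tmp.close()

lines = read_lines(tmp.name)
os.remove(tmp.name)

print(lines[1])
```

The resulting lines list can be indexed exactly like the one produced by data.split('\n') in the script.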

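I skip the first line. On the second line, I use a regular expression with capture groups (denoted by parentheses) to capture the text long followed by a float into the first group, and the text lat followed by its float into the second group. These capture groups can be accessed through the matchobj variable.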
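In isolation, the capture-group step looks roughly like this. The None check is an addition for illustration; the script itself assumes the line always matches.

```python
import re

line = ' some thing write this long 23.78, lat 45.45'

# Group 1 captures "long <float>", group 2 captures "lat <float>".
pattern = r'^.*(long \d+\.\d+),\s+(lat \d+\.\d+)'
matchobj = re.match(pattern, line)

# re.match returns None when the line does not fit the pattern.
if matchobj is None:
    raise ValueError('no long/lat pair found in: %r' % line)

longval = matchobj.group(1)
latval = matchobj.group(2)

print(longval)
print(latval)
```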

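For the next 3 lines, I use strip to remove extraneous whitespace and split to tokenize the remaining data (splitting on whitespace by default), assigning the tokens to lists.

Next, I zip the two data lists together to form a list of 2-tuples.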
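A small sketch of the tokenize-and-zip step, using the two data lines from the question. The list(...) wrapper is an addition so that indexing also works on Python 3, where zip returns an iterator rather than a list.

```python
raw1 = '      0.125  0.254   12.587    0.258 2.568   1.369'
raw2 = '      0.785  0.365   10.258    0.897 2.987   9.365'

# strip() drops the leading indentation; split() breaks on runs of whitespace.
dataline1 = raw1.strip().split()
dataline2 = raw2.strip().split()

# Pair the values position by position, giving one 2-tuple per column.
zippeddata = list(zip(dataline1, dataline2))

print(zippeddata[0])
print(zippeddata[2])
```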

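I iterate over the number of elements in the headers list and append to outputlist a segment made up of the column header followed by that column's two data-line values, joined with a comma and a space.

Once the loop finishes, I join the outputlist list with spaces and print it.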
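The loop and the final join could also be wrapped in a small function, as in the sketch below. The name build_summary and its parameters are hypothetical, and it iterates over header/column pairs directly instead of by index.

```python
def build_summary(prefix, headers, columns):
    """Join prefix items and one '<header> <v1>, <v2>' segment per column with spaces."""
    parts = list(prefix)
    for header, values in zip(headers, columns):
        parts.append('{0} {1}'.format(header, ', '.join(values)))
    return ' '.join(parts)


summary = build_summary(
    ['long 23.78', 'lat 45.45'],
    ['g.m.', 'occ/yr'],
    [('0.125', '0.785'), ('0.254', '0.365')],
)
print(summary)
```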

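Edit: a solution for parsing the data file linked in the comments.

Below is a solution that parses the data file you linked in the comments. You did not specify which block of data you want parsed (the zero attenuation variability data or the variability in atten data), so I only display the zero attenuation variability data. The variability in atten data is tokenized and added to the var_atten_data list. If you want to display the variability in atten data, you will have to do the zip(), join(), and string formatting for that list yourself. I'll leave that as an exercise for you.

Updated parsegeo.py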

import re

with open('geotechnic.txt', 'r') as f:
    in_attenuation_block = skipped_first = skipped_second = parsed_header = False
    longval = latval = None
    zero_atten_headers = []
    var_atten_headers = []
    zero_atten_data = []
    var_atten_data = []
    for line in f:
        matchobj = re.match('^.*site at long\s+(\d+\.\d+),\s+lat\s+(\d+\.\d+)', line)
        if matchobj:
            longval = matchobj.group(1)
            latval = matchobj.group(2)
            in_attenuation_block = True
            continue
        if in_attenuation_block:
            if skipped_first:
                if skipped_second:
                    data_line = line.strip().split()
                    if len(data_line) > 5:
                        if 'g.m.' in data_line[0] and len(data_line) > 5:
                            zero_atten_headers = data_line[0:5]
                            var_atten_headers = data_line[5:]
                        elif re.match('^\d+\.\d+\s+\d+\.\d', line.strip()):
                            zero_atten_data.append(data_line[0:5])
                            var_atten_data.append(data_line[5:])
                        elif re.match('^total yearly events', line.strip()):
                            # Reached the end of data block, print out summary
                            zippeddata = zip(*zero_atten_data)
                            outputlist = ["long", longval, "lat", latval]
                            for i in range(0, len(zero_atten_headers)):
                                segment = '{header} {valtuple}'.format(header=zero_atten_headers[i], valtuple=', '.join(zippeddata[i]))
                                outputlist.append(segment)
                            print " ".join(outputlist)
                            # Reset all of the flags, arrays, and vars for the next block of data
                            in_attenuation_block = skipped_first = skipped_second = parsed_header = False
                            longval = latval = None
                            zero_atten_headers = []
                            var_atten_headers = []
                            zero_atten_data = []
                            var_atten_data = []
                            continue
                        else:
                            print 'Unable to parse current line. Skipping to next line.  Current line: {}'.format(line)
                    else:
                        print 'Unable to parse current line. Skipping to next line.  Current line: {}'.format(line)
                else:
                    skipped_second = True
            else:
                skipped_first = True

Truncated output (5 lines):

(parsegeo)macbook:parsegeo user$ python parsegeo.py
long 46.766 lat 32.305 g.m. 0.02, 0.04, 0.06, 0.08, 0.10, 0.12, 0.14, 0.16, 0.18, 0.20, 0.22, 0.24 occ/yr 0.15773, 0.00734, 0.00084, 0.00030, 0.00011, 0.00004, 0.00002, 0.00000, 0.00000, 0.00000, 0.00000, 0.00000 exc/yr 0.00865, 0.00132, 0.00047, 0.00017, 0.00006, 0.00002, 0.00000, 0.00000, 0.00000, 0.00000, 0.00000, 0.00000 r(events) 19.2, 126.4, 352.8, 974.5, 2574.4, 8231.0, 70366.1, 99999.9, 99999.9, 99999.9, 99999.9, 99999.9 r(yrs) 115.6, 759.7, 2120.4, 5856.8, 15472.2, 49469.3, 99999.9, 99999.9, 99999.9, 99999.9, 99999.9, 99999.9
long 46.884 lat 32.306 g.m. 0.02, 0.04, 0.06, 0.08, 0.10, 0.12, 0.14, 0.16, 0.18, 0.20, 0.22, 0.24, 0.26, 0.28, 0.30 occ/yr 0.15085, 0.01156, 0.00285, 0.00070, 0.00023, 0.00010, 0.00005, 0.00002, 0.00001, 0.00001, 0.00000, 0.00000, 0.00000, 0.00000, 0.00000 exc/yr 0.01553, 0.00397, 0.00112, 0.00042, 0.00019, 0.00009, 0.00004, 0.00002, 0.00001, 0.00000, 0.00000, 0.00000, 0.00000, 0.00000, 0.00000 r(events) 10.7, 41.9, 148.2, 394.3, 879.0, 1798.1, 4235.4, 8361.3, 25064.4, 99999.9, 99999.9, 99999.9, 99999.9, 99999.9, 99999.9 r(yrs) 64.4, 251.6, 890.6, 2369.5, 5283.2, 10806.6, 25455.0, 50252.4, 99999.9, 99999.9, 99999.9, 99999.9, 99999.9, 99999.9, 99999.9
long 46.765 lat 32.405 g.m. 0.02, 0.04, 0.06, 0.08, 0.10, 0.12, 0.14, 0.16, 0.18, 0.20, 0.22, 0.24, 0.26 occ/yr 0.15628, 0.00842, 0.00111, 0.00036, 0.00012, 0.00006, 0.00002, 0.00001, 0.00000, 0.00000, 0.00000, 0.00000, 0.00000 exc/yr 0.01010, 0.00168, 0.00057, 0.00021, 0.00009, 0.00003, 0.00001, 0.00000, 0.00000, 0.00000, 0.00000, 0.00000, 0.00000 r(events) 16.5, 98.8, 292.0, 800.9, 1930.1, 5871.5, 19010.9, 99999.9, 99999.9, 99999.9, 99999.9, 99999.9, 99999.9 r(yrs) 99.0, 593.8, 1755.0, 4813.5, 11599.9, 35288.4, 99999.9, 99999.9, 99999.9, 99999.9, 99999.9, 99999.9, 99999.9
long 46.883 lat 32.406 g.m. 0.02, 0.04, 0.06, 0.08, 0.10, 0.12, 0.14, 0.16, 0.18, 0.20, 0.22, 0.24, 0.26, 0.28, 0.30, 0.32, 0.34 occ/yr 0.14909, 0.01221, 0.00351, 0.00101, 0.00032, 0.00013, 0.00006, 0.00003, 0.00002, 0.00001, 0.00000, 0.00000, 0.00000, 0.00000, 0.00000, 0.00000, 0.00000 exc/yr 0.01730, 0.00509, 0.00158, 0.00058, 0.00026, 0.00012, 0.00006, 0.00003, 0.00001, 0.00001, 0.00000, 0.00000, 0.00000, 0.00000, 0.00000, 0.00000, 0.00000 r(events) 9.6, 32.7, 105.0, 287.4, 646.3, 1349.7, 2697.5, 5679.3, 11947.6, 31177.0, 99999.9, 99999.9, 99999.9, 99999.9, 99999.9, 99999.9, 99999.9 r(yrs) 57.8, 196.4, 631.2, 1727.5, 3884.1, 8111.6, 16212.1, 34133.4, 71806.2, 99999.9, 99999.9, 99999.9, 99999.9, 99999.9, 99999.9, 99999.9, 99999.9
long 47.700 lat 33.300 g.m. 0.02, 0.04, 0.06, 0.08, 0.10, 0.12, 0.14, 0.16, 0.18, 0.20, 0.22 occ/yr 0.15767, 0.00717, 0.00095, 0.00046, 0.00011, 0.00003, 0.00000, 0.00000, 0.00000, 0.00000, 0.00000 exc/yr 0.00872, 0.00155, 0.00060, 0.00015, 0.00003, 0.00000, 0.00000, 0.00000, 0.00000, 0.00000, 0.00000 r(events) 19.1, 107.4, 275.1, 1143.4, 5364.2, 99999.9, 99999.9, 99999.9, 99999.9, 99999.9, 99999.9 r(yrs) 114.7, 645.2, 1653.4, 6872.1, 32239.4, 99999.9, 99999.9, 99999.9, 99999.9, 99999.9, 99999.9
...
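
For the exercise of printing the variability in atten data, one possible approach is sketched here. The function name format_block is hypothetical, and the headers and values are invented placeholders, not taken from the real data file; in the script you would pass var_atten_headers and var_atten_data as collected by the parser.

```python
def format_block(longval, latval, headers, rows):
    """Build one summary line from a list of tokenized data rows."""
    # zip(*rows) transposes the rows, so each tuple holds one column's values.
    columns = list(zip(*rows))
    parts = ['long', longval, 'lat', latval]
    for header, column in zip(headers, columns):
        parts.append('{0} {1}'.format(header, ', '.join(column)))
    return ' '.join(parts)


# Placeholder inputs for illustration only.
var_atten_headers = ['g.m.', 'occ/yr']
var_atten_data = [['0.02', '0.1'], ['0.04', '0.2'], ['0.06', '0.3']]

line = format_block('46.766', '32.305', var_atten_headers, var_atten_data)
print(line)
```

Calling it alongside the existing print in the "total yearly events" branch, before the lists are reset, would emit a second summary line per block.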