Extract values from the last row of a table in a shell script

Time: 2018-08-08 03:03:21

Tags: python linux bash shell perl

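I have a file (data.txt) with the content shown below. It contains multiple rows separated by lines of dashes (-), so it looks like a drawn table saved to a file. In the table, the first row holds all the column names and every other row holds the actual data for those columns.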

Connecting to the ControlService endpoint

Found 3 rows.
Requests List:
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Client ID                                                                   | Client Type                  | Service Type | Status               | Trust Domain              | Data Instance Name | Data Version | Creation Time              | Last Update                | Scheduled Time | 
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 REFRESH_ROUTINGTIER_ARTIFACTS_1465901168866                              | ROUTINGTIER_ARTIFACTS | SYSTEM       | COMPLETED            | RRA Bulk Client    | soa_server1       | 18.2.2.0.0  | 2016-06-14 03:49:55 -07:00 | 2016-06-14 03:49:57 -07:00 | ---            | 
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 500333443                                                          | CREATE                        | [FA_GSI]     | COMPLETED            | holder       | soa_server1       | 18.3.2.0.0  | 2018-08-07 11:59:57 -07:00 | 2018-08-07 12:04:37 -07:00 | ---            | 
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 500333446                                                          | CREATE                        | [FA_GSI]     | COMPLETED            | holder-test  | soa_server1       | 18.3.2.0.0  | 2018-08-07 12:04:48 -07:00 | 2018-08-07 12:08:52 -07:00 | ---            | 
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

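Now I want to parse this file and extract values from the last row. Specifically, I want the values of the "Client ID" and "Trust Domain" columns in the last row: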

Client ID: 500333446
Trust Domain: holder-test

Can this be done in a shell script, Perl, or Python?
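For reference, a minimal Python sketch that avoids counting preamble lines. It assumes that every header and data line contains a '|' and that no other line does (true for the file above); the function name last_row_fields and the trimmed SAMPLE text are hypothetical, for illustration only.

```python
def last_row_fields(text, wanted=("Client ID", "Trust Domain")):
    """Return the requested columns from the last data row of a '|' table.

    Assumption: only header and data lines contain '|'; banner lines and
    dashed separator lines do not.
    """
    # Keep only the lines that belong to the table (header + data rows).
    rows = [line for line in text.splitlines() if "|" in line]
    # Split on '|' and trim the padding; the trailing '|' yields an empty cell.
    header = [cell.strip() for cell in rows[0].split("|")]
    last = [cell.strip() for cell in rows[-1].split("|")]
    record = dict(zip(header, last))
    return {name: record[name] for name in wanted}


# Hypothetical, shortened version of data.txt used for the demo.
SAMPLE = """\
Found 3 rows.
Requests List:
-----------------
 Client ID   | Client Type | Trust Domain    | 
-----------------
 500333443   | CREATE      | holder          | 
-----------------
 500333446   | CREATE      | holder-test     | 
-----------------
"""

if __name__ == "__main__":
    for name, value in last_row_fields(SAMPLE).items():
        print(f"{name}: {value}")
```

To use it on the real file, pass the file contents, e.g. last_row_fields(open('data.txt').read()). A value that itself contains '|' would break this simple split.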

4 answers:

Answer 0 (score: 0)

Yes, this can be done in Python. I suggest using the csv module with the delimiter set to '|'.

import csv

# Read the file, splitting each line on '|' instead of ','
with open('data.txt', 'r', newline='') as f:
    reader = csv.reader(f, delimiter='|')
    for row in reader:
        print(row)

This produces the following lists:

['Connecting to the ControlService endpoint']
[]
['Found 3 rows.']
['Requests List:']
['-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------']
[' Client ID                                                                   ', ' Client Type                  ', ' Service Type ', ' Status               ', ' Trust Domain              ', ' Data Instance Name ', ' Data Version ', ' Creation Time              ', ' Last Update                ', ' Scheduled Time ', ' ']
['-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------']
[' REFRESH_ROUTINGTIER_ARTIFACTS_1465901168866                              ', ' ROUTINGTIER_ARTIFACTS ', ' SYSTEM       ', ' COMPLETED            ', ' RRA Bulk Client    ', ' soa_server1       ', ' 18.2.2.0.0  ', ' 2016-06-14 03:49:55 -07:00 ', ' 2016-06-14 03:49:57 -07:00 ', ' ---            ', ' ']
['-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------']
[' 500333443                                                          ', ' CREATE                        ', ' [FA_GSI]     ', ' COMPLETED            ', ' holder       ', ' soa_server1       ', ' 18.3.2.0.0  ', ' 2018-08-07 11:59:57 -07:00 ', ' 2018-08-07 12:04:37 -07:00 ', ' ---            ', ' ']
['-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------']
[' 500333446                                                          ', ' CREATE                        ', ' [FA_GSI]     ', ' COMPLETED            ', ' holder-test  ', ' soa_server1       ', ' 18.3.2.0.0  ', ' 2018-08-07 12:04:48 -07:00 ', ' 2018-08-07 12:08:52 -07:00 ', ' ---            ', ' ']
['-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------']

You can then easily skip the first 4 rows of the result.
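A sketch of that skipping step, assuming the preamble is always exactly four lines (the banner, a blank line, "Found 3 rows." and "Requests List:"). It drops them with itertools.islice, skips blank and dashed rows, and trims the padding from every cell; table_rows is a hypothetical helper name.

```python
import csv
from itertools import islice


def table_rows(lines, skip=4):
    """Yield trimmed cell lists from a '|'-separated table.

    Assumes exactly `skip` preamble lines before the table (four in the
    question's file) and that separator lines start with '-'.
    """
    reader = csv.reader(lines, delimiter="|")
    for row in islice(reader, skip, None):
        # Blank lines come back as []; dashed separators as ['-----...'].
        if not row or row[0].startswith("-"):
            continue
        yield [cell.strip() for cell in row]


if __name__ == "__main__":
    demo = [
        "Connecting to the ControlService endpoint",
        "",
        "Found 3 rows.",
        "Requests List:",
        "----------",
        " Client ID | Trust Domain | ",
        "----------",
        " 500333446 | holder-test  | ",
        "----------",
    ]
    rows = list(table_rows(demo))
    header, last = rows[0], rows[-1]
    print("Client ID:", last[header.index("Client ID")])
    print("Trust Domain:", last[header.index("Trust Domain")])
```

With the real file, open it with newline='' and pass the file object directly, e.g. list(table_rows(f)).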

Answer 1 (score: 0)

@paragbaxi's solution is good; I would only add a condition that filters out the separator rows containing only "------", like this:

import csv

lines_to_skip = 4
with open('data.txt', 'r', newline='') as f:
    reader = csv.reader(f, delimiter='|')
    for _ in range(lines_to_skip):
        next(reader)  # skip the preamble lines

    data = []
    for line in reader:
        # keep non-empty rows whose first cell doesn't start with "---" (separator lines)
        if line and not line[0].startswith("---"):
            print(line)
            data.append(line)


print("Last row:\n{}".format(data[-1]))
print("Client ID:{} Domain:{}".format(data[-1][0].strip(), data[-1][4].strip()))  # strip() removes the padding spaces

Output:

[' Client ID                                                                   ', ' Client Type                  ', ' Service Type ', ' Status               ', ' Trust Domain              ', ' Data Instance Name ', ' Data Version ', ' Creation Time              ', ' Last Update                ', ' Scheduled Time ', ' ']
[' REFRESH_ROUTINGTIER_ARTIFACTS_1465901168866                              ', ' ROUTINGTIER_ARTIFACTS ', ' SYSTEM       ', ' COMPLETED            ', ' RRA Bulk Client    ', ' soa_server1       ', ' 18.2.2.0.0  ', ' 2016-06-14 03:49:55 -07:00 ', ' 2016-06-14 03:49:57 -07:00 ', ' ---            ', ' ']
[' 500333443                                                          ', ' CREATE                        ', ' [FA_GSI]     ', ' COMPLETED            ', ' holder       ', ' soa_server1       ', ' 18.3.2.0.0  ', ' 2018-08-07 11:59:57 -07:00 ', ' 2018-08-07 12:04:37 -07:00 ', ' ---            ', ' ']
[' 500333446                                                          ', ' CREATE                        ', ' [FA_GSI]     ', ' COMPLETED            ', ' holder-test  ', ' soa_server1       ', ' 18.3.2.0.0  ', ' 2018-08-07 12:04:48 -07:00 ', ' 2018-08-07 12:08:52 -07:00 ', ' ---            ', ' ']
Last row:
[' 500333446                                                          ', ' CREATE                        ', ' [FA_GSI]     ', ' COMPLETED            ', ' holder-test  ', ' soa_server1       ', ' 18.3.2.0.0  ', ' 2018-08-07 12:04:48 -07:00 ', ' 2018-08-07 12:08:52 -07:00 ', ' ---            ', ' ']
Client ID:500333446 Domain:holder-test

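Why trim with str.strip() rather than deleting every space with str.replace(" ", "")? The two agree on the last row, but not on values that contain spaces, such as the Trust Domain "RRA Bulk Client" in the first data row. A quick check:

```python
cell = " RRA Bulk Client    "

# str.replace(" ", "") deletes every space, including those inside the value.
squashed = cell.replace(" ", "")
# str.strip() only trims leading and trailing whitespace (the column padding).
trimmed = cell.strip()

print(repr(squashed))  # 'RRABulkClient'
print(repr(trimmed))   # 'RRA Bulk Client'
```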

Answer 2 (score: 0)

>>> import csv
... from itertools import islice
... 
... with open('infile', 'r') as f:
...     reader = csv.DictReader(islice(f, 5, None, 2), delimiter='|')
...     result = [
...         {k.strip(): v.strip() for k, v in line.items()} for line in reader
...     ]
... 
>>> last_row = result[-1]
>>> import json; print(json.dumps(last_row, indent=2))
{
  "Client ID": "500333446",
  "Client Type": "CREATE",
  "Service Type": "[FA_GSI]",
  "Status": "COMPLETED",
  "Trust Domain": "holder-test",
  "Data Instance Name": "soa_server1",
  "Data Version": "18.3.2.0.0",
  "Creation Time": "2018-08-07 12:04:48 -07:00",
  "Last Update": "2018-08-07 12:08:52 -07:00",
  "Scheduled Time": "---",
  "": ""
}
>>> last_row['Client ID']
'500333446'
>>> last_row['Trust Domain']
'holder-test'
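The islice(f, 5, None, 2) call relies on the exact line layout of the file: five lines before the header, then a separator after every row, so taking every second line from index 5 yields the header and the data rows. A sketch that shows which lines get selected, using placeholder labels instead of the real lines:

```python
from itertools import islice

# Line layout of the question's file (0-based index -> kind of line).
layout = [
    "banner", "blank", "found", "title",  # 0-3: preamble
    "sep", "header",                      # 4-5
    "sep", "row 1",                       # 6-7
    "sep", "row 2",                       # 8-9
    "sep", "row 3",                       # 10-11
    "sep",                                # 12
]

# islice(start=5, stop=None, step=2) takes indices 5, 7, 9, 11: the header
# and every data row, skipping the separator lines in between.
selected = list(islice(layout, 5, None, 2))
print(selected)  # ['header', 'row 1', 'row 2', 'row 3']
```

If the banner or the number of preamble lines changes, the start offset has to be adjusted accordingly.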

Answer 3 (score: 0)

A solution in awk:

awk 'BEGIN{FS="|"}!/^-+/{c=$1;t=$5}END{print "Client ID:" c ORS "Trust Domain:" t}' file

Explanation:

$ awk '
BEGIN { FS="|" }                                # pipe-separator
!/^-+/ {                                        # process lines that don't start with dashes
    c=$1                                        # client value
    t=$5                                        # trust domain value
}
END {                                           # in the end
    print "Client ID:" c ORS "Trust Domain:" t  # output the last value pair
}' file
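For comparison, a rough Python equivalent of the awk logic, written as a sketch: it keeps fields 1 and 5 of the last line that does not start with dashes. Unlike the one-liner, it trims the padding (so the Client ID has no trailing spaces) and ignores blank lines; last_pair is a hypothetical name.

```python
def last_pair(lines):
    """Return (client_id, trust_domain) from the last non-separator line.

    Mirrors the awk program: split on '|', skip lines starting with dashes,
    keep fields 1 and 5. Unlike the awk version it also skips blank lines
    and strips the padding from the values.
    """
    client = trust = ""
    for line in lines:
        # awk's !/^-+/ test: ignore separator lines; also ignore blank lines.
        if line.startswith("-") or not line.strip():
            continue
        fields = line.split("|")
        # awk's $1 and $5 are Python indices 0 and 4; a missing $5 becomes "".
        client = fields[0].strip()
        trust = fields[4].strip() if len(fields) > 4 else ""
    return client, trust


if __name__ == "__main__":
    sample = [
        " Client ID | Client Type | Service Type | Status | Trust Domain | ",
        "------",
        " 500333446 | CREATE | [FA_GSI] | COMPLETED | holder-test | ",
        "------",
    ]
    client, trust = last_pair(sample)
    print("Client ID: " + client)
    print("Trust Domain: " + trust)
```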

Output:

Client ID: 500333446                                                          
Trust Domain: holder-test