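I have a file (data.txt) with the content shown below. It contains multiple rows separated by lines of dashes (-), so it looks like a table drawn inside a text file. The first row of the table holds all the column names, and every other row holds the actual data for those columns.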
Connecting to the ControlService endpoint
Found 3 rows.
Requests List:
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Client ID | Client Type | Service Type | Status | Trust Domain | Data Instance Name | Data Version | Creation Time | Last Update | Scheduled Time |
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
REFRESH_ROUTINGTIER_ARTIFACTS_1465901168866 | ROUTINGTIER_ARTIFACTS | SYSTEM | COMPLETED | RRA Bulk Client | soa_server1 | 18.2.2.0.0 | 2016-06-14 03:49:55 -07:00 | 2016-06-14 03:49:57 -07:00 | --- |
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
500333443 | CREATE | [FA_GSI] | COMPLETED | holder | soa_server1 | 18.3.2.0.0 | 2018-08-07 11:59:57 -07:00 | 2018-08-07 12:04:37 -07:00 | --- |
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
500333446 | CREATE | [FA_GSI] | COMPLETED | holder-test | soa_server1 | 18.3.2.0.0 | 2018-08-07 12:04:48 -07:00 | 2018-08-07 12:08:52 -07:00 | --- |
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
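Now I want to parse the file above and extract values from its last row. Specifically, I want the values of the "Client ID" and "Trust Domain" columns in the last row: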
Client ID: 500333446
Trust Domain: holder-test
Can this be done in a shell script, Perl, or Python?
Answer 0 (score: 0)
Yes, this can be done in Python. I suggest using the csv module with the delimiter set to '|'.
import csv

# 'data.txt' is the file name given in the question
with open('data.txt', 'r', newline='') as f:
    reader = csv.reader(f, delimiter='|')
    for row in reader:
        print(row)
This produces the following lists:
['Connecting to the ControlService endpoint']
[]
['Found 3 rows.']
['Requests List:']
['-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------']
[' Client ID ', ' Client Type ', ' Service Type ', ' Status ', ' Trust Domain ', ' Data Instance Name ', ' Data Version ', ' Creation Time ', ' Last Update ', ' Scheduled Time ', ' ']
['-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------']
[' REFRESH_ROUTINGTIER_ARTIFACTS_1465901168866 ', ' ROUTINGTIER_ARTIFACTS ', ' SYSTEM ', ' COMPLETED ', ' RRA Bulk Client ', ' soa_server1 ', ' 18.2.2.0.0 ', ' 2016-06-14 03:49:55 -07:00 ', ' 2016-06-14 03:49:57 -07:00 ', ' --- ', ' ']
['-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------']
[' 500333443 ', ' CREATE ', ' [FA_GSI] ', ' COMPLETED ', ' holder ', ' soa_server1 ', ' 18.3.2.0.0 ', ' 2018-08-07 11:59:57 -07:00 ', ' 2018-08-07 12:04:37 -07:00 ', ' --- ', ' ']
['-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------']
[' 500333446 ', ' CREATE ', ' [FA_GSI] ', ' COMPLETED ', ' holder-test ', ' soa_server1 ', ' 18.3.2.0.0 ', ' 2018-08-07 12:04:48 -07:00 ', ' 2018-08-07 12:08:52 -07:00 ', ' --- ', ' ']
['-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------']
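You can easily skip the first 4 rows of the resulting list.
Answer 1 (score: 0)
@paragbaxi's solution is good. I would only add a condition to filter out the rows that contain nothing but "------". Like this: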
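As a hedged sketch of that skipping step: instead of counting lines, one could keep only the rows that split into more than one field, since the banner lines and the dashed separators contain no '|'. The helper name `table_rows` and the abbreviated inline sample are illustrative assumptions, not part of the original answer.

```python
import csv
import io

# Abbreviated sample in the same layout as data.txt
# (assumption: the real file simply has more columns and rows).
SAMPLE = """Connecting to the ControlService endpoint

Found 2 rows.
Requests List:
--------------------------------------------------------------------
 Client ID | Client Type | Service Type | Status | Trust Domain |
--------------------------------------------------------------------
 500333443 | CREATE | [FA_GSI] | COMPLETED | holder |
--------------------------------------------------------------------
 500333446 | CREATE | [FA_GSI] | COMPLETED | holder-test |
--------------------------------------------------------------------
"""


def table_rows(lines):
    """Return the header and data rows with the padding stripped from each cell."""
    rows = []
    for row in csv.reader(lines, delimiter='|'):
        # Banner lines and dashed separators parse to a single field,
        # and blank lines parse to an empty list, so both are dropped here.
        if len(row) > 1:
            rows.append([cell.strip() for cell in row])
    return rows


rows = table_rows(io.StringIO(SAMPLE))
header, last = rows[0], rows[-1]
print("Client ID:", last[header.index("Client ID")])
print("Trust Domain:", last[header.index("Trust Domain")])
```

Looking the columns up through `header.index(...)` keeps the script working if the column order changes, at the cost of failing with a `ValueError` if a column is renamed.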
import csv

lines_to_skip = 4  # banner lines that precede the table
with open('data.csv', 'r', newline='') as f:
    reader = csv.reader(f, delimiter='|')
    for i in range(lines_to_skip):
        next(reader)  # skip the banner lines
    data = []
    for line in reader:
        # Skip blank rows and the dashed separator rows
        if line and not line[0].startswith("---"):
            print(line)
            data.append(line)

print("Last row:\n{}".format(data[-1]))
# strip() removes the padding around each value without touching inner spaces
print("Client ID:{} Domain:{}".format(data[-1][0].strip(), data[-1][4].strip()))
Output:
[' Client ID ', ' Client Type ', ' Service Type ', ' Status ', ' Trust Domain ', ' Data Instance Name ', ' Data Version ', ' Creation Time ', ' Last Update ', ' Scheduled Time ', ' ']
[' REFRESH_ROUTINGTIER_ARTIFACTS_1465901168866 ', ' ROUTINGTIER_ARTIFACTS ', ' SYSTEM ', ' COMPLETED ', ' RRA Bulk Client ', ' soa_server1 ', ' 18.2.2.0.0 ', ' 2016-06-14 03:49:55 -07:00 ', ' 2016-06-14 03:49:57 -07:00 ', ' --- ', ' ']
[' 500333443 ', ' CREATE ', ' [FA_GSI] ', ' COMPLETED ', ' holder ', ' soa_server1 ', ' 18.3.2.0.0 ', ' 2018-08-07 11:59:57 -07:00 ', ' 2018-08-07 12:04:37 -07:00 ', ' --- ', ' ']
[' 500333446 ', ' CREATE ', ' [FA_GSI] ', ' COMPLETED ', ' holder-test ', ' soa_server1 ', ' 18.3.2.0.0 ', ' 2018-08-07 12:04:48 -07:00 ', ' 2018-08-07 12:08:52 -07:00 ', ' --- ', ' ']
Last row:
[' 500333446 ', ' CREATE ', ' [FA_GSI] ', ' COMPLETED ', ' holder-test ', ' soa_server1 ', ' 18.3.2.0.0 ', ' 2018-08-07 12:04:48 -07:00 ', ' 2018-08-07 12:08:52 -07:00 ', ' --- ', ' ']
Client ID:500333446 Domain:holder-test
Answer 2 (score: 0)
>>> import csv
>>> from itertools import islice
>>>
>>> with open('infile', 'r') as f:
...     # Assumes the header is the line at index 5; step 2 skips the dashed separators
...     reader = csv.DictReader(islice(f, 5, None, 2), delimiter='|')
...     result = [
...         {k.strip(): v.strip() for k, v in line.items()} for line in reader
...     ]
...
>>> last_row = result[-1]
>>> import json; print(json.dumps(last_row, indent=2))
{
  "Client ID": "500333446",
  "Client Type": "CREATE",
  "Service Type": "[FA_GSI]",
  "Status": "COMPLETED",
  "Trust Domain": "holder-test",
  "Data Instance Name": "soa_server1",
  "Data Version": "18.3.2.0.0",
  "Creation Time": "2018-08-07 12:04:48 -07:00",
  "Last Update": "2018-08-07 12:08:52 -07:00",
  "Scheduled Time": "---",
  "": ""
}
>>> last_row['Client ID']
'500333446'
>>> last_row['Trust Domain']
'holder-test'
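The `islice(f, 5, None, 2)` call depends on the header sitting at exactly index 5 and on every second line being a separator. A minimal sketch that avoids those fixed offsets, assuming only that table lines contain '|' and no other line does, might look like this (the function name `read_table` and the abbreviated inline sample are hypothetical):

```python
import csv
import io

# Abbreviated sample in the same layout as the question's file (assumption).
SAMPLE = """Connecting to the ControlService endpoint

Found 2 rows.
Requests List:
--------------------------------------------------------------------
 Client ID | Client Type | Service Type | Status | Trust Domain |
--------------------------------------------------------------------
 500333443 | CREATE | [FA_GSI] | COMPLETED | holder |
--------------------------------------------------------------------
 500333446 | CREATE | [FA_GSI] | COMPLETED | holder-test |
--------------------------------------------------------------------
"""


def read_table(lines):
    """Parse the pipe-delimited table into a list of dicts keyed by column name."""
    # Only header and data lines contain the delimiter, so this filter
    # drops the banner, blank lines, and dashed separators in one pass.
    table_lines = (line for line in lines if '|' in line)
    reader = csv.DictReader(table_lines, delimiter='|')
    records = []
    for record in reader:
        cleaned = {k.strip(): v.strip() for k, v in record.items()}
        cleaned.pop('', None)  # drop the empty column created by the trailing '|'
        records.append(cleaned)
    return records


records = read_table(io.StringIO(SAMPLE))
last_row = records[-1]
print(last_row['Client ID'])
print(last_row['Trust Domain'])
```

Like the answer above, this sketch assumes every data line has the same number of fields as the header; a ragged line would need extra handling.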
Answer 3 (score: 0)
One in awk:
awk 'BEGIN{FS="|"}!/^-+/{c=$1;t=$5}END{print "Client ID:" c ORS "Trust Domain:" t}' file
Explained:
$ awk '
BEGIN { FS="|" }      # pipe-separator
!/^-+/ {              # process if the line doesn't start with dashes
    c=$1              # client value
    t=$5              # trust domain value
}
END {                 # in the end
    print "Client ID:" c ORS "Trust Domain:" t   # output the last value pair
}' file
Output:
Client ID: 500333446
Trust Domain: holder-test
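For comparison with the Python answers, the same "last matching line wins" idea can be sketched with plain `str.split`, reading the file in a single pass without keeping the rows in memory. This is an illustration rather than a translation of the awk program: the name `last_pair`, the field-count guard, and the inline sample are assumptions added here.

```python
import io

# Abbreviated sample in the same layout as the question's file (assumption).
SAMPLE = """Connecting to the ControlService endpoint
Found 2 rows.
Requests List:
--------------------------------------------------------------------
Client ID | Client Type | Service Type | Status | Trust Domain |
--------------------------------------------------------------------
500333443 | CREATE | [FA_GSI] | COMPLETED | holder |
--------------------------------------------------------------------
500333446 | CREATE | [FA_GSI] | COMPLETED | holder-test |
--------------------------------------------------------------------
"""


def last_pair(lines):
    """Return (client_id, trust_domain) from the last table line seen."""
    client_id = trust_domain = None
    for line in lines:
        if line.startswith('-'):
            continue  # dashed separator, like !/^-+/ in the awk version
        fields = line.split('|')
        # Unlike the awk version, ignore lines with fewer than five fields
        # so that banner lines never overwrite the remembered values.
        if len(fields) >= 5:
            client_id = fields[0].strip()
            trust_domain = fields[4].strip()
    return client_id, trust_domain


client_id, trust_domain = last_pair(io.StringIO(SAMPLE))
print("Client ID:", client_id)
print("Trust Domain:", trust_domain)
```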