I have the following kind of input data (for Splunk):
svr28pr,Linux File System-ALL,success,32.87,2638.259,26/06/14 19:00,26/06/14 21:03,avamar,xxxxx1.network.local,Activity completed with exceptions.,26/06/14 19:00
SVr44PR:Staging_SyncDB,incr,success,1271,1271,27/06/14 11:28,27/06/14 11:28,SQL,,,1/01/70 09:59
I need to break it out into fields, and the following expression works well:
(?<client>[^,]+),(?<backuptype>[^,]+),(?<status>[^,]+),(?<size>[^,]+),(?<dump>[^,]+),(?<start>[^,]+),(?<complete>[^,]+),(?<application>[^,]+),(?<server>[^,]+),(?<comment>[^,]+)
However, because of a change to the BackupType names, the second field may now be quoted and contain commas, for example:
svr08ts,"Windows VSS-ALL,ALL",success,0.067,39.627,26/06/14 21:32,26/06/14 21:38,avamar,xxxxxxx2.network.local,Activity completed with exceptions.,26/06/14 20:00
Is there a way, using a regular expression, to determine whether a field is quoted and, if so, capture the data between the quotes into the named group?
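For reference, one way to tolerate an optionally quoted field in a plain regular expression is to let every group match either a quoted or an unquoted value and strip the quotes afterwards. This is only a sketch in Python's re syntax (the field names follow the extraction above; escaped quotes inside a quoted value are not handled), and the answers below argue that a CSV parser is the better tool:

import re

# Every field accepts either a quoted or an unquoted value; quotes are stripped afterwards.
FIELD = r'(?:"[^"]*"|[^,]*)'
NAMES = ['client', 'backuptype', 'status', 'size', 'dump',
         'start', 'complete', 'application', 'server', 'comment']
PATTERN = re.compile(','.join('(?P<{0}>{1})'.format(name, FIELD) for name in NAMES))

line = ('svr08ts,"Windows VSS-ALL,ALL",success,0.067,39.627,'
        '26/06/14 21:32,26/06/14 21:38,avamar,xxxxxxx2.network.local,'
        'Activity completed with exceptions.,26/06/14 20:00')

match = PATTERN.match(line)
row = {name: value.strip('"') for name, value in match.groupdict().items()}
print(row['backuptype'])  # Windows VSS-ALL,ALL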
Answer 0 (score: 0)
As @thimoty-shields said, use the csv module:
import csv

csvfile = 'backups.csv'
with open(csvfile) as f:
    backups = csv.reader(f)
    for row in backups:
        for cell in row:
            # do what you need with each field
            print(cell)
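If you want the cells under names (as the regex's named groups provided) rather than by position, a minimal sketch, assuming the same ten field names as the original pattern, is to zip each row with the names:

import csv

FIELDS = ['client', 'backuptype', 'status', 'size', 'dump',
          'start', 'complete', 'application', 'server', 'comment']

with open('backups.csv') as f:
    for row in csv.reader(f):
        record = dict(zip(FIELDS, row))   # any extra trailing columns are ignored
        print(record['backuptype'])       # quoted commas are handled by csv.reader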
Answer 1 (score: 0)
You do not need a regular expression to process a CSV file; use the csv module instead, which handles quoted fields by default. csv.DictReader produces a sequence of dictionaries, similar to the dict that groupdict() returns on a re match object.
If your input file contains:
svr28pr,Linux File System-ALL,success,32.87,2638.259,26/06/14 19:00,26/06/14 21:03,avamar,xxxxx1.network.local,Activity completed with exceptions.,26/06/14 19:00
SVr44PR:Staging_SyncDB,incr,success,1271,1271,27/06/14 11:28,27/06/14 11:28,SQL,,,1/01/70 09:59
svr08ts,"Windows VSS-ALL,ALL",success,0.067,39.627,26/06/14 21:32,26/06/14 21:38,avamar,xxxxxxx2.network.local,Activity completed with exceptions.,26/06/14 20:00
then this script:
import csv
from pprint import pprint

fields = 'client backuptype status size dump start complete application server comment'.split()

with open('input.csv') as f:
    reader = csv.DictReader(f)
    reader.fieldnames = fields  # supply the names explicitly; the first line is treated as data
    for row_dict in reader:
        pprint(row_dict)  # process the row here
will output:
{None: ['26/06/14 19:00'],
'application': 'avamar',
'backuptype': 'Linux File System-ALL',
'client': 'svr28pr',
'comment': 'Activity completed with exceptions.',
'complete': '26/06/14 21:03',
'dump': '2638.259',
'server': 'xxxxx1.network.local',
'size': '32.87',
'start': '26/06/14 19:00',
'status': 'success'}
{None: ['1/01/70 09:59'],
'application': 'SQL',
'backuptype': 'incr',
'client': 'SVr44PR:Staging_SyncDB',
'comment': '',
'complete': '27/06/14 11:28',
'dump': '1271',
'server': '',
'size': '1271',
'start': '27/06/14 11:28',
'status': 'success'}
{None: ['26/06/14 20:00'],
'application': 'avamar',
'backuptype': 'Windows VSS-ALL,ALL',
'client': 'svr08ts',
'comment': 'Activity completed with exceptions.',
'complete': '26/06/14 21:38',
'dump': '39.627',
'server': 'xxxxxxx2.network.local',
'size': '0.067',
'start': '26/06/14 21:32',
'status': 'success'}
Specifically,
>>> print(row_dict['backuptype'])
Windows VSS-ALL,ALL
as required.
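Note that the trailing eleventh column ends up under the None key because only ten field names were supplied. If that extra timestamp matters, DictReader's restkey argument collects the surplus values under a name of your choosing; 'extra' below is an illustrative name, not something from the original data:

import csv

fields = 'client backuptype status size dump start complete application server comment'.split()

with open('input.csv') as f:
    reader = csv.DictReader(f, fieldnames=fields, restkey='extra')
    for row_dict in reader:
        print(row_dict['extra'])  # e.g. ['26/06/14 19:00']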
Answer 2 (score: 0)
You can use multikv in Splunk to process tabular data.