Python split problem with spaces

Time: 2013-12-04 03:05:53

Tags: python spaces

I am trying to process Linux output in Python.

Here is my output on Linux:

machine01:/mnt/vlm/log-prod                     machine02:/mnt/machine01_vlm/log-prod                                                    Transferred    17:46:14   Idle
machine01:/mnt/vlm/log-test                     machine02:/mnt/machine01_vlm/log-test                                        Transferred    17:46:14   Idle
machine01:/mnt/wndchl/-                         machine02:/mnt/machine01_wndchl/machine01_wndchl_machine01_wndchl              Transferred    18:36:10   Idle
machine01:/mnt/wndchl/prod                      machine02:/mnt/machine01_wndchl/prod                                         Transferred    18:36:10   Idle
machine01:/mnt/wndchl/test                      machine02:/mnt/machine01_wndchl/test                                         Transferred    18:36:10   Idle
machine01:/mnt/iso/Archive                      machine02:/mnt/iso/Archive                                                  Transferred    19:06:10   Idle
machine01:/mnt/iso/Ready To Transfer            machine02:/mnt/iso/ReadyxToxTransfer                                        Transferred    19:06:10   Idle
machine01:/mnt/iso/-                            machine02:/mnt/iso/iso_machine01_iso                                         Transferred    19:06:10   Idle
machine01:/mnt/it/SCCM                           machine02:/mnt/it/SCCM                                                      Transferred    19:25:51   Idle
machine01:/mnt/it/Windows                        machine02:/mnt/it/Windows                                                   Transferred    19:25:51   Idle
machine01:/mnt/it/-                              machine02:/mnt/it/machine01_it_machine01_it                                   Transferred    19:25:51   Idle
machine01:/mnt/it/dcs                           machine02:/mnt/it/dcs                                                       Transferred    19:25:51   Idle
machine01:/mnt/it/hds_perf_logs                  machine02:/mnt/it/hds_perf_logs                                             Transferred    19:25:51   Idle
machine01:/mnt/legalhold/LegalHold              machine02:/mnt/legalhold/LegalHold                                          Transferred    18:46:06   Idle
machine01:/mnt/legalhold/-                      machine02:/mnt/legalhold/legalhold_machine01_legalhold                       Transferred    18:46:06   Idle

Here is my Python script:

for x in f.readlines():
    output_data = x.split()
    #Define variable
    source_path = output_data[0]
    dest_path = output_data[1]
    print "working on....",source_path
    relationship = output_data[2]
    #We are only interested with hour,split it out!
    buffer_time = output_data[3].split(":",1)
    relationship_status = output_data[4]
    #Get destination nas hostname
    dest_nas = output_data[1].split(":",1)
    dest_nas_hostname = dest_nas[0]
    #Get the exact hour number and convert it into int
    extracted_hour = int(buffer_time[0])
    if relationship_status == "Idle":
        if extracted_hour > max_tolerate_hour:
            print "Source path         : ",source_path
            print "Destination path    : ",dest_path
            print "Max threshold(hours): ",max_tolerate_hour
            print "Idle (hours)        : ",extracted_hour
            print "======================================================================"

    else:
        pass
print "Scan completed!"

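Everything looks fine, but the script breaks on line 7 of the output, where the spaces in "Ready To Transfer" throw off the split. I could wrap it in try/except, but that would not solve the problem.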

Please let me know what else I can do.

2 Answers:

Answer 0: (score: 0)

You can split on a regular expression. This regex matches runs of two or more spaces:

>>> import re
>>> s = "machine01:/mnt/iso/Ready To Transfer            machine02:/mnt/iso/ReadyxToxTransfer                                        Transferred    19:06:10   Idle"
>>> re.split('  +', s)
['machine01:/mnt/iso/Ready To Transfer', 'machine02:/mnt/iso/ReadyxToxTransfer', 'Transferred', '19:06:10', 'Idle']
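
As a rough sketch of how this split could be used per line, the helper below also checks the number of fields it gets back. The name split_columns and the expected count of 5 are my own assumptions for illustration, not part of the original script.

```python
import re

def split_columns(line, expected=5):
    """Split on runs of two or more spaces; return None if the field count is off."""
    fields = re.split(r' {2,}', line.strip())
    return fields if len(fields) == expected else None

good = ("machine01:/mnt/iso/Ready To Transfer            "
        "machine02:/mnt/iso/ReadyxToxTransfer            "
        "Transferred    19:06:10   Idle")
# Hypothetical line with a double space inside the source path
bad = ("machine01:/mnt/iso/Ready  To Transfer            "
       "machine02:/mnt/iso/ReadyxToxTransfer            "
       "Transferred    19:06:10   Idle")

print(split_columns(good))
print(split_columns(bad))
```

Returning None instead of a wrong-length list lets the caller skip or log a suspicious line rather than index into the wrong column.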

This will still break if your filenames contain multiple consecutive spaces. I would suggest a more tailored regex:

>>> parts = re.search(r'(machine.*)(machine.*)(\s\w+)\s+([0-9:]+)\s+(\w+)', s).groups()
>>> [p.strip() for p in parts]
['machine01:/mnt/iso/Ready To Transfer', 'machine02:/mnt/iso/ReadyxToxTransfer', 'Transferred', '19:06:10', 'Idle']
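
One caveat with this pattern: both (machine.*) groups are greedy, so the first group extends up to the last occurrence of "machine" in the line. A small sketch of what that does to a destination path that itself contains "machine01" (the variable names are mine, and the column padding in the sample line is shortened):

```python
import re

GREEDY = r'(machine.*)(machine.*)(\s\w+)\s+([0-9:]+)\s+(\w+)'
line = ("machine01:/mnt/vlm/log-prod                     "
        "machine02:/mnt/machine01_vlm/log-prod           "
        "Transferred    17:46:14   Idle")

greedy_parts = [p.strip() for p in re.search(GREEDY, line).groups()]
# The second group starts at "machine01_vlm", not at "machine02:"
print(greedy_parts[0])
print(greedy_parts[1])
```

The first field swallows the start of the destination column and the second field loses its "machine02:/mnt/" prefix, so the pattern needs a tighter anchor than the bare word "machine".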

Edit: that regex broke on "machine02:/mnt/machine01_vlm/log-prod", try this:

>>> for line in input_lines.split('\n'):
...   parts = re.search(r'(^machine\d\d:.*)(machine\d\d:.*)(\s\w+)\s+([0-9:]+)\s+(\w+)', line).groups()
...   print [p.strip() for p in parts]
... 
['machine01:/mnt/vlm/log-prod', 'machine02:/mnt/machine01_vlm/log-prod', 'Transferred', '17:46:14', 'Idle']
['machine01:/mnt/vlm/log-test', 'machine02:/mnt/machine01_vlm/log-test', 'Transferred', '17:46:14', 'Idle']
['machine01:/mnt/wndchl/-', 'machine02:/mnt/machine01_wndchl/machine01_wndchl_machine01_wndchl', 'Transferred', '18:36:10', 'Idle']
['machine01:/mnt/wndchl/prod', 'machine02:/mnt/machine01_wndchl/prod', 'Transferred', '18:36:10', 'Idle']
['machine01:/mnt/wndchl/test', 'machine02:/mnt/machine01_wndchl/test', 'Transferred', '18:36:10', 'Idle']
['machine01:/mnt/iso/Archive', 'machine02:/mnt/iso/Archive', 'Transferred', '19:06:10', 'Idle']
['machine01:/mnt/iso/Ready To Transfer', 'machine02:/mnt/iso/ReadyxToxTransfer', 'Transferred', '19:06:10', 'Idle']
['machine01:/mnt/iso/-', 'machine02:/mnt/iso/iso_machine01_iso', 'Transferred', '19:06:10', 'Idle']
['machine01:/mnt/it/SCCM', 'machine02:/mnt/it/SCCM', 'Transferred', '19:25:51', 'Idle']
['machine01:/mnt/it/Windows', 'machine02:/mnt/it/Windows', 'Transferred', '19:25:51', 'Idle']
['machine01:/mnt/it/-', 'machine02:/mnt/it/machine01_it_machine01_it', 'Transferred', '19:25:51', 'Idle']
['machine01:/mnt/it/dcs', 'machine02:/mnt/it/dcs', 'Transferred', '19:25:51', 'Idle']
['machine01:/mnt/it/hds_perf_logs', 'machine02:/mnt/it/hds_perf_logs', 'Transferred', '19:25:51', 'Idle']
['machine01:/mnt/legalhold/LegalHold', 'machine02:/mnt/legalhold/LegalHold', 'Transferred', '18:46:06', 'Idle']
['machine01:/mnt/legalhold/-', 'machine02:/mnt/legalhold/legalhold_machine01_legalhold', 'Transferred', '18:46:06', 'Idle']
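
Note that re.search returns None for a line that does not match (a blank trailing line, for example), in which case calling .groups() raises AttributeError. Below is a possible variation, written for Python 3, that uses named groups and skips such lines. It assumes, like the regex above, that hostnames always look like machine plus two digits; the names LINE_RE and parse_line are hypothetical.

```python
import re

LINE_RE = re.compile(
    r'^(?P<source>machine\d\d:.*?)\s+'      # source host:path, may contain spaces
    r'(?P<dest>machine\d\d:.*?)\s+'         # destination host:path
    r'(?P<relationship>\w+)\s+'             # e.g. Transferred
    r'(?P<idle_time>\d+:\d+:\d+)\s+'        # HH:MM:SS
    r'(?P<status>\w+)\s*$'                  # e.g. Idle
)

def parse_line(line):
    """Return a dict of fields, or None if the line does not match."""
    match = LINE_RE.match(line)
    return match.groupdict() if match else None

sample = ("machine01:/mnt/iso/Ready To Transfer            "
          "machine02:/mnt/iso/ReadyxToxTransfer            "
          "Transferred    19:06:10   Idle")
print(parse_line(sample))
print(parse_line(""))
```

Because the path groups are lazy and each one must be followed by whitespace plus the next column's pattern, a single space inside a path does not end the field early.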

Here is a link to the Python re module documentation.

A good tool for experimenting with regular expressions is https://www.debuggex.com/

Answer 1: (score: 0)

import re

LOG_FMT = re.compile(r'(\w+):(.*?)\s+(\w+):(.*?)\s+(\w+)\s+(\d+):(\d+):(\d+)\s+(\w+)')
max_tolerate_hours = 19.2

def main():
    with open('my.log') as inf:
        for row in inf:
            match = LOG_FMT.match(row)
            if match is not None:
                src_machine, src_path, dest_machine, dest_path, rel, hh, mm, ss, status = match.groups()
                hh, mm, ss = int(hh), int(mm), int(ss)
                hours = hh + (mm / 60.) + (ss / 3600.)
                if status == 'Idle' and hours > max_tolerate_hours:
                    print('Source path         : {}'.format(src_path))
                    print('Destination path    : {}'.format(dest_path))
                    print('Max threshold (h)   : {:0.2f}'.format(max_tolerate_hours))
                    print('Idle (h)            : {:0.2f}'.format(hours))
                    print('=========================================================')
    print('Scan completed!')

if __name__=="__main__":
    main()

which, for your given data, returns

Source path         : /mnt/it/SCCM
Destination path    : /mnt/it/SCCM
Max threshold (h)   : 19.20
Idle (h)            : 19.43
=========================================================
Source path         : /mnt/it/Windows
Destination path    : /mnt/it/Windows
Max threshold (h)   : 19.20
Idle (h)            : 19.43
=========================================================
Source path         : /mnt/it/-
Destination path    : /mnt/it/machine01_it_machine01_it
Max threshold (h)   : 19.20
Idle (h)            : 19.43
=========================================================
Source path         : /mnt/it/dcs
Destination path    : /mnt/it/dcs
Max threshold (h)   : 19.20
Idle (h)            : 19.43
=========================================================
Source path         : /mnt/it/hds_perf_logs
Destination path    : /mnt/it/hds_perf_logs
Max threshold (h)   : 19.20
Idle (h)            : 19.43
=========================================================
Scan completed!
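
To see why only the 19:25:51 rows are reported, here is the same hour arithmetic pulled out into a small helper. The function name to_hours is hypothetical, and the threshold is the 19.2 used in the script above.

```python
MAX_TOLERATE_HOURS = 19.2  # same threshold as in the script above

def to_hours(hhmmss):
    """Convert an 'HH:MM:SS' string to fractional hours."""
    hh, mm, ss = (int(part) for part in hhmmss.split(':'))
    return hh + mm / 60.0 + ss / 3600.0

for stamp in ('19:25:51', '19:06:10', '18:46:06'):
    hours = to_hours(stamp)
    print('{} -> {:0.2f} h, over threshold: {}'.format(
        stamp, hours, hours > MAX_TOLERATE_HOURS))
```

19:25:51 works out to about 19.43 hours, which is above the threshold, while 19:06:10 is about 19.10 hours and stays below it, so the /mnt/iso rows are not printed.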