I have a string like this:
out = """Tue Nov 7 07:20:56.948 UTC
total 1224
11 -rw------- 1 243 Nov 7 06:50 .bash_history
12 -rw-r--r-- 1 364236 Nov 5 12:24 bh_cardmgr_1899.by.11.20171105-122102.sysadmin-vm.ec937.core.gz
15 -rw-r--r-- 1 42082 Nov 5 13:03 bh_cardmgr_1900.by.11.20171105-130244.sysadmin-vm.ec937.core.txt
14 -rw-r--r-- 1 365799 Nov 5 13:03 bh_cardmgr_1900.by.11.20171105-130244.sysadmin-vm.ec937.core.gz
16 -rw-r--r-- 1 366337 Nov 7 04:58 bh_cardmgr_1887.by.11.20171107-045835.sysadmin-vm.ec937.core.gz
131074 drwxr-xr-x 4 4096 Nov 7 05:27 cisco_support
131073 drwxr-xr-x 2 4096 Nov 5 09:22 tftpboot
13 -rw-r--r-- 1 42082 Nov 5 12:24 bh_cardmgr_1899.by.11.20171105-122102.sysadmin-vm.ec937.core.txt
131077 drwxr-xr-x 2 4096 Nov 5 09:31 dumper
17 -rw-r--r-- 1 42082 Nov 7 04:58 bh_cardmgr_1887.by.11.20171107-045835.sysadmin-vm.ec937.core.txt
"""
I want to extract only the file names from this string and build a list of them. How can I get output like this?
list = [bh_cardmgr_1899.by.11.20171105-122102.sysadmin-vm.ec937.core.gz, bh_cardmgr_1900.by.11.20171105-130244.sysadmin-vm.ec937.core.txt, bh_cardmgr_1900.by.11.20171105-130244.sysadmin-vm.ec937.core.gz]
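Answer 0 (score: 1)
Given your requirements, a regular expression is a simple way to solve this. Regular expressions are very powerful, and there is plenty of material online for learning them.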
import re
out = """Tue Nov 7 07:20:56.948 UTC
total 1224
11 -rw------- 1 243 Nov 7 06:50 .bash_history
12 -rw-r--r-- 1 364236 Nov 5 12:24 bh_cardmgr_1899.by.11.20171105-122102.sysadmin-vm.ec937.core.gz
15 -rw-r--r-- 1 42082 Nov 5 13:03 bh_cardmgr_1900.by.11.20171105-130244.sysadmin-vm.ec937.core.txt
14 -rw-r--r-- 1 365799 Nov 5 13:03 bh_cardmgr_1900.by.11.20171105-130244.sysadmin-vm.ec937.core.gz
16 -rw-r--r-- 1 366337 Nov 7 04:58 bh_cardmgr_1887.by.11.20171107-045835.sysadmin-vm.ec937.core.gz
131074 drwxr-xr-x 4 4096 Nov 7 05:27 cisco_support
131073 drwxr-xr-x 2 4096 Nov 5 09:22 tftpboot
13 -rw-r--r-- 1 42082 Nov 5 12:24 bh_cardmgr_1899.by.11.20171105-122102.sysadmin-vm.ec937.core.txt
131077 drwxr-xr-x 2 4096 Nov 5 09:31 dumper
17 -rw-r--r-- 1 42082 Nov 7 04:58 bh_cardmgr_1887.by.11.20171107-045835.sysadmin-vm.ec937.core.txt
"""
def Func(data):
    output = []
    # Match only regular files (mode string starts with '-') and capture the name.
    p = re.compile(r'\s*\d+ -[-rwx]+ \d+ \s*\d+ \w{3}\s*\d+ \d{2}:\d{2} (.*)')
    for x in data.split('\n'):
        result = p.match(x)
        if result:
            output.append(result.group(1))
    return output

print(Func(out))
The result will be:
['.bash_history',
'bh_cardmgr_1899.by.11.20171105-122102.sysadmin-vm.ec937.core.gz',
'bh_cardmgr_1900.by.11.20171105-130244.sysadmin-vm.ec937.core.txt',
'bh_cardmgr_1900.by.11.20171105-130244.sysadmin-vm.ec937.core.gz',
'bh_cardmgr_1887.by.11.20171107-045835.sysadmin-vm.ec937.core.gz',
'bh_cardmgr_1899.by.11.20171105-122102.sysadmin-vm.ec937.core.txt',
'bh_cardmgr_1887.by.11.20171107-045835.sysadmin-vm.ec937.core.txt']
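The pattern above requires the mode column to start with '-', which is why the directories (cisco_support, tftpboot, dumper) are absent from the result. As a minimal sketch of the same idea without the explicit loop, the pattern can be anchored per line with re.MULTILINE and applied to the whole string at once. The name list_regular_files and the shortened sample are assumptions made for illustration, not part of the original answer.

```python
import re

# Shortened sample of the listing from the question (assumed to use the same columns).
sample = """Tue Nov 7 07:20:56.948 UTC
total 1224
11 -rw------- 1 243 Nov 7 06:50 .bash_history
12 -rw-r--r-- 1 364236 Nov 5 12:24 bh_cardmgr_1899.by.11.20171105-122102.sysadmin-vm.ec937.core.gz
131074 drwxr-xr-x 4 4096 Nov 7 05:27 cisco_support
"""

# Same pattern as in the answer, anchored to line boundaries via re.MULTILINE.
LINE_RE = re.compile(
    r'^\s*\d+ -[-rwx]+ \d+ \s*\d+ \w{3}\s*\d+ \d{2}:\d{2} (.*)$',
    re.MULTILINE,
)


def list_regular_files(listing):
    """Return the names of regular files found in the listing (hypothetical helper)."""
    # With a single capture group, findall returns the captured names directly.
    return LINE_RE.findall(listing)


print(list_regular_files(sample))
```

The directory line is skipped because its mode string starts with 'd' rather than '-'.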
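Answer 1 (score: 0)
If the line format is consistent, you can use:
file_list = []
for line in out.split('\n'):
    fields = line.split()
    # A listing line has exactly 8 whitespace-separated fields; the last is the name.
    if len(fields) == 8:
        file_list.append(fields[7])
This gives: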
['.bash_history',
'bh_cardmgr_1899.by.11.20171105-122102.sysadmin-vm.ec937.core.gz',
'bh_cardmgr_1900.by.11.20171105-130244.sysadmin-vm.ec937.core.txt',
'bh_cardmgr_1900.by.11.20171105-130244.sysadmin-vm.ec937.core.gz',
'bh_cardmgr_1887.by.11.20171107-045835.sysadmin-vm.ec937.core.gz',
'cisco_support',
'tftpboot',
'bh_cardmgr_1899.by.11.20171105-122102.sysadmin-vm.ec937.core.txt',
'dumper',
'bh_cardmgr_1887.by.11.20171107-045835.sysadmin-vm.ec937.core.txt']
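One limitation of counting fields is that a name containing a space would produce more than 8 fields and be dropped. A possible variation, sketched here under the assumption that the first seven columns never contain spaces, limits the split to 7 so the name stays in one piece, and optionally skips directories. The helper name names_from_listing, its include_dirs parameter, and the "my notes.txt" entry are hypothetical.

```python
# Shortened sample; the "my notes.txt" entry is hypothetical, added to show a name with a space.
sample = """Tue Nov 7 07:20:56.948 UTC
total 1224
11 -rw------- 1 243 Nov 7 06:50 .bash_history
18 -rw-r--r-- 1 10 Nov 7 05:00 my notes.txt
131074 drwxr-xr-x 4 4096 Nov 7 05:27 cisco_support
"""


def names_from_listing(listing, include_dirs=True):
    """Return the name column of each listing line (hypothetical helper)."""
    names = []
    for line in listing.splitlines():
        # Split on whitespace at most 7 times so the 8th field keeps any spaces.
        fields = line.split(None, 7)
        if len(fields) != 8 or not fields[0].isdigit():
            continue  # timestamp header, "total" line, or blank line
        mode = fields[1]
        if mode.startswith('d') and not include_dirs:
            continue  # directory entry, skipped on request
        names.append(fields[7].rstrip())
    return names


print(names_from_listing(sample))
print(names_from_listing(sample, include_dirs=False))
```

Checking that the first field is numeric is a cheap guard against unrelated lines that happen to have 8 or more words.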
Answer 2 (score: 0)
You can use a positive lookbehind, (?<=\d{2}:\d{2}\s), to match whatever follows the HH:MM timestamp on each line:
import re
pattern = r'(?<=\d{2}:\d{2}\s)\w.+'
print(re.findall(pattern, out))
Output (.bash_history is missing because \w does not match its leading dot; directories are included):
['bh_cardmgr_1899.by.11.20171105-122102.sysadmin-vm.ec937.core.gz', 'bh_cardmgr_1900.by.11.20171105-130244.sysadmin-vm.ec937.core.txt', 'bh_cardmgr_1900.by.11.20171105-130244.sysadmin-vm.ec937.core.gz', 'bh_cardmgr_1887.by.11.20171107-045835.sysadmin-vm.ec937.core.gz', 'cisco_support', 'tftpboot', 'bh_cardmgr_1899.by.11.20171105-122102.sysadmin-vm.ec937.core.txt', 'dumper', 'bh_cardmgr_1887.by.11.20171107-045835.sysadmin-vm.ec937.core.txt']
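If dot-files should be kept as well, one option is to start the match with \S instead of \w. The sketch below also filters the names by suffix, on the assumption that only the core dump files shown in the question's desired output are wanted; the names NAME_RE and core_files, the suffix list, and the shortened sample are illustrative choices, not part of the original answer.

```python
import re

# Shortened sample of the listing from the question (assumed to use the same columns).
sample = """Tue Nov 7 07:20:56.948 UTC
total 1224
11 -rw------- 1 243 Nov 7 06:50 .bash_history
12 -rw-r--r-- 1 364236 Nov 5 12:24 bh_cardmgr_1899.by.11.20171105-122102.sysadmin-vm.ec937.core.gz
15 -rw-r--r-- 1 42082 Nov 5 13:03 bh_cardmgr_1900.by.11.20171105-130244.sysadmin-vm.ec937.core.txt
131074 drwxr-xr-x 4 4096 Nov 7 05:27 cisco_support
"""

# Fixed-width lookbehind for "HH:MM "; \S lets names that start with a dot match too.
NAME_RE = re.compile(r'(?<=\d{2}:\d{2} )\S.*')

names = NAME_RE.findall(sample)

# Assumed filter: keep only the core dump files, judged by their suffix.
core_files = [n for n in names if n.endswith(('.core.gz', '.core.txt'))]

print(names)
print(core_files)
```

The timestamp header on the first line is not matched, because there the HH:MM part is followed by ':' rather than a space.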