Question

I have this txt file, which is the output of ls -R on the etc directory of a Linux system. Sample file:

etc:  
ArchiveSEL  
xinetd.d

etc/cmm:  
CMM_5085.bin  
cmm_sel  
storage.cfg  

etc/crontabs:  
root

etc/pam.d:  
ftp    
rsh  

etc/rc.d:  
eth.set.sh  
rc.sysinit  

etc/rc.d/init.d:  
cmm  
functions  
userScripts  

etc/security:  
access.conf  
console.apps  
time.conf

etc/security/console.apps:  
kbdrate

etc/ssh:  
ssh_host_dsa_key  
sshd_config  

etc/var:  
setUser  
snmpd.conf

etc/xinetd.d:  
irsh  
wu-ftpd

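I want to split the subdirectories into multiple files. For the sample, the files would be: etc.txt, etcCmm.txt, etcCrontabs.txt, etcPamd.txt, ...
Can someone give me Python code that does this? Note that the subdirectory lines end with ':', but I'm not smart enough to write the code myself. Some examples would be appreciated. Thank you :)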

Answer 1

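Maybe something like this? re.M makes ^ match at the start of every line, so one pattern can pick out each block; the last part just iterates over the matches and creates the files...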

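import re

data = open('data.txt').read()  # or a string holding the listing above

# A block is a header line ending in ':' (trailing spaces allowed) followed by
# the non-empty lines up to the next blank line or the end of the input.
results = [(name, body.splitlines()) for name, body in
           re.findall(r'^([^\n]*?):[ \t]*\n((?:[^\n]+(?:\n|\Z))*)', data, re.M)]

for dirname, files in results:
    # 'etc/rc.d' becomes 'etcrc.d.txt'
    with open(dirname.replace('/', '') + '.txt', 'w') as f:
        for line in files:
            f.write(line.strip() + '\n')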

Answer 2

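You need to go through it line by line. If line.endswith(":") then you are in a new subdirectory (strip trailing whitespace first, since the header lines in the sample have spaces after the colon). From then on, every line is a new entry in that subdirectory, until another line ends with :.

From what I understand, you just want to split one text file into several separate text files.

So you check whether a line ends with :. If it does, you open a new text file, say etcCmm.txt, and every line you read from the source text from then on gets written to etcCmm.txt. When you hit another line that ends with :, close the previously opened file, create a new one, and continue.

I'll leave a few things for you to do yourself, such as working out what to name the text files, reading the file line by line, and so on.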
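
The procedure described above could be sketched roughly as follows. This is only a minimal sketch: the function name split_listing, the out_dir parameter, and the naming rule (drop the slashes and append .txt) are assumptions made for illustration, not something the question fixes.

```python
import os


def split_listing(lines, out_dir="."):
    """Write one .txt file per 'dir:' header found in an ls -R listing.

    `lines` is any iterable of lines (an open file works).
    Returns the list of file paths written, in order.
    """
    written = []
    out = None
    try:
        for raw in lines:
            line = raw.strip()  # the sample has trailing spaces after ':'
            if line.endswith(":"):
                # New subdirectory: close the previous file, open the next.
                if out is not None:
                    out.close()
                # Assumed naming rule: 'etc/rc.d' -> 'etcrc.d.txt'
                name = line[:-1].replace("/", "") + ".txt"
                path = os.path.join(out_dir, name)
                out = open(path, "w")
                written.append(path)
            elif line and out is not None:
                # A non-blank entry belonging to the current subdirectory.
                out.write(line + "\n")
    finally:
        if out is not None:
            out.close()
    return written
```

Passing an open file, for example split_listing(open("data.txt")), should produce etc.txt, etccmm.txt and so on in the current directory. Blank lines and anything before the first header are skipped.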

Answer 3

Use a regular expression like '.*:'.
Use file.readline().
Use a loop.
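
Put together, those three hints might look something like the sketch below. The names HEADER and read_sections are made up for this example, and the pattern is extended with \s*$ on the assumption that header lines may carry trailing spaces, as they do in the sample. It only collects the entries per directory; writing them out is a separate step.

```python
import io
import re

# A header is a line ending in ':' plus optional trailing whitespace.
HEADER = re.compile(r"^(.*):\s*$")


def read_sections(fileobj):
    """Return {directory: [entries]}, reading with readline() in a loop."""
    sections = {}
    current = None
    while True:
        line = fileobj.readline()
        if not line:  # readline() returns '' at end of file
            break
        match = HEADER.match(line)
        if match:
            current = match.group(1)
            sections[current] = []
        elif line.strip() and current is not None:
            sections[current].append(line.strip())
    return sections


# io.StringIO stands in for an open file here.
demo = io.StringIO("etc:  \nArchiveSEL  \n\netc/cmm:  \ncmm_sel  \n")
print(read_sections(demo))
# {'etc': ['ArchiveSEL'], 'etc/cmm': ['cmm_sel']}
```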

Answer 4

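If Python is not a must, you could use this (it tolerates trailing spaces after the colon and skips the header and blank lines):

awk '/:[ \t]*$/ {gsub(/[:\/ \t]/, ""); fn = $0 ".txt"; next} fn && NF {print > fn}' file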

Answer 5

This is what I would do:

Read the file into memory (myfile = open(filename).read() should do it).

Then split the file along the delimiters:

import re
myregex = re.compile(r"^(.*):[ \t]*$", re.MULTILINE)
arr = myregex.split(myfile)[1:] # dropping everything before the first directory entry

Then convert the array into a dict, removing unwanted characters along the way:

mydict = dict([(re.sub(r"\W+","",k), v.strip()) for (k,v) in zip(arr[::2], arr[1::2])])

Then write the files:

# dict.iteritems() exists only in Python 2; items() works in both versions
for name, content in mydict.items():
    with open(name + ".txt", "w") as output:
        output.write(content)
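
The steps above give names like etcpamd.txt, whereas the question asks for etcPamd.txt. One possible way to get that capitalisation on top of the same re.split approach is sketched below. The helpers camel_name and split_sections are hypothetical, and the rule "capitalise the first letter of every path component after the first" is only a guess at what the asker intends.

```python
import re


def camel_name(dirname):
    """'etc/pam.d' -> 'etcPamd': capitalise each path part after the first."""
    parts = [re.sub(r"\W+", "", p) for p in dirname.split("/")]
    parts = [p for p in parts if p]
    if not parts:
        return ""
    return parts[0] + "".join(p[0].upper() + p[1:] for p in parts[1:])


def split_sections(text):
    """Return {output file name: block text} using the re.split approach."""
    header = re.compile(r"^(.*):[ \t]*$", re.MULTILINE)
    arr = header.split(text)[1:]  # drop everything before the first header
    return {
        camel_name(k) + ".txt": "\n".join(
            s.strip() for s in v.strip().splitlines()
        )
        for k, v in zip(arr[::2], arr[1::2])
    }
```

Each key of the returned dict can then be opened for writing and filled with its value, exactly as in the loop above.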

Python: How do I split a file?

5 answers: