How do I group a text file into sub-files in Python?

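Date: 2018-10-03 19:39:49

Tags: python

I'm just starting out with Python and can't solve the following problem. Any help is greatly appreciated. Thanks.

I have a txt file that looks like this: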

> #code1 information about code here abcdefghijklmnopqrst  information line continures #code2 information about code here xyz #code3
> information about code here klm #code4 details found here, information
> {}}} information and details continued #code5....

I want my output to be two txt files: one with all the lines related to code1 and one with all the lines related to code4.

with open("C:\\Users\\name\\Desktop\\Codes.txt", "r") as f:
    d = {}

    for line in f:
        start = "#code"

        code, number = line.strip().split(start)
        if d.has_key(number):
            d[number].append(code)
        else:
            d[number] = []
            d[number].append(code)

for key, value in d.iteritems():
    f = open("C:\\Users\\name\\Desktop\\New folder\\{}.txt".format(number), "w")
    for item in value:
        f.write("{}\n".format(item))
    f.close()

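I'm not sure how to (1) group all the lines that belong to a code when its text spills over onto a new line, and (2) select only two codes (code1 and code4) to write to new files.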

2 answers:

Answer 0: (score: 1)

You can do it like this:

import re

code_dict = dict()

# Read the whole file into a single string.
with open("C:\\Users\\name\\Desktop\\Codes.txt", "r") as f:
    code = f.read()

code_to_retain = ['#code1', '#code4']
key_word = None  # the code whose words are currently being collected
for word in code.split(' '):
    if word in code_to_retain:
        # Start of a section we want to keep.
        code_dict[word] = list()
        key_word = word
        continue
    elif re.search(r'#code\d+', word):
        # Start of any other section: stop collecting.
        key_word = None
        continue

    if key_word:
        code_dict[key_word].append(word)

for key_word in code_dict:
    lines = ' '.join(code_dict[key_word])

    # On Windows, use "C:\\Users\\name\\Desktop\\New folder\\{}.txt" instead.
    with open('/tmp/{}.txt'.format(key_word.replace('#', '')), "w") as f:
        f.write(lines)

Output:

cat /tmp/code1.txt:

information about code here abcdefghijklmnopqrst  information line continures

cat /tmp/code4.txt:

details found here, information
> {}}} information and details continued
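As a possible variation on the word-by-word scan above, the same grouping can be sketched with `re.split`. This is only a sketch under a few assumptions: every marker looks like `#code` followed by digits, the helper name `group_by_code` is made up for illustration, and the sample string is a copy of the question's text with the leading `>` quote markers left out. Unlike the code above, it collapses line breaks and repeated spaces into single spaces.

```python
import re

def group_by_code(text, wanted):
    """Return {marker: text} for each wanted marker found in text."""
    # A capturing group makes re.split keep the markers in the result:
    # [preamble, marker, body, marker, body, ...]
    parts = re.split(r'(#code\d+)', text)
    groups = {}
    for marker, body in zip(parts[1::2], parts[2::2]):
        if marker in wanted:
            # Collapse line breaks and runs of spaces into single spaces.
            groups.setdefault(marker, []).append(' '.join(body.split()))
    # A marker that occurs several times gets one output line per occurrence.
    return {marker: '\n'.join(bodies) for marker, bodies in groups.items()}

# Hypothetical sample modeled on the question's file.
sample = (
    "#code1 information about code here abcdefghijklmnopqrst  information line continures "
    "#code2 information about code here xyz #code3\n"
    "information about code here klm #code4 details found here, information\n"
    "{}}} information and details continued #code5...."
)

result = group_by_code(sample, {'#code1', '#code4'})
print(result['#code1'])
print(result['#code4'])
```

Writing each value of `result` to its own file then works the same way as in the final loop above.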

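Answer 1: (score: 1)

The simplest approach is to write directly to the output files instead of building temporary lists and dictionaries.

You will also want to make sure you strip out the stray line breaks while doing so.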
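To illustrate what stripping the line breaks does to an entry that wraps onto a second line, here is a small sketch. The `entry` string is a hypothetical example modeled on the code4 text from the question. Deleting the `\n` outright glues the two neighboring words together, while splitting on whitespace and re-joining keeps a single space between them.

```python
# Hypothetical wrapped entry, modeled on the question's "#code4" text.
entry = "code4 details found here, information\n{}}} information and details continued "

# Deleting the newline joins "information" and "{}}}" with nothing between them.
flat = entry.replace('\n', '')

# Splitting on any whitespace and re-joining keeps one space per gap
# and also drops the trailing space.
spaced = ' '.join(entry.split())

print(repr(flat))
print(repr(spaced))
```

Which of the two is preferable depends on whether the wrapped lines in the real file already end with a space.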

I made a file containing several copies of the text above and tested it with the following code:

# open codes.txt for reading, plus one output file each for the items
# labeled "#code1" and "#code4"; the with block closes all three files
with open('codes.txt') as f, \
        open('code1.txt', 'w') as code1out, \
        open('code4.txt', 'w') as code4out:
    # create a list of strings by splitting on the hash/pound symbol
    lines = f.read().split('#')
    # iterate through our list of codes
    for item in lines:
        # get rid of line breaks in our list
        item = item.replace('\n', '')
        # split each item after the first word (i.e., "code1", "code2", followed by the rest of the string)
        wholelinesplit = item.split(' ', 1)
        # skip items that have no text after the first word
        if len(wholelinesplit) < 2:
            continue
        # if the first word is "code1" or "code4", write the rest to the appropriate file, with a line break at the end
        if wholelinesplit[0] == 'code1':
            code1out.write(wholelinesplit[1] + '\n')
        elif wholelinesplit[0] == 'code4':
            code4out.write(wholelinesplit[1] + '\n')

Here is the output in code1.txt:

information about code here abcdefghijklmnopqrst  information line continures 
information about code here abcdefghijklmnopqrst  information line continures 
information about code here abcdefghijklmnopqrst  information line continures 
information about code here abcdefghijklmnopqrst  information line continures 

Here is the output in code4.txt:

details found here, information> {}}} information and details continued 
details found here, information> {}}} information and details continued 
details found here, information> {}}} information and details continued 
details found here, information> {}}} information and details continued
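If more than two codes are ever needed, the same write-as-you-go idea might be generalized roughly as follows. This is a sketch, not a tested drop-in replacement: the function name `write_selected`, the sample text, and the temporary output directory are all assumptions made for the example. It uses `contextlib.ExitStack` from the standard library to keep one output file open per wanted code and close them all at the end.

```python
import os
import tempfile
from contextlib import ExitStack

def write_selected(text, wanted, out_dir):
    """Write the text following each wanted code to <out_dir>/<code>.txt."""
    with ExitStack() as stack:
        # One output file per wanted code; ExitStack closes them all on exit.
        outputs = {
            code: stack.enter_context(
                open(os.path.join(out_dir, code + '.txt'), 'w'))
            for code in wanted
        }
        for item in text.split('#'):
            # Splitting on whitespace joins wrapped lines with single spaces.
            words = item.split()
            if len(words) > 1 and words[0] in outputs:
                outputs[words[0]].write(' '.join(words[1:]) + '\n')

# Hypothetical sample input and a throwaway output directory.
sample = ("#code1 first entry #code2 skipped "
          "#code4 wraps onto\na second line #code1 another entry")
out_dir = tempfile.mkdtemp()
write_selected(sample, ['code1', 'code4'], out_dir)

with open(os.path.join(out_dir, 'code1.txt')) as f:
    code1_text = f.read()
with open(os.path.join(out_dir, 'code4.txt')) as f:
    code4_text = f.read()
print(code1_text)
print(code4_text)
```

Like the code above, this splits on every `#`, so it assumes the information text itself never contains a `#` character.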