Read a file line by line and get the lines that start with a certain word?

Time: 2014-09-11 09:24:04

Tags: python json

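I am using Python and trying to read a file line by line and add those lines to a JSON object. I need to check whether a line starts with a certain word, then put the text after that word into the JSON until I find a line that again starts with one of the specific words.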

I have an array of these specific names:

names_array= ['Filan Fisteku','Fisteku Filan']

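So, for example, the txt file looks like this:

  Filan Fisteku: Said something about this , blla blla blla then
  the Filan Fisteku speech goes on on the next line, plus some other text.
  Fisteku Filan: This is another text from another guy which i am trying to put in a json.

So the JSON I want to build from this txt is: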

    {
    "Filan Fisteku":["Said something about this , blla blla blla",
                      "then the Filan Fisteku speech goes on on the next line,",
                      "plus some other text."],
    "Fisteku Filan":["This is another text from another guy which",
                     "i am trying to put in a json"]
    }
    

I need to know whether I can do this with recursion, or how else I should do it.

3 Answers:

Answer 0: (score: 1)

You can do this easily:

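names_array = ['Filan Fisteku', 'Fisteku Filan']  # the names from the question

res = {}
with open('file.txt', 'r') as f:
    for line in f:  # iterate lazily instead of loading everything with readlines()
        for name in names_array:
            # Only lines that begin with a name are kept; continuation lines are skipped.
            if line.startswith(name):
                res.setdefault(name, []).append(line)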

You may also need to strip extra characters (spaces, etc.) from the start of each line, but probably not.
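If leading whitespace does turn out to matter, a minimal sketch of the same idea with stripping added could look like this. The function name `collect_prefixed_lines` and the sample lines are made up for illustration; like the snippet above, it keeps only lines that begin with a name and skips continuation lines.

```python
def collect_prefixed_lines(lines, names):
    """Group lines that start with one of `names`, ignoring surrounding whitespace."""
    res = {}
    for raw in lines:
        line = raw.strip()  # drops leading spaces/tabs and the trailing newline
        for name in names:
            if line.startswith(name):
                res.setdefault(name, []).append(line)
    return res


# Hypothetical sample input; a real call would pass an open file object instead.
sample = [
    "  Filan Fisteku: Said something\n",
    "a continuation line\n",
    "\tFisteku Filan: Another text\n",
]
result = collect_prefixed_lines(sample, ['Filan Fisteku', 'Fisteku Filan'])
print(result)
```

Because a file object yields its lines when iterated, `collect_prefixed_lines(f, names_array)` should work unchanged inside the `with open(...)` block.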

Answer 1: (score: 1)

You can build a dict with something like the following:

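names = {}
name = None  # current speaker; stays None until the first "Name: text" line
with open('yourfile') as fin:
    # Split each stripped line at the first ': ' into (before, separator, after).
    lines = (line.strip().partition(': ') for line in fin)
    for fst, sep, snd in lines:
        if sep:
            name = fst  # a separator was found, so this line starts a new speaker
        if name is None:
            continue  # text before any speaker line has no owner
        # Continuation lines have no separator, so their whole text is in fst.
        names.setdefault(name, []).append(snd or fst)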

Which gives:

{'Filan Fisteku': ['Said something about this , blla blla blla then',
                   'the Filan Fisteku speech goes on on the next line,  plus some other text.'],
 'Fisteku Filan': ['This is another text from another guy which i am trying to put in a json.']}

Then pass names to json.dumps.
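As a rough illustration of that last step, the dict shown above can be serialized with the standard `json` module. The `indent=4` argument is only a readability choice, and the variable name `json_text` is an assumption for this sketch.

```python
import json

# The dict produced by the loop above, copied here so the example runs on its own.
names = {
    'Filan Fisteku': ['Said something about this , blla blla blla then',
                      'the Filan Fisteku speech goes on on the next line,  plus some other text.'],
    'Fisteku Filan': ['This is another text from another guy which i am trying to put in a json.'],
}

# Serialize to a JSON string; indent=4 pretty-prints it.
json_text = json.dumps(names, indent=4)
print(json_text)
```

To write to a file instead of building a string, `json.dump(names, file_object)` takes the same dict plus an open, writable file.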

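Answer 2: (score: 0)

You can use a flag to track the current speaker. If you encounter a new speaker at the start of a line, update the flag. If there is no speaker at the start of the line, the line goes into the current speaker's list. I have created a demo; check whether it works for you: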

speaker = ''  # flag: name of whoever is currently speaking
Filan_Fisteku = []
Fisteku_Filan = []
with open('yourfile.txt', 'r') as f:
    for line in f:
        if line.startswith('Filan Fisteku:'):
            # Slice the prefix off; str.lstrip() removes a set of characters, not a prefix.
            line = line[len('Filan Fisteku:'):]
            Filan_Fisteku.append(line.strip())
            speaker = 'Filan Fisteku'
        elif line.startswith('Fisteku Filan:'):
            line = line[len('Fisteku Filan:'):]
            Fisteku_Filan.append(line.strip())
            speaker = 'Fisteku Filan'
        # No name at the start: the line belongs to the current speaker.
        elif speaker == 'Filan Fisteku':
            Filan_Fisteku.append(line.strip())
        elif speaker == 'Fisteku Filan':
            Fisteku_Filan.append(line.strip())
mydict = {'Filan Fisteku': Filan_Fisteku, 'Fisteku Filan': Fisteku_Filan}

From the data, mydict will look like this:

{'Filan Fisteku': ['Said something about this , blla blla blla then',
               'the Filan Fisteku speech goes on on the next line, plus some other text.',
               'plus some other text.'],
 'Fisteku Filan': ['This is another text from another guy which',
               'i am trying to put in a json.']}
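
The flag idea can be generalized so the names are not hard-coded. The following is only a sketch under some assumptions: each speaker line has the form "Name: text", the function name `group_by_speaker` is made up for illustration, and the sample lines stand in for the real file. A single loop is enough here, so no recursion is needed.

```python
import json


def group_by_speaker(lines, names):
    """Assign each line to the most recent speaker whose 'Name:' prefix started a line."""
    result = {name: [] for name in names}
    speaker = None  # the flag: nobody is speaking until a name is seen
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        for name in names:
            prefix = name + ':'
            if line.startswith(prefix):
                speaker = name
                line = line[len(prefix):].strip()  # keep only the text after "Name:"
                break
        if speaker is not None and line:
            result[speaker].append(line)
    return result


names_array = ['Filan Fisteku', 'Fisteku Filan']
# Stand-in for the file contents; pass an open file object in real use.
sample = [
    "Filan Fisteku: Said something about this , blla blla blla then\n",
    "the Filan Fisteku speech goes on on the next line, plus some other text.\n",
    "Fisteku Filan: This is another text from another guy which i am trying to put in a json.\n",
]
grouped = group_by_speaker(sample, names_array)
print(json.dumps(grouped, indent=4))
```

Slicing with `len(prefix)` is used instead of `str.lstrip`, since `lstrip` treats its argument as a set of characters and could eat the start of the speech itself.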