Parsing a delimited text file with Python

Posted: 2016-04-14 20:15:58

Tags: python parsing

I need to parse a text file that looks like this:

"id"$"date"$"text"
10001$2016-01-11$"[start]
this is some text
[stop]
"
10002$2014-03-12$"[start]
this is some more text
[stop]
"

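I'd like Python to read each record into a dictionary with the three elements (id, date and text) as keys.

I'm not sure how to split the elements on the delimiter, or how to use the first line as the keys for every element in the list.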
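One way to use the first line as the keys, sketched under the assumption that the real file looks exactly like the sample above (no leading spaces, `$` between fields, the text field wrapped in double quotes): read the header row with `next()` and pair it with every data row using `zip()`. The sample is embedded with `io.StringIO` only so the sketch runs without a file; the name `SAMPLE` is illustrative.

```python
import csv
import io

# Sample data copied from the question; with a real file, use open(path, newline='')
SAMPLE = '''"id"$"date"$"text"
10001$2016-01-11$"[start]
this is some text
[stop]
"
10002$2014-03-12$"[start]
this is some more text
[stop]
"
'''

reader = csv.reader(io.StringIO(SAMPLE), delimiter='$')
header = next(reader)  # first row becomes the keys (csv strips the quotes)
records = [dict(zip(header, row)) for row in reader]

for record in records:
    print(record['id'], record['date'])
```

The `text` values still carry the `[start]`/`[stop]` markers at this point; the answers below show different ways to strip them.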

I tried something like this, just to print it out:

infile = open('filename.txt', 'r')
for line in infile:
    if "????" in line:
        print(line, next(infile))

If I try this:

infile = open('filename.txt', 'r')
for line in infile:
    if '"text"' in line:
        print(next(infile))

it only prints the first line.
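Iterating over a file object yields one physical line at a time, but each record here spans several lines because the quoted text field contains line breaks, so `next(infile)` only ever returns the single line after the match. A quick comparison on the sample data (embedded with `io.StringIO`, assuming the file matches the sample exactly) shows the mismatch and how `csv.reader` reassembles a quoted multi-line field into one value:

```python
import csv
import io

# Sample data copied from the question
SAMPLE = '''"id"$"date"$"text"
10001$2016-01-11$"[start]
this is some text
[stop]
"
10002$2014-03-12$"[start]
this is some more text
[stop]
"
'''

# Line-by-line view: what `for line in infile` sees
physical_lines = io.StringIO(SAMPLE).read().splitlines()

# Record view: csv keeps reading until the closing quote of the text field
records = list(csv.reader(io.StringIO(SAMPLE), delimiter='$'))

print(len(physical_lines))  # 9
print(len(records))         # 3 (the header plus two records)
```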

Ideally, the result would look like this:

[{'id':'10001', 'date':'2016-01-11', 'text':'this is some text'},{'id':'10002', 'date':'2014-03-12', 'text':'this is some more text'}]

3 answers:

Answer 0 (score: 0)

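import csv

path = 'filename.txt'  # the input file from the question
with open(path, newline='') as f:
    # '$' splits the fields; the default quotechar '"' keeps the multi-line text field together
    reader = csv.reader(f, delimiter='$')
    next(reader)  # skip the header row
    # note: 'text' still contains the [start]/[stop] markers and newlines
    res = [{'id': line[0], 'date': line[1], 'text': line[2]} for line in reader]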
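Answer 0 leaves the markers and surrounding newlines in the `text` value (for the sample, the first record's text is `'[start]\nthis is some text\n[stop]\n'`). A minimal post-processing sketch, using a hypothetical helper `strip_markers` built on `str.partition`, assuming each text field holds exactly one `[start]` and `[stop]` pair. The `res` list below is a hand-written stand-in with the same shape as answer 0's output.

```python
def strip_markers(text, start='[start]', stop='[stop]'):
    """Hypothetical helper: return the text between the start and stop markers.

    If `start` is missing, partition returns an empty tail and the result is ''.
    """
    # keep everything after the first start marker
    _, _, rest = text.partition(start)
    # keep everything before the first stop marker that follows it
    body, _, _ = rest.partition(stop)
    # drop the line breaks that surround the payload
    return body.strip('\n')


# Stand-in for answer 0's result on the first sample record
res = [{'id': '10001', 'date': '2016-01-11',
        'text': '[start]\nthis is some text\n[stop]\n'}]
for record in res:
    record['text'] = strip_markers(record['text'])

print(res)  # [{'id': '10001', 'date': '2016-01-11', 'text': 'this is some text'}]
```

Unlike index arithmetic, `partition` never raises when a marker is absent, which may or may not be what you want; use `str.index` instead if a missing marker should be an error.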

Answer 1 (score: 0)

You can use Python's built-in csv module to parse the file.

import csv


class Parser(object):
    START_TEXT = "[start]"
    END_TEXT = "[stop]"

    def __init__(self, filename):
        self.filename = filename


    def parse_file(self):
        elements = []

        with open(self.filename, newline='') as f:
            reader = csv.reader(f, delimiter='$')
            first_row = next(reader)

            key0 = first_row[0]
            key1 = first_row[1]
            key2 = first_row[2]

            for row in reader:
                elements.append({
                    key0: row[0],
                    key1: row[1],
                    key2: self.parse_text(row[2]),
                })

        return elements

    @classmethod
    def parse_text(cls, text):
        # keep only what lies between the [start] and [stop] markers
        start_idx = text.index(cls.START_TEXT) + len(cls.START_TEXT)
        end_idx = text.index(cls.END_TEXT, start_idx)

        new_txt = text[start_idx:end_idx]

        return new_txt.strip('\n')


p = Parser("infile.txt")
elements = p.parse_file()

print(elements)

Output:

[{'date': '2016-01-11', 'text': 'this is some text', 'id': '10001'}, {'date': '2014-03-12', 'text': 'this is some more text', 'id': '10002'}]

Answer 2 (score: 0)

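import csv

with open('f.txt', newline='') as fp:
    # DictReader takes the keys ("id", "date", "text") from the header row
    reader = csv.DictReader(fp, delimiter="$")
    data = list(reader)

for row in data:
    # remove the [start]/[stop] markers and all newlines from every value
    row.update({
        k: v.replace('[start]', '').replace('[stop]', '').replace('\n', '')
        for k, v in row.items()})

print(data)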