Writing tests for file parser functions in a modular way

Asked: 2016-01-06 02:21:16

Tags: python file unit-testing testing automated-tests

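I am writing Python code that parses a formatted file into Python objects. The files can vary, but for now I am working from one portion of a file, and I hope tests will help me extend the parser to cover all of them.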

The file itself consists of a header containing metadata, followed by several data blocks.

[general header, describes length of header 1 & header 2]
[header describing data block 1]
[header describing data block 2]
[data block 1]
[data block 2]

My code is currently laid out as follows:

with open(filename, 'r') as datafile:
    gen_header_obj = parse_gen_header(datafile)
    header1_obj = parse_header1(datafile, gen_header_obj.header1_len)
    header2_obj = parse_header2(datafile, gen_header_obj.header2_len)
    data1_obj = parse_data1(datafile, header1_obj.datalen)
    data2_obj = parse_data2(datafile, header2_obj.datalen)

Each parse_*(file) function calls file.readline() several times, depending on the specified data length.
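To illustrate why the parsers depend on each other, here is a minimal sketch of a readline()-driven reader. The helper name read_block and the sample content are hypothetical and not taken from the actual parser; the point is only that each call leaves the file position wherever the previous call stopped.

```python
import io


def read_block(datafile, n_lines):
    """Hypothetical helper: read exactly n_lines lines using readline()."""
    lines = []
    for _ in range(n_lines):
        line = datafile.readline()
        if not line:
            # readline() returns an empty string at end of file
            raise ValueError("unexpected end of file")
        lines.append(line.rstrip("\n"))
    return lines


# Made-up content standing in for a real file
sample = io.StringIO("a\nb\nc\nd\n")
first = read_block(sample, 2)   # consumes the first two lines
second = read_block(sample, 2)  # continues from where the first call stopped
```

Because the second call only sees the right lines if the first call ran, a parser written this way implicitly depends on everything that was read before it.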

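Ideally I would have at least 5 separate tests, where I feed in a fake part of the file and check that the information is extracted correctly. The catch is that the data sections are very large (megabytes).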
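One way to feed a fake part of the file is an in-memory io.StringIO object, which supports readline() like a real file. The sketch below is only an illustration under stated assumptions: this parse_header1 is a hypothetical stand-in that reads "key=value" lines, and the field names are invented, since the real format is not shown.

```python
import io
import unittest
from types import SimpleNamespace


def parse_header1(datafile, header_len):
    """Hypothetical stand-in for the real parser: reads header_len 'key=value' lines."""
    fields = {}
    for _ in range(header_len):
        key, _sep, value = datafile.readline().strip().partition("=")
        fields[key] = value
    return SimpleNamespace(name=fields["name"], datalen=int(fields["datalen"]))


class TestHeader1(unittest.TestCase):
    def test_header1_parse(self):
        # Only the header under test is supplied; no other part of the file is needed
        fake = io.StringIO("name=block1\ndatalen=3\n")
        result = parse_header1(fake, 2)
        self.assertEqual(result.name, "block1")
        self.assertEqual(result.datalen, 3)
```

Since each parser takes its length as an argument, a fake data block can be a few lines long instead of megabytes. The test can be run with python -m unittest.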

Is it possible to write tests like the following?

class TestParser(unittest.TestCase):
    filename = 'locally_stored_file.txt'

    def setUp(self):
        self.file = open(self.filename, 'r')

    def tearDown(self):
        self.file.close()

    def test_gen_header_parse(self):
        result = parse_gen_header(self.file)
        self.header1_len = result.header1_len
        self.header2_len = result.header2_len
        expected = ...
        self.assertEqual(result, expected)

    def test_header1_parse(self):
        # file position is left off from test_gen_header_parse
        result = parse_header1(self.file, self.header1_len)
        self.data1_len = result.data1_len
        expected = ...
        self.assertEqual(result, expected)

    def test_header2_parse(self):
        # file position is left off from test_header1_parse
        result = parse_header2(self.file, self.header2_len)
        self.data2_len = result.data2_len
        expected = ...
        self.assertEqual(result, expected)

    def test_data1_parse(self):
        # file position is left off from test_header2_parse
        result = parse_data1(self.file, self.data1_len)
        expected = ...
        self.assertEqual(result, expected)

    def test_data2_parse(self):
        # file position is left off from test_data1_parse
        result = parse_data2(self.file, self.data2_len)
        expected = ...
        self.assertEqual(result, expected)

    # Some code to force the tests to run sequentially as laid out above

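As you can see, I am trying to write five separate tests that will hopefully fail if something breaks in the future. However, I cannot test parse_header1 (or parse_header2) without having run parse_gen_header beforehand.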

I am not sure whether there is a better way to approach this.

1 Answer:

Answer 0: (score: 2)

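It would be better to declare all the lengths up front and use seek to move the file pointer to the right position for each test. You can also declare the expected lengths and test those: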

class TestParser(unittest.TestCase):
    filename = 'locally_stored_file.txt'
    expected_gen_header_length = 42  # The correct number it should be
    expected_header1_length = 42  # The correct number it should be
    # and lengths of the other things

    def test_gen_header_parse(self):
        with open(self.filename, 'r') as datafile:
            # have parse_gen_header also return its length if you want to assert on it
            result, len_gen_header = parse_gen_header(datafile)
            self.header1_len = result.header1_len
            self.header2_len = result.header2_len
            expected = ...
            self.assertEqual(self.expected_gen_header_length, len_gen_header)
            self.assertEqual(result, expected)

    def test_header1_parse(self):
        with open(self.filename, 'r') as datafile:
            # Force the file position to begin right after gen_header
            datafile.seek(self.expected_gen_header_length)

            result = parse_header1(datafile, self.expected_header1_length)
            self.data1_len = result.data1_len
            expected = ...
            self.assertEqual(result, expected)

    # and so on ....
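
The seek-based approach above can be tried out on a tiny in-memory fixture. The following is a rough sketch rather than the real parser: FIXTURE, GEN_HEADER_CHARS and read_lines are hypothetical names, and the "key=value" content is invented for illustration.

```python
import io
import unittest

# Hypothetical miniature fixture: a 2-line general header followed by a 2-line header 1
GEN_HEADER_TEXT = "h1len=2\nh2len=1\n"
FIXTURE = GEN_HEADER_TEXT + "name=block1\ndatalen=3\n"
GEN_HEADER_CHARS = len(GEN_HEADER_TEXT)  # offset at which header 1 starts


def read_lines(datafile, n_lines):
    """Hypothetical stand-in for a parser: read n_lines lines via readline()."""
    return [datafile.readline().rstrip("\n") for _ in range(n_lines)]


class TestSeekedSections(unittest.TestCase):
    def setUp(self):
        # A fresh file object per test, so no test depends on another having run
        self.datafile = io.StringIO(FIXTURE)

    def test_gen_header(self):
        self.assertEqual(read_lines(self.datafile, 2), ["h1len=2", "h2len=1"])
        # The position after parsing should match the declared header length
        self.assertEqual(self.datafile.tell(), GEN_HEADER_CHARS)

    def test_header1(self):
        # Jump straight to header 1 without parsing the general header first
        self.datafile.seek(GEN_HEADER_CHARS)
        self.assertEqual(read_lines(self.datafile, 2), ["name=block1", "datalen=3"])
```

With io.StringIO the offsets are character positions. For a real file opened in text mode, Python only guarantees seek() for offsets previously returned by tell() (or zero), so it may be safer to record the offsets with tell() or to open the file in binary mode.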