Question

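I'm parsing a string that has no delimiters but does have specific indexes where fields start and stop. Here is my list comprehension that builds a list from the string: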

field_breaks = [(0,2), (2,10), (10,13), (13, 21), (21, 32), (32, 43), (43, 51), (51, 54), (54, 55), (55, 57), (57, 61), (61, 63), (63, 113), (113, 163), (163, 213), (213, 238), (238, 240), (240, 250), (250, 300)]
s = '4100100297LICACTIVE  09-JUN-198131-DEC-2010P0         Y12490227WYVERN RESTAURANTS INC                            1351 HEALDSBURG AVE                                                                                 HEALDSBURG               CA95448     ROUND TABLE PIZZA                                 575 W COLLEGE AVE                                 STE 201                                           SANTA ROSA               CA95401               '
data = [s[x[0]:x[1]].strip() for x in field_breaks]

Any suggestions on how to improve this?

Answer 1

You can cut your field_breaks list in half by storing only the boundaries and pairing each one with the next:

field_breaks = [0, 2, 10, 13, 21, 32, 43, ..., 250, 300]
s = ...
data = [s[x[0]:x[1]].strip() for x in zip(field_breaks[:-1], field_breaks[1:])]
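For anyone who wants to try the boundary-pairing idea in isolation, here is a minimal sketch. The short sample record and its four-field boundary list are illustrative assumptions (they cover only the first 21 characters of the layout in the question), not the full data.

```python
# Boundaries only: each field runs from one boundary to the next.
field_breaks = [0, 2, 10, 13, 21]
# Hypothetical sample record, much shorter than the one in the question.
s = '4100100297LICACTIVE  '

# Pair each boundary with its successor: (0, 2), (2, 10), (10, 13), (13, 21)
pairs = list(zip(field_breaks[:-1], field_breaks[1:]))
data = [s[start:stop].strip() for start, stop in pairs]
print(data)
```

The trade-off is that this form only works when the fields are contiguous; if there were gaps between fields, the explicit (start, stop) pairs from the question would still be needed.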

Answer 2

You can use tuple unpacking for cleaner code:

data = [s[a:b].strip() for a,b in field_breaks]

Answer 3

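Honestly, I don't find the parse-by-column-number approach very readable, and I question its maintainability (off-by-one errors and the like). That said, I'm sure the list comprehension is perfectly sound and efficient in this case, and the suggested zip-based solution has a nice functional twist to it.

Instead, I'm going to pitch something from out in left field, since list comprehensions are supposed to be, in part, about making your code more declarative. For something completely different, consider the following approach based on the pyparsing module: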

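from pyparsing import Word, Combine, Literal, nums, alphas

def Fixed(chars, width):
    # Match exactly `width` characters drawn from `chars`
    return Word(chars, exact=width)

myDate = Combine(Fixed(nums,2) + Literal('-') + Fixed(alphas,3) + Literal('-')
                 + Fixed(nums,4))

# Parentheses let the expression continue onto the next line.
# The status field is space-padded, so spaces are allowed in it.
fullRow = (Fixed(nums,2) + Fixed(nums,8) + Fixed(alphas,3) + Fixed(alphas + ' ',8)
          + myDate + myDate + ...)

data = fullRow.parseString(s)
# should be ['41', '00100297', 'LIC', 'ACTIVE  ', 
#            '09-JUN-1981', '31-DEC-2010', ...]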

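To make this even more declarative, you could name each field as you come to it. I have no idea what the fields actually are, but something like: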

someId = Fixed(nums,2)
someOtherId = Fixed(nums,8)
recordType = Fixed(alphas,3)
# space-padded field, so spaces are allowed
recordStatus = Fixed(alphas + ' ',8)
birthDate = myDate
issueDate = myDate
# Parentheses let the expression continue onto the next line
fullRow = (someId + someOtherId + recordType + recordStatus
          + birthDate + issueDate + ...)

Now, an approach like this may not break any land speed records. But holy cow, wouldn't you find this easier to read and maintain?
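If adding a dependency is not an option, a similar declarative effect can be sketched with the standard library alone: describe each field by a name and a width, and let the code compute the offsets so nobody has to maintain them by hand. The field names below are hypothetical placeholders, the sample record is a shortened assumption, and the helper `parse_fixed_width` is made up for illustration.

```python
from itertools import accumulate

# Hypothetical field names; the widths match the first four
# (start, stop) pairs from the question.
FIELDS = [
    ('some_id', 2),
    ('some_other_id', 8),
    ('record_type', 3),
    ('record_status', 8),
]

def parse_fixed_width(line, fields):
    """Return a dict mapping each field name to its stripped text."""
    widths = [width for _, width in fields]
    stops = list(accumulate(widths))   # running totals: 2, 10, 13, 21
    starts = [0] + stops[:-1]          # each field starts where the last ended
    return {
        name: line[start:stop].strip()
        for (name, _), start, stop in zip(fields, starts, stops)
    }

# Shortened sample record covering only the four fields above.
record = parse_fixed_width('4100100297LICACTIVE  ', FIELDS)
print(record)
```

Unlike the pyparsing version, this does no validation of the field contents; it only removes the hand-written offsets, which is where off-by-one mistakes tend to creep in.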

Answer 4

Here is a way to do it using map:

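# Python 2 only: str.__getslice__ was removed in Python 3.
# Note that the fields are not stripped here.
data = map(s.__getslice__, *zip(*field_breaks))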
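On Python 3, where `str.__getslice__` is no longer available, roughly the same map-based idea can be sketched by building `slice` objects and passing them to `s.__getitem__`. The shortened record and field list are assumptions for illustration, not the full data from the question.

```python
from itertools import starmap

# Shortened, illustrative versions of the question's data.
field_breaks = [(0, 2), (2, 10), (10, 13), (13, 21)]
s = '4100100297LICACTIVE  '

# Turn each (start, stop) pair into a slice object, then index the
# string with each slice via map.
slices = starmap(slice, field_breaks)
data = [field.strip() for field in map(s.__getitem__, slices)]
print(data)
```

Whether this reads better than the plain list comprehension is a matter of taste; it mainly shows that the map form survives the move to Python 3.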

What's a better way to write this list comprehension?
