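Filtering rows based on a string with nested conditions

Time: 2016-05-03 14:39:16

Tags: python functional-programming nested

I have rows of data from a csv file, where the first stored row is the header. For example: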

first line -> [a,b,c,d,e]

second line -> [0,1,2,1,2]

third line -> [4,2,4,1,5]

I also have a data-dependent condition string in the following format:

condition = (((a = d) OR (a = c)) AND (c < e))

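The output should be only line 3. How can I evaluate this condition and separate all of the nested sub-conditions? I was thinking of a recursive function that reads through the parentheses, but my code is a mess :(. Thanks for any answers, and sorry for my bad English!

PS: I don't want to use pandas or the csv library.

PS2: The condition above is only an example; there may be other, more deeply nested conditions, such as ((((((a = d)AND(c> e))OR(b = c))AND(e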

2 answers:

Answer 0: (score: 0)

Here is a quick solution:

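# Python 3: the built-in filter is lazy, so itertools.ifilter is not needed.
with open('input.csv', 'r') as fi:
    # Skip the header, then pair each raw line with its fields parsed as ints.
    lines = ((rawline, list(map(int, rawline.split(',')))) for rawline in fi.readlines()[1:])
    # Keep rows where (a == d or a == c) and (c < e); pair[1] holds [a, b, c, d, e].
    results = filter(lambda pair: (pair[1][0] == pair[1][3] or pair[1][0] == pair[1][2]) and (pair[1][2] < pair[1][4]), lines)
    for (rawline, _) in results:
        print(rawline.rstrip('\n'))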

input.csv is:

a,b,c,d,e
0,1,2,1,2
4,2,4,1,5

The resulting output is:

4,2,4,1,5

Update: a shorter, more compact implementation:

# Python 3: the built-in filter replaces itertools.ifilter.
with open('input.csv', 'r') as fi:
    # Parse every data row (header skipped) into a list of ints, then filter.
    results = filter(
        lambda fds: (fds[0] == fds[3] or fds[0] == fds[2]) and (fds[2] < fds[4]),
        (list(map(int, rawline.split(','))) for rawline in fi.readlines()[1:]))
    for fields in results:
        print(','.join(map(str, fields)))
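The snippets above address the columns by position. A minimal sketch, assuming the header row should drive the lookup instead, of reading the column names from the first line and filtering on named fields. `parse_rows` and `keep` are hypothetical names, and the sample data is inlined rather than read from input.csv:

```python
def parse_rows(lines):
    """Yield one dict per data row, keyed by the names in the header line."""
    header = [name.strip() for name in lines[0].split(',')]
    for rawline in lines[1:]:
        if not rawline.strip():
            continue  # ignore blank lines
        values = [int(field) for field in rawline.split(',')]
        yield dict(zip(header, values))


def keep(row):
    """The example condition: ((a = d) OR (a = c)) AND (c < e)."""
    return (row['a'] == row['d'] or row['a'] == row['c']) and row['c'] < row['e']


# Inlined sample data (an assumption); with a real file, pass fi.readlines() instead.
sample = ['a,b,c,d,e\n', '0,1,2,1,2\n', '4,2,4,1,5\n']
matches = [row for row in parse_rows(sample) if keep(row)]
for row in matches:
    print(','.join(str(value) for value in row.values()))
```

Keying rows by column name keeps the condition readable and means a reordered header does not silently change which fields are compared.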

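Answer 1: (score: 0)

It is better to create a new answer when the requirements are updated or changed.

The quickest way to evaluate a condition given as a string is the built-in function eval. That way you don't have to do heavy, costly parsing (lexical analysis, syntactic analysis).

Here is the sample code: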

# Python 3: the built-in filter replaces itertools.ifilter.
condition1 = '(((a = d) OR (a = c)) AND (c < e))'

def evalCondition(condition, *args):
    '''
    1) If the condition already follows Python grammar, the replacements below are not needed.
    2) Assumes there is no '>=' or '<='; otherwise a more sophisticated replacement is required, e.g. using a regular expression.
    '''
    condition = condition.replace('=', '==').replace('OR', 'or').replace('AND', 'and')

    # Bind the column names as locals so that eval can resolve them.
    a,b,c,d,e = args
    return eval(condition)

with open('input.csv', 'r') as fi:
    results = filter(
        lambda fields: evalCondition(condition1, *fields),
        (list(map(int, rawline.split(','))) for rawline in fi.readlines()[1:]))
    for fields in results:
        print(','.join(map(str, fields)))
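As a side note on the docstring's second caveat: a plain replace('=', '==') would turn '>=' into '>==', which is a syntax error. A minimal sketch of the regular-expression replacement it hints at, assuming the only operators are =, !=, <, >, <=, >=, AND and OR. `to_python` is a hypothetical helper, not part of the answer's code:

```python
import re


def to_python(condition):
    """Translate '=', 'AND', 'OR' into Python syntax, leaving >=, <=, != and == alone."""
    # A lone '=' is neither preceded by <, >, ! or = nor followed by '='.
    condition = re.sub(r'(?<![<>!=])=(?!=)', '==', condition)
    # \b keeps identifiers that merely contain the letters (e.g. 'ORDER') intact.
    condition = re.sub(r'\bAND\b', 'and', condition)
    condition = re.sub(r'\bOR\b', 'or', condition)
    return condition


translated = to_python('((a = d) OR (a >= c)) AND (c <= e)')
print(translated)
```

The translated string could then be handed to eval in the same way as in evalCondition.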

input.csv is:

a,b,c,d,e
0,1,2,1,2
4,2,4,1,5

The resulting output is:

4,2,4,1,5
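If eval is not an option (for example, when the condition string is not trusted), the recursive approach mentioned in the question is also workable. A minimal sketch of a recursive-descent evaluator, assuming integer fields, column names made of word characters, and only the operators =, !=, <, >, <=, >=, AND and OR. `evaluate`, `TOKEN` and `OPS` are hypothetical names, and characters the tokenizer does not recognise are simply skipped:

```python
import operator
import re

TOKEN = re.compile(r'>=|<=|!=|[()=<>]|\w+')
OPS = {'=': operator.eq, '!=': operator.ne, '<': operator.lt,
       '>': operator.gt, '<=': operator.le, '>=': operator.ge}


def evaluate(condition, row):
    """Evaluate a condition string against row, a dict of column name -> int."""
    tokens = TOKEN.findall(condition)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError('unexpected end of condition')
        pos += 1
        return tokens[pos - 1]

    def operand():
        token = take()
        return int(token) if token.isdigit() else row[token]

    def factor():
        # Either a parenthesised sub-condition or a single comparison.
        if peek() == '(':
            take()
            value = expr()
            if take() != ')':
                raise ValueError('expected closing parenthesis')
            return value
        left = operand()
        compare = OPS[take()]
        return compare(left, operand())

    def term():
        value = factor()
        while peek() == 'AND':
            take()
            rhs = factor()  # always parsed, so its tokens are consumed
            value = value and rhs
        return value

    def expr():
        value = term()
        while peek() == 'OR':
            take()
            rhs = term()
            value = value or rhs
        return value

    result = expr()
    if pos != len(tokens):
        raise ValueError('unexpected token: %r' % tokens[pos])
    return result


rows = [dict(zip('abcde', values)) for values in ([0, 1, 2, 1, 2], [4, 2, 4, 1, 5])]
condition = '(((a = d)OR(a = c))AND(c < e))'
matches = [row for row in rows if evaluate(condition, row)]
print(matches)
```

Each nesting level of parentheses maps to one recursive call of expr, so arbitrarily deep conditions are handled without any string replacement, and AND binds tighter than OR when parentheses are omitted.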