Question

I have searched, but I still have no clue, so please bear with me.

I have strings in which each character corresponds to a specific feature matrix. Example:

'a' = [-vegetable, +fruit, +apple, -orange]
'o' = [-vegetable, +fruit, -apple, +orange]
't' = [+vegetable, -fruit, -apple, -orange]

Note that this is just the notation I chose to represent the matrices.

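What I would like to do is take any number of such strings and evaluate them against some truth function. For example, evaluating the string 'aoaot' against: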

[+fruit] => [+apple]
equivalently: (not [+fruit]) or [+apple]

should tell me, character by character, where this implication is false for the given string. Something like this:

[True, False, True, False, True]

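or just the total count of False evaluations, which would be 2 here. What is a sensible way to do this in Python? I have been looking at NLTK, but I'm not sure it fits.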
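Stated formally, with f(c) and a(c) standing for "character c has +fruit" and "character c has +apple" (labels introduced here only for the formula), the per-character value and the count of failures are as follows. For 'aoaot' only the two 'o' characters have f true and a false, which gives the count of 2 above.

```latex
\[
  v_i = \lnot f(c_i) \lor a(c_i), \qquad
  N_{\text{false}} = \sum_{i=1}^{n} \bigl[\, f(c_i) \land \lnot a(c_i) \,\bigr]
\]
```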

Answer 1

You can implement the necessary logic with the set type.

m = {
    'a':set(['fruit', 'apple']),
    'o':set(['fruit', 'orange']),
    't':set(['vegetable'])
}

# Booleans compare as False < True, so "p <= q" behaves like the implication p => q
pred = lambda f: ('fruit' in f) <= ('apple' in f)

# True/False array
[ pred(m[f]) for f in 'aoaot' ]

# Number of falses
sum( not pred(m[f]) for f in 'aoaot' )
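A minimal sketch that wraps the same idea into a reusable function, so the antecedent and consequent features can be chosen per call and both results come back at once. It assumes the same set-based table as above (a feature counts as "+" if it is in the set); the names FEATURES and evaluate_implication are hypothetical, and a character missing from the table raises KeyError.

```python
# Same set-based feature table as above: a feature is "+" if it is present.
FEATURES = {
    'a': {'fruit', 'apple'},
    'o': {'fruit', 'orange'},
    't': {'vegetable'},
}

def evaluate_implication(s, antecedent, consequent, table=FEATURES):
    """Return (per-character truth values, number of False results)
    for the rule [+antecedent] => [+consequent] over the string s."""
    # bool <= bool acts as material implication because False < True.
    results = [(antecedent in table[c]) <= (consequent in table[c]) for c in s]
    return results, results.count(False)

results, n_false = evaluate_implication('aoaot', 'fruit', 'apple')
print(results)  # [True, False, True, False, True]
print(n_false)  # 2
```

Other rules only need different feature names, e.g. evaluate_implication('aoaot', 'fruit', 'orange').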

Answer 2

If you want more flexibility in your own syntax, here is a simple parser for the data definitions you gave:

data = """\
a = [-vegetable, +fruit, +apple, -orange, -citrus] 
o = [-vegetable, +fruit, -apple, +orange, +citrus] 
t = [+vegetable, -fruit]"""

from pyparsing import Word, alphas, oneOf, Group, delimitedList

# a basic token for a word of alpha characters plus underscores
ident = Word(alphas + '_')

# define a token for leading '+' or '-', with parse action to convert to bool value
inclFlag = oneOf('+ -')
inclFlag.setParseAction(lambda t: t[0] == '+')

# define a feature as the combination of an inclFlag and a feature name
feature = Group(inclFlag('has') + ident('feature'))

# define a definition
defn = ident('name') + '=' + '[' + delimitedList(feature)('features') + ']'

# search through the input test data for defns, and print out the parsed data
# by name, and the associated features
defns = defn.searchString(data)
for d in defns:
    print(d.dump())
    for f in d.features:
        print(f.dump('  '))
    print()

Prints:

['a', '=', '[', [False, 'vegetable'], [True, 'fruit'], [True, 'apple'], [False, 'orange'], [False, 'citrus'], ']']
- features: [[False, 'vegetable'], [True, 'fruit'], [True, 'apple'], [False, 'orange'], [False, 'citrus']]
- name: a
  [False, 'vegetable']
  - feature: vegetable
  - has: False
  [True, 'fruit']
  - feature: fruit
  - has: True
  [True, 'apple']
  - feature: apple
  - has: True
  [False, 'orange']
  - feature: orange
  - has: False
  [False, 'citrus']
  - feature: citrus
  - has: False

['o', '=', '[', [False, 'vegetable'], [True, 'fruit'], [False, 'apple'], [True, 'orange'], [True, 'citrus'], ']']
- features: [[False, 'vegetable'], [True, 'fruit'], [False, 'apple'], [True, 'orange'], [True, 'citrus']]
- name: o
  [False, 'vegetable']
  - feature: vegetable
  - has: False
  [True, 'fruit']
  - feature: fruit
  - has: True
  [False, 'apple']
  - feature: apple
  - has: False
  [True, 'orange']
  - feature: orange
  - has: True
  [True, 'citrus']
  - feature: citrus
  - has: True

['t', '=', '[', [True, 'vegetable'], [False, 'fruit'], ']']
- features: [[True, 'vegetable'], [False, 'fruit']]
- name: t
  [True, 'vegetable']
  - feature: vegetable
  - has: True
  [False, 'fruit']
  - feature: fruit
  - has: False
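To go from parsed definitions to the truth values the question asks for, here is a rougher sketch that uses the standard library's re module in place of pyparsing (a swapped-in technique, not the parser above) and then applies the implication. It assumes every definition fits on one line in the name = [+feat, -feat, ...] form; parse_definitions and implies are hypothetical helper names, and a feature a definition does not mention (like apple for t) is treated as absent.

```python
import re

data = """\
a = [-vegetable, +fruit, +apple, -orange, -citrus]
o = [-vegetable, +fruit, -apple, +orange, +citrus]
t = [+vegetable, -fruit]"""

# One definition per line: a name, '=', then a bracketed list of +/- features.
DEFN_RE = re.compile(r'^\s*(\w+)\s*=\s*\[(.*?)\]\s*$', re.MULTILINE)
FEATURE_RE = re.compile(r'([+-])\s*(\w+)')

def parse_definitions(text):
    """Map each name to a dict {feature: bool}: True for '+', False for '-'."""
    table = {}
    for name, body in DEFN_RE.findall(text):
        table[name] = {feat: sign == '+' for sign, feat in FEATURE_RE.findall(body)}
    return table

def implies(features, p, q):
    """[+p] => [+q]; a feature missing from the matrix counts as False."""
    return (not features.get(p, False)) or features.get(q, False)

table = parse_definitions(data)
values = [implies(table[c], 'fruit', 'apple') for c in 'aoaot']
print(values)               # [True, False, True, False, True]
print(values.count(False))  # 2
```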

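Pyparsing takes care of a lot of the overhead for you, such as iterating over the input string, skipping irrelevant whitespace, and returning the parsed data with named attributes. For the evaluation step, take a look at the Boolean evaluator on the pyparsing wiki (SimpleBool.py), or at booleano, a more complete Boolean evaluator package.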
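If the rules themselves should be written as text, a minimal sketch in the spirit of such an evaluator (not SimpleBool.py or booleano themselves) can lean on Python's own ast module: it parses a rule such as not fruit or apple and evaluates it against a feature dict, accepting only and, or, not, parentheses and feature names. eval_rule is a hypothetical name; "=>" is not supported, so an implication p => q has to be written as not p or q.

```python
import ast

def eval_rule(rule, features):
    """Evaluate a Boolean rule over feature names, e.g. 'not fruit or apple'.

    Only and/or/not, parentheses and bare names are accepted; a name that is
    missing from `features` is treated as False.
    """
    tree = ast.parse(rule, mode='eval')

    def walk(node):
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if isinstance(node, ast.BoolOp) and isinstance(node.op, ast.And):
            return all(walk(v) for v in node.values)
        if isinstance(node, ast.BoolOp) and isinstance(node.op, ast.Or):
            return any(walk(v) for v in node.values)
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.Not):
            return not walk(node.operand)
        if isinstance(node, ast.Name):
            return bool(features.get(node.id, False))
        # Anything else (numbers, calls, arithmetic, ...) is rejected.
        raise ValueError('unsupported syntax in rule: %r' % rule)

    return walk(tree)

table = {
    'a': {'vegetable': False, 'fruit': True, 'apple': True, 'orange': False},
    'o': {'vegetable': False, 'fruit': True, 'apple': False, 'orange': True},
    't': {'vegetable': True, 'fruit': False, 'apple': False, 'orange': False},
}
values = [eval_rule('not fruit or apple', table[c]) for c in 'aoaot']
print(values)               # [True, False, True, False, True]
print(values.count(False))  # 2
```

Because the expression is walked node by node instead of being passed to eval(), arbitrary code in a rule string is rejected rather than executed.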

Evaluating strings against a Boolean truth function
