Efficient Python data storage (abstract data types?)

Time: 2009-09-08 20:42:32

Tags: python data-structures

Please forgive the vague title; I wasn't sure how to phrase my question.

Given a string:

blah = "There are three cats in the hat"

and "userInfo" (I'm not sure which data structure to use for it):

cats -> ("tim", "1 infinite loop")
three -> ("sally", "123 fake st")
three -> ("tim", "1 infinite loop")
three cats -> ("john", "123 fake st")
four cats -> ("albert", "345 real road")
dogs -> ("tim", "1 infinite loop")
cats hat -> ("janet", NULL)

The correct output should be:

tim (since 'cats' exists)
sally (since 'three' exists)
tim (since 'three' exists)
john (since both 'three' and 'cats' exist)
janet (since both 'cats' and 'hat' exist somewhere in the string blah)

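I'd like an efficient way to store this data. Many entries may share the key 'three' (for example, 150 people could have that string). Should I just keep one list containing all of this data and duplicate the "keys"?

3 answers:

Answer 0 (score: 6)

Something like this?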

class Content( object ):
    def __init__( self, content, maps_to ):
        self.content= content.split()
        self.maps_to = maps_to
    def matches( self, words ):
        return all( c in words for c in self.content )
    def __str__( self ):
        return "%s -> %r" % ( " ".join(self.content), self.maps_to )

rules = [
    Content('cats',("tim", "1 infinite loop")),
    Content('three',("sally", "123 fake st")),
    Content('three',("tim", "1 infinite loop")),
    Content('three cats',("john", "123 fake st")),
    Content('four cats',("albert", "345 real road")),
    Content('dogs',("tim", "1 infinite loop")),
    Content('cats hat', ("janet", None)),
]

blah = "There are three cats in the hat"

for r in rules:
    if r.matches(blah.split()):
        print(r)

Output

cats -> ('tim', '1 infinite loop')
three -> ('sally', '123 fake st')
three -> ('tim', '1 infinite loop')
three cats -> ('john', '123 fake st')
cats hat -> ('janet', None)
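
On the question of duplicating keys: one possible variation, not part of the answer above, is to group users under a frozenset of required words so that a key shared by many people is stored once. This is only a sketch; build_index and find_users are hypothetical names introduced for illustration.

```python
from collections import defaultdict

def build_index(pairs):
    """Map each frozenset of required words to the list of users sharing it."""
    index = defaultdict(list)
    for phrase, user in pairs:
        index[frozenset(phrase.split())].append(user)
    return index

def find_users(index, sentence):
    """Return every user whose required words all occur in the sentence."""
    words = set(sentence.split())
    found = []
    for required, users in index.items():
        if required <= words:  # subset test: every required word is present
            found.extend(users)
    return found

pairs = [
    ("cats", ("tim", "1 infinite loop")),
    ("three", ("sally", "123 fake st")),
    ("three", ("tim", "1 infinite loop")),
    ("three cats", ("john", "123 fake st")),
    ("four cats", ("albert", "345 real road")),
    ("dogs", ("tim", "1 infinite loop")),
    ("cats hat", ("janet", None)),
]

index = build_index(pairs)
matches = find_users(index, "There are three cats in the hat")
for name, address in matches:
    print(name, address)
```

With this layout the key "three" appears once no matter how many users share it, and word order inside a key does not matter because the key is a set.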

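Answer 1 (score: 1)

I don't have the slightest clue what you are actually trying to do, but if you have a lot of data that you need to store and search through, some kind of database with indexing capabilities seems like the way to go.

ZODB, CouchDB or SQL is a matter of taste. I very much doubt that disk-space efficiency needs to concern you as much as the speed of searching and lookups.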
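
To make the indexed-database idea concrete, here is a rough sketch using the standard-library sqlite3 module with an in-memory database. The schema (tables rules and rule_words, column n_words) and the helpers add_rule and find_users are assumptions made up for this example, not something the question specifies.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE rules (
        id INTEGER PRIMARY KEY, name TEXT, address TEXT, n_words INTEGER);
    CREATE TABLE rule_words (rule_id INTEGER REFERENCES rules(id), word TEXT);
    CREATE INDEX idx_rule_words_word ON rule_words(word);
""")

def add_rule(phrase, name, address):
    """Store one rule plus one row per distinct required word."""
    words = set(phrase.split())
    cur = conn.execute(
        "INSERT INTO rules (name, address, n_words) VALUES (?, ?, ?)",
        (name, address, len(words)))
    conn.executemany(
        "INSERT INTO rule_words VALUES (?, ?)",
        [(cur.lastrowid, w) for w in words])

def find_users(sentence):
    """Return (name, address) for rules whose words all occur in the sentence."""
    words = sorted(set(sentence.split()))
    placeholders = ",".join("?" * len(words))
    query = """
        SELECT r.name, r.address
        FROM rules r JOIN rule_words w ON w.rule_id = r.id
        WHERE w.word IN (%s)
        GROUP BY r.id
        HAVING COUNT(*) = r.n_words  -- every required word was matched
        ORDER BY r.id
    """ % placeholders
    return conn.execute(query, words).fetchall()

add_rule("cats", "tim", "1 infinite loop")
add_rule("three", "sally", "123 fake st")
add_rule("three", "tim", "1 infinite loop")
add_rule("three cats", "john", "123 fake st")
add_rule("four cats", "albert", "345 real road")
add_rule("dogs", "tim", "1 infinite loop")
add_rule("cats hat", "janet", None)

for name, address in find_users("There are three cats in the hat"):
    print(name, address)
```

The index on rule_words.word lets the database look up only the rules that mention a word from the sentence; the HAVING clause then keeps the rules for which every required word was found.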

Answer 2 (score: 0)

I'm not sure exactly what you're trying to do, but maybe you're looking for something like this:

userinfo = {
  "tim": "1 infinite loop",
  "sally": "123 fake st",
  "john": "123 fake st",
  "albert": "345 real road",
  "janet": None
}

conditions = {
  "cats": ["tim"],
  "three": ["sally", "tim"],
  "three cats": ["john"],
  "four cats": ["albert"],
  "dogs": ["tim"],
  "cats hat": ["janet"]
}

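sentence = "There are three cats in the hat"
sentence_words = set(sentence.split())

def all_words_are_in_the_sentence(condition):
  # True when every word of the condition occurs somewhere in the sentence.
  return all(word in sentence_words for word in condition.split())

for c in conditions:
  if all_words_are_in_the_sentence(c):
    for p in conditions[c]:
      print(p, "because of", c)
      print("additional info:", userinfo[p])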
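
If the conditions dictionary grows large, checking every condition against every sentence may become slow. One possible refinement, offered only as a sketch, is an inverted index from each word to the conditions that mention it, so that only conditions sharing a word with the sentence are tested. The names word_to_conditions and matching_conditions are hypothetical.

```python
from collections import defaultdict

conditions = {
    "cats": ["tim"],
    "three": ["sally", "tim"],
    "three cats": ["john"],
    "four cats": ["albert"],
    "dogs": ["tim"],
    "cats hat": ["janet"],
}

# Inverted index: each word points to the conditions that mention it.
word_to_conditions = defaultdict(set)
for condition in conditions:
    for word in condition.split():
        word_to_conditions[word].add(condition)

def matching_conditions(sentence):
    """Return, in sorted order, the conditions fully contained in the sentence."""
    words = set(sentence.split())
    # Only conditions sharing at least one word with the sentence are candidates.
    candidates = set()
    for word in words:
        candidates |= word_to_conditions.get(word, set())
    return sorted(c for c in candidates if set(c.split()) <= words)

for c in matching_conditions("There are three cats in the hat"):
    for p in conditions[c]:
        print(p, "because of", c)
```

Conditions such as "dogs" are never examined for this sentence, because none of their words occur in it.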