试图为非字符串复制Python的字符串实习功能

时间:2015-10-23 00:33:24

标签: python python-2.7

对于自我项目,我想做类似的事情:

Animal(somename, 3)

正如您所看到的,我并不是每个ID都需要不止一个Species(id)实例,但每次我创建一个Animal对象时,我都会创建一个id,我可能需要多次调用,例如(a == b) == (a is b)

要解决这个问题,我要做的就是创建一个课程,以便在2个实例中,让我们说a和b,以下是永远的:

a = "hello"
b = "hello"
print(a is b)

这是Python用字符串文字做的事情,被称为实习。例如:

class MyClass(object):

    myHash = {} # This replicates the intern pool.

    def __new__(cls, n): # The default new method returns a new instance
        if n in MyClass.myHash:
            return MyClass.myHash[n]

        self = super(MyClass, cls).__new__(cls)
        self.__init(n)
        MyClass.myHash[n] = self

        return self

    # as pointed out on an answer, it's better to avoid initializating the instance 
    # with __init__, as that one's called even when returning an old instance.
    def __init(self, n): 
        self.n = n

a = MyClass(2)
b = MyClass(2)

print a is b # <<< True

该print将产生true(只要字符串足够短,如果我们直接使用python shell)。

我只能猜测CPython是如何做到这一点的(它可能涉及一些C魔法)所以我正在做我自己的版本。到目前为止,我已经:

     <select ng-change='taskTimer(duration)' ng-model='duration'>
     <option ng-repeat='TO in timeOptions'
          ng-selected='duration'>{{TO}}</option>
          {{TO}}</option>
     </select>

我的问题是:

a)我的问题是否值得解决?因为我想要的Species对象应该是非常轻的并且可以调用Animal的最大次数,而不是有限的(想象一个口袋妖怪游戏:不超过1000个实例,顶部)

b)如果是,这是解决我问题的有效方法吗?

c)如果它无效,请您详细说明一种更简单/更清洁/更Pythonic的解决方法吗?

2 个答案:

答案 0 :(得分:1)

是的,实现返回缓存对象的__new__方法是创建有限数量实例的适当方法。如果您不希望创建大量实例,则可以实现__eq__并按值而不是标识进行比较,但这样做并不会造成伤害。

请注意,不可变对象通常应在__new__而不是__init__中进行所有初始化,因为后者是在创建对象后调用的。此外,__init__将在从__new__返回的类的任何实例上调用,因此,当您进行缓存时,每次返回缓存对象时都会再次调用它。 / p>

此外,__new__的第一个参数是类对象而不是实例,因此您可能应将其命名为cls而不是self(您可以使用self代替如果你愿意,可以在方法的后面加instance!)。

答案 1 :(得分:1)

为了使这一点尽可能通用,我将推荐一些东西。一,如果你愿意,可以继承namedtuple&#34; true&#34;不变性(通常人们会对此不屑一顾,但是当你进行实习时,打破不可变的不变量会导致更大的问题)。其次,使用锁来允许线程安全行为。

因为这相当复杂,我将提供Species代码的修改后的副本以及解释它的注释:

import collections
import operator
import threading

# Inheriting from a namedtuple is a convenient way to get immutability
class Species(collections.namedtuple('SpeciesBase', 'species_id height ...')):
    __slots__ = ()  # Prevent creation of arbitrary values on instances; true immutability of declared values from namedtuple makes true immutable instances

    # Lock and cache, with underscore prefixes to indicate they're internal details
    _cache_lock = threading.Lock()
    _cache = {} 

    def __new__(cls, species_id):  # Switching to canonical name cls for class type
        # Do quick fail fast check that ID is in fact an int/long
        # If it's int-like, this will force conversion to true int/long
        # and minimize risk of incompatible hash/equality checks in dict
        # lookup
        # I suspect that in CPython, this would actually remove the need
        # for the _cache_lock due to the GIL protecting you at the
        # critical stages (because no byte code is executing comparing
        # or hashing built-in int/long types), but the lock is a good idea
        # for correctness (avoiding reliance on implementation details)
        # and should cost little
        species_id = operator.index(species_id)

        # Lock when checking/mutating cache to make it thread safe
        try:
            with cls._cache_lock:
                return cls._cache[species_id]
        except KeyError:
            pass

        # Read in data here; not done under lock on assumption this might
        # be expensive and other Species (that already exist) might be
        # created/retrieved from cache during this time
        species_id = ...
        height = ...
        # Pass all the values read to the superclass (the namedtuple base)
        # constructor (which will set them and leave them immutable thereafter)
        self = super(Species, cls).__new__(cls, species_id, height, ...)

        with cls._cache_lock:
            # If someone tried to create the same species and raced
            # ahead of us, use their version, not ours to ensure uniqueness
            # If no one raced us, this will put our new object in the cache
            self = cls._cache.setdefault(species_id, self)
        return self

如果你想为普通图书馆实习(用户可能会被线程化,你不能相信他们不会破坏不变性不变),像上面这样的东西就是一个基本的结构。它很快,即使构造是重量级的,也可以最大限度地减少停顿的机会(换取可能不止一次重建一个对象,如果许多线程试图一次第一次构建它,则丢弃除一个副本之外的所有对象),等

当然,如果构造很便宜并且实例很小,那么只需写一个__eq__(如果它在逻辑上是不可变的话,可能__hash__并完成它。