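I'm writing a test for a function that works with nested iterables (by nested iterable I mean an iterable whose elements are only iterables).
Consider the following test cascade: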
from itertools import tee
from typing import (Any,
                    Iterable)


def foo(nested_iterable: Iterable[Iterable[Any]]) -> Any:
    ...


def test_foo(nested_iterable: Iterable[Iterable[Any]]) -> None:
    original, target = tee(nested_iterable)  # this doesn't copy the elements of inner iterators
    result = foo(target)
    assert is_contract_satisfied(result, original)


def is_contract_satisfied(result: Any,
                          original: Iterable[Iterable[Any]]) -> bool:
    ...
For example, foo may be a simple identity function
def foo(nested_iterable: Iterable[Iterable[Any]]) -> Iterable[Iterable[Any]]:
    return nested_iterable
and the contract just checks that the flattened iterables have the same elements
from itertools import (chain,
                       starmap,
                       zip_longest)
from operator import eq
...
flatten = chain.from_iterable


def is_contract_satisfied(result: Iterable[Iterable[Any]],
                          original: Iterable[Iterable[Any]]) -> bool:
    return all(starmap(eq,
                       zip_longest(flatten(result), flatten(original),
                                   # we're assuming that ``object()``
                                   # will create some unique object
                                   # not present in any of the arguments
                                   fillvalue=object())))
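A quick sanity check of this contract, as a sketch that repeats the definitions so it runs standalone: the fresh object() sentinel used as fillvalue compares unequal to every real element, so flattened iterables of different lengths fail the check, while differently grouped but otherwise equal ones pass.

```python
from itertools import chain, starmap, zip_longest
from operator import eq
from typing import Any, Iterable

flatten = chain.from_iterable


def is_contract_satisfied(result: Iterable[Iterable[Any]],
                          original: Iterable[Iterable[Any]]) -> bool:
    # pad the shorter flattened iterable with a fresh sentinel,
    # which is equal to nothing except itself
    return all(starmap(eq,
                       zip_longest(flatten(result), flatten(original),
                                   fillvalue=object())))


same = is_contract_satisfied([[1, 2], [3]], [[1], [2, 3]])
shorter = is_contract_satisfied([[1, 2]], [[1, 2, 3]])
print(same, shorter)  # True False
```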
But if some elements of nested_iterable are iterators, they may get exhausted, since tee makes shallow copies, not deep ones; i.e. for the given foo and is_contract_satisfied the statement
>>> test_foo([iter(range(10))])
leads to a predictable
Traceback (most recent call last):
  ...
    test_foo([iter(range(10))])
  File "...", line 19, in test_foo
    assert is_contract_satisfied(result, original)
AssertionError
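Why the assertion fails can be seen in isolation. In this sketch (not part of the original question), both tee copies hand out the very same inner iterator object, so whatever one side consumes is gone for the other:

```python
from itertools import tee

# a nested iterable whose only element is an iterator
nested = [iter(range(3))]
original, target = tee(nested)

# tee duplicates only the outer iteration: both copies
# yield the very same inner iterator object
inner_from_target = next(target)
inner_from_original = next(original)
print(inner_from_target is inner_from_original)  # True

# consuming the inner iterator through one copy exhausts it for the other
consumed = list(inner_from_target)
leftover = list(inner_from_original)
print(consumed, leftover)  # [0, 1, 2] []
```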
How can I deep-copy an arbitrarily nested iterable?
I know about the copy.deepcopy function, but it won't work for file objects.
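For reference, a sketch of that failure under Python 3, assuming an ordinary text file opened with open(): the deep copy raises TypeError because open file objects refuse to be pickled, and copy.deepcopy relies on the same reduce protocol. The exact message differs between Python versions, so only the exception type is checked.

```python
import copy
import os
import tempfile

with tempfile.TemporaryDirectory() as directory:
    path = os.path.join(directory, 'example.txt')
    with open(path, 'w') as file:
        file.write('0\n1\n')
    with open(path) as file:
        try:
            # a nested structure that contains an open file object
            copy.deepcopy([1, [file]])
        except TypeError as error:
            deepcopy_failed = True
            print('deepcopy failed:', error)
        else:
            deepcopy_failed = False
```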
Answer 0 (score: 2)
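A straightforward algorithm makes n elementwise copies: every inner iterable is copied with tee, and so is the outer one. It can be implemented like
from itertools import tee
from operator import itemgetter
from typing import (Any,
                    Iterable,
                    Tuple,
                    TypeVar)

Domain = TypeVar('Domain')


def copy_nested_iterable(nested_iterable: Iterable[Iterable[Domain]],
                         *,
                         count: int = 2
                         ) -> Tuple[Iterable[Iterable[Domain]], ...]:
    def shallow_copy(iterable: Iterable[Domain]) -> Tuple[Iterable[Domain], ...]:
        return tee(iterable, count)

    # each element is split into `count` tee copies, then the outer iterable is split as well;
    # copy number `index` picks the `index`-th copy of every element
    copies = shallow_copy(map(shallow_copy, nested_iterable))
    return tuple(map(itemgetter(index), iterables)
                 for index, iterables in enumerate(copies))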
Pros:
Cons:
We can do better.
If we look at the itertools.tee function documentation, it contains a Python recipe which, with the help of the functools.singledispatch decorator, can be rewritten like
from collections import (abc,
                         deque)
from functools import singledispatch
from itertools import repeat
from typing import (Iterable,
                    Tuple,
                    TypeVar)

Domain = TypeVar('Domain')


@singledispatch
def copy(object_: Domain,
         *,
         count: int) -> Iterable[Domain]:
    raise TypeError('Unsupported object type: {type}.'
                    .format(type=type(object_)))


# handle general case
@copy.register(object)
# immutable strings represent a special kind of iterables
# that can be copied by simply repeating
@copy.register(bytes)
@copy.register(str)
# mappings cannot be copied as other iterables
# since they are iterable only by key
@copy.register(abc.Mapping)
def copy_object(object_: Domain,
                *,
                count: int) -> Iterable[Domain]:
    return repeat(object_, count)


@copy.register(abc.Iterable)
def copy_iterable(object_: Iterable[Domain],
                  *,
                  count: int = 2) -> Tuple[Iterable[Domain], ...]:
    iterator = iter(object_)
    # we are using `itertools.repeat` instead of `range` here
    # due to efficiency of the former
    # more info at
    # https://stackoverflow.com/questions/9059173/what-is-the-purpose-in-pythons-itertools-repeat/9098860#9098860
    queues = [deque() for _ in repeat(None, count)]

    def replica(queue: deque) -> Iterable[Domain]:
        while True:
            if not queue:
                try:
                    element = next(iterator)
                except StopIteration:
                    return
                # copy the element itself (recursively, via dispatch)
                # and hand one copy to every replica's queue
                element_copies = copy(element,
                                      count=count)
                for sub_queue, element_copy in zip(queues, element_copies):
                    sub_queue.append(element_copy)
            yield queue.popleft()

    return tuple(replica(queue) for queue in queues)
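A usage sketch of the dispatching version on a deeper, mixed structure. To keep it standalone it restates copy and copy_iterable in condensed form (type hints dropped, and the general case is handled by the singledispatch base function itself, which has the same effect as registering object); materialize is a hypothetical helper that recursively turns replicas into lists so they can be compared.

```python
from collections import abc, deque
from functools import singledispatch
from itertools import repeat


@singledispatch
def copy(object_, *, count):
    # general case: non-iterable objects are simply repeated
    return repeat(object_, count)


@copy.register(str)
@copy.register(bytes)
@copy.register(abc.Mapping)
def copy_object(object_, *, count):
    return repeat(object_, count)


@copy.register(abc.Iterable)
def copy_iterable(object_, *, count=2):
    iterator = iter(object_)
    queues = [deque() for _ in repeat(None, count)]

    def replica(queue):
        while True:
            if not queue:
                try:
                    element = next(iterator)
                except StopIteration:
                    return
                for sub_queue, element_copy in zip(queues, copy(element, count=count)):
                    sub_queue.append(element_copy)
            yield queue.popleft()

    return tuple(replica(queue) for queue in queues)


def materialize(value):
    # hypothetical helper: recursively evaluate iterables into lists
    if isinstance(value, (str, bytes)) or not isinstance(value, abc.Iterable):
        return value
    return [materialize(element) for element in value]


# iterators nested two levels deep, mixed with a plain int
nested = [iter([iter(range(2)), iter('ab')]), 42]
first, second = copy_iterable(nested)
first_items = materialize(first)
second_items = materialize(second)
print(first_items)   # [[[0, 1], ['a', 'b']], 42]
print(second_items)  # [[[0, 1], ['a', 'b']], 42]
```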
Pros:
Cons:
… a dictionary lookup with O(1) complexity.
Let's define the nested iterable as follows
nested_iterable = [range(10 ** index) for index in range(1, 7)]
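Since merely creating the iterators says nothing about the performance of the underlying copying, let's define a function that exhausts iterators (described here)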
exhaust_iterable = deque(maxlen=0).extend
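A small sketch of what this idiom does: a deque with maxlen=0 discards every item it receives, so its extend method drives an iterator to the end without storing anything. The recording generator is only there to make the consumption visible.

```python
from collections import deque

exhaust_iterable = deque(maxlen=0).extend

produced = []


def numbers():
    # records every value it yields, so we can see the iterator was fully consumed
    for value in range(5):
        produced.append(value)
        yield value


iterator = numbers()
exhaust_iterable(iterator)
print(produced)  # [0, 1, 2, 3, 4]
```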
Using the timeit package
import timeit
def naive(): exhaust_iterable(copy_nested_iterable(nested_iterable))
def improved(): exhaust_iterable(copy_iterable(nested_iterable))
print('naive approach:', min(timeit.repeat(naive)))
print('improved approach:', min(timeit.repeat(improved)))
On my laptop with Windows 10 x64 and Python 3.5.4 I get
naive approach: 5.1863865
improved approach: 3.5602296000000013
Memory usage:
Line #    Mem usage    Increment   Line Contents
================================================
    78     17.2 MiB     17.2 MiB   @profile
    79                             def profile_memory(nested_iterable: Iterable[Iterable[Any]]) -> None:
    80     68.6 MiB     51.4 MiB       result = list(flatten(flatten(copy_nested_iterable(nested_iterable))))
for the "naive" approach and
Line #    Mem usage    Increment   Line Contents
================================================
    78     17.2 MiB     17.2 MiB   @profile
    79                             def profile_memory(nested_iterable: Iterable[Iterable[Any]]) -> None:
    80     68.7 MiB     51.4 MiB       result = list(flatten(flatten(copy_iterable(nested_iterable))))
for the "improved" one.
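Note: I measured each approach in a separate script run, because running both in one script makes the results unrepresentative: the second statement would reuse int objects created earlier behind the scenes.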
As we can see, both functions have similar performance, but the latter supports deeper nesting levels and looks pretty extensible.
Starting from version 0.4.0, I've added the "improved" solution to the lz package; it can be used like
>>> from lz.replication import replicate
>>> iterable = iter(range(5))
>>> list(map(list, replicate(iterable,
...                          count=3)))
[[0, 1, 2, 3, 4], [0, 1, 2, 3, 4], [0, 1, 2, 3, 4]]
It is property-based tested with the hypothesis framework, so we can be sure that it works as expected.
Answer 1 (score: 0)
To answer your question, how to deep-copy a nested iterable:
You can use deepcopy from the standard library:
>>> from copy import deepcopy
>>>
>>> ni = [1, [2,3,4]]
>>> ci = deepcopy(ni)
>>> ci[1][0] = "Modified"
>>> ci
[1, ['Modified', 3, 4]]
>>> ni
[1, [2, 3, 4]]
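@Azat Ibrakov said: "you are dealing with sequences; try, for example, to deep-copy a file object (hint: it will fail)"
No, deep-copying a file object does not fail; you can deep-copy file objects. Demo (Python 2):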
import copy

with open('example.txt', 'w') as f:
    f.writelines(["{}\n".format(i) for i in range(100)])

with open('example.txt', 'r') as f:
    l = [1, [f]]
    c = copy.deepcopy(l)
    print(isinstance(c[1][0], file))  # Prints True.
    print("\n".join(dir(c[1][0])))
Prints:
True
__class__
__delattr__
__doc__
__enter__
__exit__
__format__
__getattribute__
...
write
writelines
xreadlines
According to the Python iterator protocol, the items of some containers are obtained by calling the next function (see the docs here).
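You won't have all the items of an object that implements the iterator protocol (such as a file object) until you have traversed the whole iterator (calling next() until a StopIteration exception is raised).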
This is because you can't know in advance the result of calling the iterator's next (the __next__ method in Python 3.x).
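To make the protocol concrete, here is a simplified sketch of what a for loop does: it keeps calling next() and stops at StopIteration. Until then the remaining items have not been produced yet, and once a generator has produced them it cannot produce them again. The drain helper is hypothetical, named only for this example.

```python
def drain(iterable):
    """Collect items the way a for loop would: call next() until StopIteration."""
    iterator = iter(iterable)
    items = []
    while True:
        try:
            items.append(next(iterator))
        except StopIteration:
            return items


squares = (n * n for n in range(4))
print(drain(squares))  # [0, 1, 4, 9]
print(drain(squares))  # [] because a generator can be traversed only once
```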
See the following example:
import random


class RandomNumberIterator:
    def __init__(self):
        self.count = 0
        self.internal_it = range(10)  # For later demonstration on deepcopy

    def __iter__(self):
        return self

    def next(self):
        self.count += 1
        if self.count == 10:
            raise StopIteration
        return random.randint(0, 1000)

    __next__ = next  # same method under its Python 3 name


ri = RandomNumberIterator()
for i in ri:
    print(i)  # This will print random numbers each time.

# Can you come up with some sort of mechanism to be able
# to copy **THE CONTENT** of the `ri` iterator?
Then again, you can deep-copy it:
from copy import deepcopy

cri = deepcopy(ri)
for i in cri.internal_it:
    print(i)  # Will print numbers 0..9
# Deepcopy on ri successful!
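File objects are a special case here, because a file handle is involved: as you saw above, you can deep-copy a file object, but the copy will be in the closed state.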
You can call list on an iterable; it evaluates the iterable right away, and then you can test the iterable's content as many times as you like.
Back to files:
with open('example.txt', 'w') as f:
    f.writelines(["{}\n".format(i) for i in range(5)])

with open('example.txt', 'r') as f:
    print(list(f))  # Prints ['0\n', '1\n', '2\n', '3\n', '4\n']
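You can deep-copy nested iterables, but you can't evaluate the iterables while they are being copied; that makes no sense (remember RandomNumberIterator).
If you need to test the CONTENT of the iterables, you need to evaluate them.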
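Applied to the nested case from the question, evaluating the content means materializing every level. A minimal sketch, assuming all iterables are finite; materialize is a hypothetical helper name, strings and bytes are kept whole, and mappings would be reduced to lists of their keys:

```python
from collections import abc
from typing import Any


def materialize(value: Any) -> Any:
    """Recursively evaluate nested iterables into lists (strings and bytes kept as-is)."""
    if isinstance(value, (str, bytes)) or not isinstance(value, abc.Iterable):
        return value
    return [materialize(element) for element in value]


nested = [iter(range(3)), (x * 2 for x in range(2))]
snapshot = materialize(nested)
print(snapshot)  # [[0, 1, 2], [0, 2]]

# the snapshot is made of plain lists, so it can be inspected any number of times
assert materialize(snapshot) == snapshot
```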