RecursionError: maximum recursion depth exceeded in comparison

Time: 2020-05-19 12:12:43

Tags: python python-3.x apache-spark pyspark databricks

In my Python script I am determining the size of a directory in Azure Data Lake Storage Gen2. The code works fine until I check a larger directory.

At first I was looking at an OOM (out of memory) issue and added sys.setrecursionlimit(2000).

Now I am getting another error, the RecursionError from the title. Here is the script:

import sys
from dbutils import FileInfo
from typing import List

sys.setrecursionlimit(2000)

root_path = "/mnt/datalake/.../"


def discover_size(path: str, verbose: bool = True):
    def loop_path(paths: List[FileInfo], accum_size: float):
        if not paths:
            return accum_size
        else:
            head, tail = paths[0], paths[1:]
            if head.size > 0:
                if verbose:
                    accum_size += head.size / 1e6
                return loop_path(tail, accum_size)
            else:
                extended_tail = dbutils.fs.ls(head.path) + tail
                return loop_path(extended_tail, accum_size)

    return loop_path(dbutils.fs.ls(path), 0.0)


discover_size(root_path, verbose=True)

How can I overcome this problem?

1 Answer:

Answer 0 (score: 0)

The documentation for dbutils.fs.ls() is far from complete and I don't have a Databricks environment at hand, but something along these lines, which keeps a list of paths to visit instead of using actual recursion, might work better.

import dbutils


def discover_size(path: str) -> int:
    total_size = 0     # accumulated size in bytes
    visited = set()    # paths already listed, to avoid repeats
    to_visit = [path]  # queue of directories still to list
    while to_visit:
        path = to_visit.pop(0)
        if path in visited:
            print("Already visited %s..." % path)
            continue
        visited.add(path)
        print("Visiting %s, size %s so far..." % (path, total_size))
        for info in dbutils.fs.ls(path):
            total_size += info.size
            if info.isDir():
                to_visit.append(info.path)  # to_visit is a list, so append (not add)
    return total_size


discover_size("/mnt/datalake/.../", verbose=True)