In my Python script, I am determining the size of directories in Azure Data Lake Storage Gen2. The code works fine until I check larger directories.
First I looked into the OOM (out of memory) problem and added sys.setrecursionlimit(2000).
Now I'm running into another error:
import sys
from dbutils import FileInfo
from typing import List

sys.setrecursionlimit(2000)

root_path = "/mnt/datalake/.../"

def discover_size(path: str, verbose: bool = True):
    # Walk the tree recursively: files contribute to the running total (in MB),
    # directories are expanded via dbutils.fs.ls and processed in turn.
    def loop_path(paths: List[FileInfo], accum_size: float):
        if not paths:
            return accum_size
        else:
            head, tail = paths[0], paths[1:]
            if head.size > 0:
                if verbose:
                    accum_size += head.size / 1e6
                return loop_path(tail, accum_size)
            else:
                extended_tail = dbutils.fs.ls(head.path) + tail
                return loop_path(extended_tail, accum_size)

    return loop_path(dbutils.fs.ls(path), 0.0)

discover_size(root_path, verbose=True)
How can I overcome this problem?
Answer 0 (score: 0)
The documentation for dbutils.fs.ls() leaves a lot to be desired, and I don't have a Databricks environment at hand, but something like the following, which keeps an explicit list of paths to visit instead of using actual recursion, might work better.
import dbutils

def discover_size(path: str) -> int:
    total_size = 0
    visited = set()       # paths already listed, to avoid repeats
    to_visit = [path]     # work queue of paths still to list
    while to_visit:
        path = to_visit.pop(0)
        if path in visited:
            print("Already visited %s..." % path)
            continue
        visited.add(path)
        print("Visiting %s, size %s so far..." % (path, total_size))
        for info in dbutils.fs.ls(path):
            total_size += info.size
            if info.isDir():
                # Queue sub-directories instead of recursing into them.
                to_visit.append(info.path)
    return total_size

discover_size("/mnt/datalake/.../")
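One further design note: list.pop(0) shifts the whole remaining list on every call, so for very large trees a collections.deque with popleft() does the same breadth-first walk more cheaply, and dividing by 1e6 at the end reproduces the MB figure from the original script. This is only an untested sketch under the same assumptions as above (a Databricks dbutils whose fs.ls returns FileInfo objects exposing size, path and isDir()); the name discover_size_mb is just illustrative.

from collections import deque

def discover_size_mb(path: str) -> float:
    total_size = 0
    visited = set()
    to_visit = deque([path])          # work queue of paths still to list
    while to_visit:
        current = to_visit.popleft()  # O(1), unlike list.pop(0)
        if current in visited:
            continue
        visited.add(current)
        for info in dbutils.fs.ls(current):  # assumes the Databricks dbutils object, as above
            total_size += info.size
            if info.isDir():
                to_visit.append(info.path)
    return total_size / 1e6           # bytes -> MB, as in the original script

print(discover_size_mb("/mnt/datalake/.../"))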