Question

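I'm trying to use groupby on a pandas DataFrame that holds pathlib information and file sizes for a particular drive. I want to sum the storage used at a specific depth of the directory structure to see which directories are the fullest.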

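I tried grouping by each file's pathlib parent and summing the sizes, but that still doesn't give the total storage at a specific depth. Pathlib's "parents" looked promising, but it starts at the full path and works backward, so I tried negative (reverse) indexing, which doesn't seem to work.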

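From what I read in the pathlib documentation, "parents" should be a sequence and should support reverse indexing, but the error message seems to imply that it doesn't accept negative indices.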
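To make the indexing issue concrete, here is a small sketch. As far as I know, "parents" only accepts negative indices and slices from Python 3.10 onward, so on an older interpreter one possible workaround is to copy it into a tuple first, which gives ordinary sequence indexing. The example path is made up for illustration, and PureWindowsPath is used only so that the snippet behaves the same on any operating system.

```python
from pathlib import PureWindowsPath

# Hypothetical example path, not taken from the real scan.
p = PureWindowsPath("c:/Program Files/Git/bin/git.exe")

# parents runs from the nearest ancestor up to the drive root;
# a tuple copy supports negative indexing on any Python version.
ancestors = tuple(p.parents)

root = ancestors[-1]          # the drive root
top_level = ancestors[-2]     # first directory under the root
second_level = ancestors[-3]  # two levels below the root
```

Counting from the end this way anchors the index at the drive root rather than at the file, which is what a "depth" measured from the top of the tree needs.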

Here is the code I've been using (with help from http://pbpython.com/pathlib-intro.html):

import pandas as pd
from pathlib import Path
import time

dir_to_scan = "c:/Program Files"
p = Path(dir_to_scan)

all_files = []
for i in p.rglob('*.*'):
    all_files.append((i.name, i.parent,i.stat().st_size))

columns = ["File_Name", "Parent", "Size"]
df = pd.DataFrame.from_records(all_files, columns=columns)

df["path_stem"]=df['Parent'].apply(lambda x: x.parent if len(x.parents)<3 else x.parents[-2] )

The error traceback is as follows:

IndexError                                Traceback (most recent call last)
<ipython-input-3-5748b1f0a9ee> in <module>()
      1 #df.groupby('Parent')['Size'].sum()
      2 
----> 3 df["path_stem"]=df['Parent'].apply(lambda x: x.parent if len(x.parents)<3 else x.parents[-1] )
      4 
      5 #df([apps])=df([Parent]).parents

C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\series.py in apply(self, func, convert_dtype, args, **kwds)
   2549             else:
   2550                 values = self.asobject
-> 2551                 mapped = lib.map_infer(values, f, convert=convert_dtype)
   2552 
   2553         if len(mapped) and isinstance(mapped[0], Series):

pandas/_libs/src/inference.pyx in pandas._libs.lib.map_infer()

<ipython-input-3-5748b1f0a9ee> in <lambda>(x)
      1 #df.groupby('Parent')['Size'].sum()
      2 
----> 3 df["path_stem"]=df['Parent'].apply(lambda x: x.parent if len(x.parents)<3 else x.parents[-1] )
      4 
      5 #df([apps])=df([Parent]).parents

C:\ProgramData\Anaconda3\lib\pathlib.py in __getitem__(self, idx)
    592     def __getitem__(self, idx):
    593         if idx < 0 or idx >= len(self):
--> 594             raise IndexError(idx)
    595         return self._pathcls._from_parsed_parts(self._drv, self._root,
    596                                                 self._parts[:-idx - 1])

IndexError: -1
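
One way to sidestep indexing "parents" altogether might be to truncate each path by its "parts", which are counted from the root. The sketch below is only an illustration under assumptions: ancestor_at_depth is a hypothetical helper name, the records are made-up stand-ins for the scan results, and the totals are accumulated in a plain dict rather than with pandas so that the snippet runs without any third-party package.

```python
from collections import defaultdict
from pathlib import PureWindowsPath


def ancestor_at_depth(path, depth):
    """Return the ancestor of `path` that sits `depth` levels below the anchor.

    `path.parts` starts with the anchor (e.g. 'c:\\'), so keeping the first
    `depth + 1` parts truncates the path at the requested depth. Paths that
    are shallower than `depth` are returned unchanged.
    """
    parts = path.parts[: depth + 1]
    return type(path)(*parts)


# Hypothetical (parent directory, size in bytes) records standing in for the scan.
records = [
    (PureWindowsPath("c:/Program Files/Git/bin"), 100),
    (PureWindowsPath("c:/Program Files/Git/cmd"), 50),
    (PureWindowsPath("c:/Program Files/7-Zip"), 25),
]

# Sum sizes per directory two levels below the drive root.
totals = defaultdict(int)
for parent, size in records:
    totals[ancestor_at_depth(parent, 2)] += size
```

With the DataFrame from the question, the same helper could presumably be applied as df["path_stem"] = df["Parent"].apply(lambda x: ancestor_at_depth(x, 2)), after which df.groupby("path_stem")["Size"].sum() should give the total per directory at that depth.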

Using negative indexes with the Pathlib parents method

0 Answers: