Question

I have this DataFrame:

          value
L1 L2 L3       
11 21 31      1
      32      2
      34      3
   23 31      4
      33      5
      34      6
12 21 32      7
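
For reference, here is a minimal sketch that rebuilds the frame above so the snippets in this thread can be run as-is. The construction via `pd.MultiIndex.from_tuples` and the name `rows` are assumptions of mine; the original post does not show how `df` was created.

```python
import pandas as pd

# Index tuples (L1, L2, L3) copied from the printed frame above
rows = [
    (11, 21, 31), (11, 21, 32), (11, 21, 34),
    (11, 23, 31), (11, 23, 33), (11, 23, 34),
    (12, 21, 32),
]
index = pd.MultiIndex.from_tuples(rows, names=["L1", "L2", "L3"])

# One integer column, values 1..7 in the same order as the rows
df = pd.DataFrame({"value": list(range(1, 8))}, index=index)
print(df)
```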

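In this DataFrame, (L1, L2) is a tuple of IDs and L3 is a week number. I want to add rows so that every (L1, L2) tuple has every possible week number, with a default value for the rows that were missing: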

          value
L1 L2 L3       
11 21 31      1
      32      2
      33      0
      34      3
   23 31      4
      32      0
      33      5
      34      6
12 21 31      0
      32      7
      33      0
      34      0

To get this DataFrame, I take the list of unique (L1, L2) tuples and the list of all L3 values, build a new MultiIndex from them, and reindex my DataFrame:

import numpy as np
import pandas as pd

# Get all tuples (L1,L2)
l12_set = set(df.index.droplevel(2).tolist())

# Get all L3
l3_set = set(df.index.droplevel([0,1]).tolist())

index_array_l1 = np.array([], int)
index_array_l2 = np.array([], int)
index_array_l3 = np.array([], int)

# Creation of the index
for l1, l2 in l12_set:
    for l3 in l3_set:
        index_array_l1 = np.append(index_array_l1, l1)
        index_array_l2 = np.append(index_array_l2, l2)
        index_array_l3 = np.append(index_array_l3, l3)

index_array = np.array([index_array_l1, index_array_l2, index_array_l3])
multi_index = pd.MultiIndex.from_arrays(index_array, names=['L1', 'L2', 'L3'])

df = df.reindex(multi_index, fill_value=0)

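The problem is that this method is very slow for a large DataFrame (millions of rows). Is there a fast method for this already implemented in pandas, or any faster way to do it?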

Answer 1

Use unstack and stack:

df.unstack().stack(dropna=False).fillna(0).astype(int)
Out[433]: 
          value
L1 L2 L3       
11 21 31      1
      32      2
      33      0
      34      3
   23 31      4
      32      0
      33      5
      34      6
12 21 31      0
      32      7
      33      0
      34      0
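
A note for recent pandas versions: as far as I know, `stack` is being reworked (a `future_stack` option appeared around pandas 2.1), and passing `dropna` may warn or be rejected there. The sketch below is a variant that avoids `dropna` altogether by filling the gaps during `unstack`; it assumes the same example frame as in the question, rebuilt here so the snippet runs on its own, and the names `rows` and `out` are mine.

```python
import pandas as pd

# Rebuild the example frame from the question
rows = [
    (11, 21, 31), (11, 21, 32), (11, 21, 34),
    (11, 23, 31), (11, 23, 33), (11, 23, 34),
    (12, 21, 32),
]
index = pd.MultiIndex.from_tuples(rows, names=["L1", "L2", "L3"])
df = pd.DataFrame({"value": list(range(1, 8))}, index=index)

# Move L3 into the columns, filling missing (L1, L2, L3) cells with 0,
# then move it back into the index. No NaN is introduced, so there is
# nothing for stack() to drop and the values stay integers.
out = df.unstack(fill_value=0).stack()
print(out)
```

Since the fill happens before any NaN appears, the `fillna(0).astype(int)` step is not needed in this variant. Like the one-liner above, it relies on the index being unique, because `unstack` cannot pivot duplicate entries.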

Answer 2

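# Unique (L1, L2) pairs, in order of first appearance
u = pd.unique([t[:2] for t in df.index.values])
# All values of the third index level (L3, the week numbers)
l2 = df.index.levels[2]
# Pair every (L1, L2) with every week and reindex, filling the gaps with 0
df.reindex([t + (i,) for t in u for i in l2], fill_value=0)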

          value
L1 L2 L3       
11 21 31      1
      32      2
      33      0
      34      3
   23 31      4
      32      0
      33      5
      34      6
12 21 31      0
      32      7
      33      0
      34      0
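
The list comprehension above still builds every tuple in a Python loop, which may matter with millions of rows. A possible alternative is to build the same full index with NumPy's `repeat` and `tile`. This is only a sketch under the assumptions of the example frame in the question; the names `pairs`, `weeks` and `full_index` are mine, and I have not benchmarked it.

```python
import numpy as np
import pandas as pd

# Rebuild the example frame from the question
rows = [
    (11, 21, 31), (11, 21, 32), (11, 21, 34),
    (11, 23, 31), (11, 23, 33), (11, 23, 34),
    (12, 21, 32),
]
index = pd.MultiIndex.from_tuples(rows, names=["L1", "L2", "L3"])
df = pd.DataFrame({"value": list(range(1, 8))}, index=index)

# Unique (L1, L2) pairs, in order of first appearance
pairs = df.index.droplevel("L3").unique()
# Week numbers actually present in the data, sorted
weeks = np.sort(df.index.get_level_values("L3").unique().to_numpy())

# Repeat each pair once per week, and tile the weeks once per pair
full_index = pd.MultiIndex.from_arrays(
    [
        np.repeat(pairs.get_level_values("L1").to_numpy(), len(weeks)),
        np.repeat(pairs.get_level_values("L2").to_numpy(), len(weeks)),
        np.tile(weeks, len(pairs)),
    ],
    names=["L1", "L2", "L3"],
)

out = df.reindex(full_index, fill_value=0)
print(out)
```

Taking the weeks from `get_level_values` rather than `df.index.levels[2]` is a deliberate choice here: the levels of a MultiIndex can keep values that no longer occur in the data, for example after filtering, and those would otherwise turn into extra rows.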

How can I efficiently reindex a DataFrame to fill the holes in its index?
