Question

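I would like to know the correct way to handle storing/reading a list of items to HDF5 when the list is known to hold a maximum number of values, as in the following rockstar example: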

Date_of_Birth
Bands[] - where the maximum number of bands is 10
Siblings[] - where the maximum number of siblings is 6
Date_of_Death

All of these are column names.

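One approach I considered was to have duplicate column names, but that resulted in an error (ValueError: cannot reindex from a duplicate axis). Otherwise, all I can do is use Bands 1, Bands 2, etc., but that makes retrieval and querying cumbersome. Is there a better way? Any help is much appreciated!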

Answer 1

For something like this, where you actually want a column for each band and each sibling, I would try using a MultiIndex.

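Assuming you have a DataFrame df, you can list its columns with df.columns, which spits out something like Index(['dob', 'band_1', 'band_2'], dtype='object'). You can rebuild that index into something that lets you grab all the bands at once by doing this...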

Edit: found a way to do a 'partial' MultiIndex

df.columns = pd.MultiIndex.from_tuples([('dob',''),('bands','band_1'),('bands','band_2')])
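
As a minimal sketch of what the partial MultiIndex buys you, assuming a small made-up frame (the sample values and the variable names bands and in_band_c are illustrative, not from the question): selecting the top-level key returns every band column at once, and a query can then run across all of them.

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the answer above.
df = pd.DataFrame(
    [["1950-01-01", "Band A", "Band B"],
     ["1960-02-02", "Band C", None]],
    columns=["dob", "band_1", "band_2"],
)

# Group the band columns under a shared top-level key.
df.columns = pd.MultiIndex.from_tuples(
    [("dob", ""), ("bands", "band_1"), ("bands", "band_2")]
)

# Selecting the top level returns all band columns in one go.
bands = df["bands"]
print(list(bands.columns))

# Query across every band column: rows where any band equals "Band C".
in_band_c = df[(df["bands"] == "Band C").any(axis=1)]
print(in_band_c)
```

This avoids duplicate column names while still letting you treat the bands as one group.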

Another tip for building the list of tuples: you can run a list comprehension over the existing columns....

 import re
 # put every column whose name contains "band" under 'bands'; leave the rest at the top level
 [('bands', each) if re.search("band", each) else (each, '') for each in df.columns]
 #etc
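
The "etc" above could be extended to the siblings columns along these lines. This is only a sketch: the flat naming scheme (band_1, sibling_1, ...), the dod column, and the helper to_tuple are assumptions made for illustration, not something the question specifies.

```python
import re
import pandas as pd

# Hypothetical flat column names in the style "band_1", "sibling_1".
flat = ["dob", "band_1", "band_2", "sibling_1", "dod"]
df = pd.DataFrame([[None] * len(flat)], columns=flat)

def to_tuple(col):
    """Map 'band_1' to ('bands', 'band_1'); keep scalar columns at the top level."""
    match = re.match(r"(band|sibling)_\d+$", col)
    if match:
        return (match.group(1) + "s", col)
    return (col, "")

# Build the tuples from the existing columns, then swap in the MultiIndex.
tuples = [to_tuple(col) for col in df.columns]
df.columns = pd.MultiIndex.from_tuples(tuples)

print(tuples)
print(list(df["siblings"].columns))
```

Using an anchored pattern rather than a bare search for "band" keeps an unrelated column whose name merely contains that substring from being grouped by accident.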