Question

I have a list like this:

categories_list = [
    ['a', array([ 12994, 1262824, 145854,  92469]),
     'b', array([273300]),
     'c', array([341395, 32857711])],
    ['a', array([ 356424311,  165573412, 2032850784]),
     'b', array([2848105, 228835]),
     'c', array([])],
    ['a', array([1431689, 30655043, 1739919]),
     'b', array([597, 251911, 246600]),
     'c', array([35590])]
]

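where each array belongs to the letter before it. For example: a -> array([ 12994, 1262824, 145854, 92469]), b -> array([273300]), 'a' -> array([1431689, 30655043, 1739919]) and so on...

So, is it possible to retrieve the total number of items for each letter? Desired output: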

----------
a      10
b       6
c       3

All suggestions are welcome.

Answer 1

pd.DataFrame(
    [dict(zip(x[::2], [len(y) for y in x[1::2]])) for x in categories_list]
).sum()

a    10
b     6
c     3
dtype: int64

My goal is to create a list of dictionaries, so I have to fill in the ...... with something that turns each sublist into a dictionary.
```
[ ...... for x in categories_list]
```
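If I call dict on a list or generator of tuples, it will magically convert it into a dictionary whose keys are the first value of each tuple and whose values are the second value.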
```
dict(...list of tuples...)
```
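As a small illustration of that behaviour, here is a sketch with made-up pairs (the sample values are assumptions chosen for the example, not data from the question):

```python
# Hypothetical sample data: a list of (key, value) tuples
pairs = [("a", 4), ("b", 1), ("c", 2)]

# dict() accepts any iterable of 2-item tuples
from_list = dict(pairs)

# It also accepts a generator that yields such tuples
from_generator = dict((k, v * 10) for k, v in pairs)

print(from_list)
print(from_generator)
```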
zip will give me a generator of tuples
```
zip(list one, list two)
```
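A quick sketch of what zip produces, using made-up lists. One detail worth knowing: in Python 3, zip returns a lazy iterator, so it can only be consumed once.

```python
# Hypothetical sample lists
keys = ["a", "b", "c"]
values = [4, 1, 2]

zipped = zip(keys, values)   # lazy iterator in Python 3
pairs = list(zipped)         # materialise the tuples
leftover = list(zipped)      # the iterator is now exhausted

print(pairs)
print(leftover)
```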
I know that in each sublist, my keys are at the even indices [0, 2, 4, ...] and the values are at the odd indices [1, 3, 5, ...]
```
#   even    odd
zip(x[::2], x[1::2])
```
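To see what the two slices return, here is a sketch on the first sublist from the question. Plain lists stand in for the NumPy arrays (an assumption made only to keep the example dependency-free); slicing the outer list works the same way.

```python
# First sublist from the question, with plain lists in place of arrays
x = ["a", [12994, 1262824, 145854, 92469],
     "b", [273300],
     "c", [341395, 32857711]]

keys = x[::2]     # items at indices 0, 2, 4
values = x[1::2]  # items at indices 1, 3, 5

print(keys)
print(values)
```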

But x[1::2] will be arrays, and I don't want the arrays. I want the lengths of the arrays.

#   even                     odd
zip(x[::2], [len(y) for y in x[1::2]])
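Wrapping that zip in dict gives the per-sublist dictionary. A sketch for the first sublist, again with plain lists standing in for the arrays (an assumption for the sake of a dependency-free example):

```python
# First sublist from the question, with plain lists in place of arrays
x = ["a", [12994, 1262824, 145854, 92469],
     "b", [273300],
     "c", [341395, 32857711]]

# Pair each key with the length of the sequence that follows it
lengths = dict(zip(x[::2], [len(y) for y in x[1::2]]))

print(lengths)
```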

pandas.DataFrame will take the list of dictionaries and create a dataframe.
Finally, use sum to add up the lengths.
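Putting the steps together, a self-contained sketch might look like the following. It assumes the arrays in the question are NumPy arrays created with np.array; the intermediate names `records` and `totals` are introduced here for readability and do not appear in the answer.

```python
import numpy as np
import pandas as pd

categories_list = [
    ['a', np.array([12994, 1262824, 145854, 92469]),
     'b', np.array([273300]),
     'c', np.array([341395, 32857711])],
    ['a', np.array([356424311, 165573412, 2032850784]),
     'b', np.array([2848105, 228835]),
     'c', np.array([])],
    ['a', np.array([1431689, 30655043, 1739919]),
     'b', np.array([597, 251911, 246600]),
     'c', np.array([35590])],
]

# One dict per sublist: letter -> length of the array that follows it
records = [dict(zip(x[::2], [len(y) for y in x[1::2]]))
           for x in categories_list]

# Each dict becomes a row; summing down the columns gives per-letter totals
totals = pd.DataFrame(records).sum()
print(totals)
```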

Answer 2

I use groupby from itertools to group on the key columns (indices 0, 2, 4, which hold the keys a, b, c respectively), then count the number of distinct items in the column that follows. In this case, the count for each group is len(set(group)) (if you only want the total length of the group, use len(group)). See the following code:

from itertools import groupby, chain

count_distincts = []
cols = [0, 2, 4]
for c in cols:
    # groupby only merges consecutive rows that share the same key
    for gid, group in groupby(categories_list, key=lambda x: x[c]):
        # Flatten the arrays of every row in this group into one list
        group = list(chain(*[list(g[c + 1]) for g in group]))
        count_distincts.append([gid, len(set(group))])
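If only the total count per letter is needed (not the distinct count), the same result can be reached without pandas or groupby. The following is a sketch using collections.Counter, which is not part of either answer; plain lists stand in for the NumPy arrays, and `totals` is a name chosen here for illustration.

```python
from collections import Counter

# Data from the question, with plain lists in place of arrays
categories_list = [
    ["a", [12994, 1262824, 145854, 92469],
     "b", [273300],
     "c", [341395, 32857711]],
    ["a", [356424311, 165573412, 2032850784],
     "b", [2848105, 228835],
     "c", []],
    ["a", [1431689, 30655043, 1739919],
     "b", [597, 251911, 246600],
     "c", [35590]],
]

totals = Counter()
for row in categories_list:
    # Keys sit at even indices, their item sequences at odd indices
    for key, items in zip(row[::2], row[1::2]):
        totals[key] += len(items)

print(dict(totals))
```

Unlike groupby, this does not depend on rows with the same key being adjacent, since every key is accumulated into one counter.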

python pandas: sublists: total number of items
