Indexing a list of values in one column based on the length given in another column

Time: 2020-05-13 18:59:37

Tags: python pandas list indexing

I have a DataFrame that looks like this:

len  scores
5      [0.45814112124905954, 0.34974337172257086, 0.042586941883761324, 0.042586941883761324, 0.33509446692807404, 0.01202741856859997, 0.01202741856859997, 0.031149023579740857, 0.031149023579740857, 0.9382029832667171]
4      [0.1289882974831455, 0.17069367229950574, 0.03518847270370917, 0.3283517918439753, 0.41119171582425107, 0.5057528742869354]

3      [0.22345885572316307, 0.1366147609256035, 0.09309687010700848]
2      [0.4049920770888036]
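
For anyone who wants to reproduce this, the frame above can be rebuilt roughly as follows. This is a minimal sketch that copies the values from the table; the variable name df is chosen to match the code used later in this thread.

```python
import pandas as pd

# Values copied from the table above; each list holds len * (len - 1) / 2 scores.
df = pd.DataFrame({
    "len": [5, 4, 3, 2],
    "scores": [
        [0.45814112124905954, 0.34974337172257086, 0.042586941883761324,
         0.042586941883761324, 0.33509446692807404, 0.01202741856859997,
         0.01202741856859997, 0.031149023579740857, 0.031149023579740857,
         0.9382029832667171],
        [0.1289882974831455, 0.17069367229950574, 0.03518847270370917,
         0.3283517918439753, 0.41119171582425107, 0.5057528742869354],
        [0.22345885572316307, 0.1366147609256035, 0.09309687010700848],
        [0.4049920770888036],
    ],
})
print(df)
```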

I want to index the scores column based on the value in the len column and get multiple rows:

len    scores
5       [0.45814112124905954, 0.34974337172257086, 0.042586941883761324, 0.042586941883761324]
5       [0.33509446692807404, 0.01202741856859997, 0.01202741856859997]
5       [0.031149023579740857, 0.031149023579740857]
5       [0.9382029832667171]
5       
4       [0.1289882974831455, 0.17069367229950574, 0.03518847270370917]
4       [0.3283517918439753, 0.41119171582425107]
4       [0.5057528742869354]
4
3       [0.22345885572316307, 0.1366147609256035]
3       [0.09309687010700848]
3
2       [0.4049920770888036]
2

I tried:

d = []
for x in df['len']:
    col = df['scores'][:(x-1)]
    d.append(col)

But this only gives me the first indexed row for each value of len:

len  scores
5      [0.45814112124905954, 0.34974337172257086, 0.042586941883761324, 0.042586941883761324]
4      [0.1289882974831455, 0.17069367229950574, 0.03518847270370917]
3      [0.22345885572316307, 0.1366147609256035]
2      [0.4049920770888036]

How can I index the remaining rows to get the output I need?

2 answers:

Answer 0: (score: 2)

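Assuming, as in your example, that the len column is related to the length of the list in each row of the scores column, you can use apply to reshape each list into a nested list of sub-lists of decreasing length, and then explode it.

EDIT: if you cannot use explode, the nested lists have to be expanded into rows some other way.

The apply-and-explode version looks like this: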

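import numpy as np

# define a function that builds the nested list for one row
def create_nested_list(x):
    # slice boundaries: 0, then the cumulative sums of len-1, len-2, ..., 0
    l_idx = [0] + np.cumsum(np.arange(x['len'])[::-1]).tolist()
    return [x['scores'][i:j] for i, j in zip(l_idx[:-1], l_idx[1:])]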

#apply row-wise
s = df.apply(create_nested_list, axis=1)
#change index to keep the value in len
s.index=df['len']
#explode and reset_index
df_f = s.explode().reset_index(name='scores')

print (df_f)
    len                                             scores
0     5  [0.45814112124905954, 0.34974337172257086, 0.0...
1     5  [0.33509446692807404, 0.01202741856859997, 0.0...
2     5       [0.031149023579740857, 0.031149023579740857]
3     5                               [0.9382029832667171]
4     5                                                 []
5     4  [0.1289882974831455, 0.17069367229950574, 0.03...
6     4          [0.3283517918439753, 0.41119171582425107]
7     4                               [0.5057528742869354]
8     4                                                 []
9     3          [0.22345885572316307, 0.1366147609256035]
10    3                              [0.09309687010700848]
11    3                                                 []
12    2                               [0.4049920770888036]
13    2                                                 []
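
If explode is not available (Series.explode and DataFrame.explode were both added in pandas 0.25.0), the same rows can be built without it. What follows is only a sketch under that assumption, not code from the original answer: split_decreasing is a hypothetical helper, the sample frame uses made-up integers in place of the scores, and the slice boundaries are computed with itertools.accumulate where the code above uses np.cumsum.

```python
from itertools import accumulate

import pandas as pd


def split_decreasing(values, n):
    """Split values into n chunks of sizes n-1, n-2, ..., 1, 0 (hypothetical helper)."""
    # For n = 4 the boundaries are [0, 3, 5, 6, 6].
    bounds = [0] + list(accumulate(range(n - 1, -1, -1)))
    return [values[i:j] for i, j in zip(bounds[:-1], bounds[1:])]


# Made-up sample data with the same shape as the question's frame.
df = pd.DataFrame({"len": [4, 3], "scores": [[1, 2, 3, 4, 5, 6], [7, 8, 9]]})

# Collect one (len, chunk) pair per output row with plain Python loops.
lens, chunks = [], []
for n, scores in zip(df["len"], df["scores"]):
    for chunk in split_decreasing(scores, n):
        lens.append(n)
        chunks.append(chunk)

df_f = pd.DataFrame({"len": lens, "scores": chunks})
print(df_f)
```

As with the explode version, the last chunk for each value of len is an empty list, which matches the blank rows in the requested output.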

Answer 1: (score: 0)

df.explode() does exactly what you need.

Example:

import pandas as pd

df = pd.DataFrame({'A': [[1, 2, 3], 'foo', [], [3, 4]], 'B': 1})
df.explode('A')
#Output
#      A  B
# 0    1  1
# 0    2  1
# 0    3  1
# 1  foo  1
# 2  NaN  1
# 3    3  1
# 3    4  1
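
Applied to the question's layout, DataFrame.explode only yields the requested rows once each list has been split into sub-lists; exploding the raw scores column produces one scalar per row. Here is a small sketch with made-up integers in place of the scores, where nested is a hypothetical column holding the pre-split chunks (hard-coded for brevity):

```python
import pandas as pd

# Made-up frame: "scores" holds the raw lists, "nested" the same values
# already split into chunks of decreasing length (hypothetical column).
df = pd.DataFrame({
    "len": [3, 2],
    "scores": [[7, 8, 9], [5]],
    "nested": [[[7, 8], [9], []], [[5], []]],
})

# Exploding the raw lists gives one scalar per row, not the sub-lists asked for.
flat = df[["len", "scores"]].explode("scores")

# Exploding the column of chunk lists gives one chunk per row.
per_chunk = df[["len", "nested"]].explode("nested").reset_index(drop=True)
print(per_chunk)
```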