I have a DataFrame that looks like this:
len scores
5 [0.45814112124905954, 0.34974337172257086, 0.042586941883761324, 0.042586941883761324, 0.33509446692807404, 0.01202741856859997, 0.01202741856859997, 0.031149023579740857, 0.031149023579740857, 0.9382029832667171]
4 [0.1289882974831455, 0.17069367229950574, 0.03518847270370917, 0.3283517918439753, 0.41119171582425107, 0.5057528742869354]
3 [0.22345885572316307, 0.1366147609256035, 0.09309687010700848]
2 [0.4049920770888036]
I want to slice each list in the scores column based on the value in the len column, and get multiple rows:
len scores
5 [0.45814112124905954, 0.34974337172257086, 0.042586941883761324, 0.042586941883761324]
5 [0.33509446692807404, 0.01202741856859997, 0.01202741856859997]
5 [0.031149023579740857, 0.031149023579740857]
5 [0.9382029832667171]
5
4 [0.1289882974831455, 0.17069367229950574, 0.03518847270370917]
4 [0.3283517918439753, 0.41119171582425107]
4 [0.5057528742869354]
4
3 [0.22345885572316307, 0.1366147609256035]
3 [0.09309687010700848]
3
2 [0.4049920770888036]
2
I tried:
d = []
for x in df['len']:
    col = df['scores'][:(x-1)]
    d.append(col)
But this only gives me the first slice of each row:
len scores
5 [0.45814112124905954, 0.34974337172257086, 0.042586941883761324, 0.042586941883761324]
4 [0.1289882974831455, 0.17069367229950574, 0.03518847270370917]
3 [0.22345885572316307, 0.1366147609256035]
2 [0.4049920770888036]
How can I get the remaining slices as rows, as described above?
Answer 0 (score: 2)
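Assuming, as in your example, that the len column determines the length of the list in the scores column, you can use apply to reshape each list into a nested list of chunks of decreasing length, and then explode it. (Edit: if you cannot use explode, which needs pandas 0.25 or later, you have to flatten the nested lists yourself.) The explode version looks like this: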
import numpy as np
# define a function that builds the nested list for one row
def create_nested_list(x):
    # chunk boundaries; for len=5 this is [0, 4, 7, 9, 10, 10]
    l_idx = [0] + np.cumsum(np.arange(x['len'])[::-1]).tolist()
    return [x['scores'][i:j] for i, j in zip(l_idx[:-1], l_idx[1:])]
# apply row-wise
s = df.apply(create_nested_list, axis=1)
# change the index to keep the value in len
s.index = df['len']
# explode and reset_index
df_f = s.explode().reset_index(name='scores')
print(df_f)
len scores
0 5 [0.45814112124905954, 0.34974337172257086, 0.0...
1 5 [0.33509446692807404, 0.01202741856859997, 0.0...
2 5 [0.031149023579740857, 0.031149023579740857]
3 5 [0.9382029832667171]
4 5 []
5 4 [0.1289882974831455, 0.17069367229950574, 0.03...
6 4 [0.3283517918439753, 0.41119171582425107]
7 4 [0.5057528742869354]
8 4 []
9 3 [0.22345885572316307, 0.1366147609256035]
10 3 [0.09309687010700848]
11 3 []
12 2 [0.4049920770888036]
13 2 []
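For a pandas version without explode, one possible sketch is to build the output rows in a plain Python loop. The same cumulative-sum boundaries are used as in the answer above; the helper name chunk_bounds and the shortened sample values are made up for illustration and are not part of the original answer.

```python
import numpy as np
import pandas as pd

# Small stand-in for the question's data (values shortened for readability).
df = pd.DataFrame({
    "len": [3, 2],
    "scores": [[0.22, 0.14, 0.09], [0.40]],
})

def chunk_bounds(n):
    """Slice boundaries for chunks of size n-1, n-2, ..., 1, 0 (hypothetical helper)."""
    return [0] + np.cumsum(np.arange(n)[::-1]).tolist()

# Collect one (len, chunk) pair per chunk instead of calling explode.
rows = []
for n, scores in zip(df["len"], df["scores"]):
    bounds = chunk_bounds(n)
    for i, j in zip(bounds[:-1], bounds[1:]):
        rows.append((int(n), scores[i:j]))

df_f = pd.DataFrame(rows, columns=["len", "scores"])
print(chunk_bounds(5))  # [0, 4, 7, 9, 10, 10]
print(df_f)
```

The trailing boundary is repeated (10, 10 for len=5), which is what produces the empty list at the end of each group.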
Answer 1 (score: 0)
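df.explode() does the row-splitting part of what you need: it turns each element of a list-like cell into its own row, repeating the other columns.
Example: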
import pandas as pd
df = pd.DataFrame({'A': [[1, 2, 3], 'foo', [], [3, 4]], 'B': 1})
df.explode('A')
#Output
# A B
# 0 1 1
# 0 2 1
# 0 3 1
# 1 foo 1
# 2 NaN 1
# 3 3 1
# 3 4 1
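Applied to the question, explode only yields the desired rows if each cell already holds a list of chunks. The following is a minimal sketch under that assumption; the helper to_chunks and the shortened sample data are illustrative and not from the original posts.

```python
import numpy as np
import pandas as pd

# Small stand-in for the question's data (values shortened for readability).
df = pd.DataFrame({
    "len": [3, 2],
    "scores": [[0.22, 0.14, 0.09], [0.40]],
})

def to_chunks(scores, n):
    """Split scores into chunks of size n-1, n-2, ..., 1, 0 (hypothetical helper)."""
    bounds = [0] + np.cumsum(np.arange(n)[::-1]).tolist()
    return [scores[i:j] for i, j in zip(bounds[:-1], bounds[1:])]

# Replace each flat list with a list of chunks, then explode that column.
nested = df.assign(
    scores=[to_chunks(s, n) for s, n in zip(df["scores"], df["len"])]
)
out = nested.explode("scores").reset_index(drop=True)
print(out)
```

Because DataFrame.explode takes a column name, the len column is carried along automatically and does not need to be moved into the index first.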