How to create a new list column from a list column
My DataFrame:
id x list_id
1 20 [2, 4]
2 10 [1, 3]
3 10 [1]
4 30 [1, 2]
What I want:
id x list_id list_x
1 20 [2, 4] [10, 30]
2 10 [1, 3] [20, 10]
3 10 [1] [20]
4 30 [1, 2] [20, 10]
My first idea was to iterate over each row and check whether the id is in the list:
for index, row in df.iterrows():
    if ( df['id'].isin(row['list_id']) ):
        do_something
But it does not work. Any suggestions?
Answer 0 (score: 4)
Use a list comprehension:
df.loc[:,'list_x'] = [df.x[df['id'].isin(l)].values for l in df.list_id]
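To see what the comprehension does for a single row, and why the `if` in the question fails, here is a minimal sketch. It rebuilds the question's frame and uses the literal list `[2, 4]` (the first row's `list_id`) as the example input; the variable names `mask`, `picked` and `raised` are only for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'id': [1, 2, 3, 4],
    'x': [20, 10, 10, 30],
    'list_id': [[2, 4], [1, 3], [1], [1, 2]],
})

# isin returns one boolean per row, not a single True/False
mask = df['id'].isin([2, 4])
print(mask.tolist())    # [False, True, False, True]

# Indexing x with the mask keeps only the rows whose id is in the list
picked = df.x[mask].values
print(picked.tolist())  # [10, 30]

# A multi-element Series has no single truth value, so `if mask:` raises
try:
    if mask:
        pass
    raised = False
except ValueError:
    raised = True
print(raised)           # True
```

The comprehension simply repeats the `picked` step once per entry of `df.list_id`.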
Full example with dummy data:
import pandas as pd
data = {
    'id': [1, 2, 3, 4],
    'x': [20, 10, 10, 30],
    'list_id': [[2, 4], [1, 3], [1], [1, 2]],
}
df = pd.DataFrame(data)
df.loc[:,'list_x'] = [df.x[df['id'].isin(l)].values for l in df.list_id]
Output
print(df)
id list_id x list_x
1 [2, 4] 20 [10, 30]
2 [1, 3] 10 [20, 10]
3 [1] 10 [20]
4 [1, 2] 30 [20, 10]
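One caveat: a boolean mask returns the `x` values in row order, not in the order given in `list_id`, and it ignores repeated ids. For the sample data the two orders coincide. If the order of each list matters, a possible variant (a sketch, assuming `id` is unique and every listed id exists, since `.loc` raises `KeyError` otherwise) is to look values up through a Series indexed by `id`. The name `x_by_id` is made up for this example.

```python
import pandas as pd

df = pd.DataFrame({
    'id': [1, 2, 3, 4],
    'x': [20, 10, 10, 30],
    'list_id': [[2, 4], [1, 3], [1], [1, 2]],
})

# Hypothetical helper: a Series that maps id -> x
x_by_id = df.set_index('id')['x']

# .loc with a list returns the values in the order of that list
df['list_x'] = [x_by_id.loc[ids].tolist() for ids in df['list_id']]
print(df['list_x'].tolist())
# [[10, 30], [20, 10], [20], [20, 10]]

# The difference only shows up when a list is not in row order
reversed_ids = [4, 2]
by_mask = df.x[df['id'].isin(reversed_ids)].tolist()
by_lookup = x_by_id.loc[reversed_ids].tolist()
print(by_mask)    # [10, 30]
print(by_lookup)  # [30, 10]
```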
Answer 1 (score: 0)
Creative solution
Use numpy object arrays with set elements
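import numpy as np
# i[k] is the one-element set {id_k}
i = np.array([set([x]) for x in df.id.values.tolist()])
# x[k] is the one-element list [x_k]; y holds empty lists
x = np.empty(i.shape, dtype=object)
x[:] = [[x] for x in df.x.values.tolist()]
y = np.empty_like(x)
y.fill([])
# j[k] is set(list_id_k)
j = np.array([set(x) for x in df.list_id.values.tolist()])
# i <= j[:, None] is a pairwise subset test; summing the picked lists concatenates them
df.assign(list_x=np.where(i <= j[:, None], x, y).sum(1))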
id x list_id list_x
0 1 20 [2, 4] [10, 30]
1 2 10 [1, 3] [20, 10]
2 3 10 [1] [20]
3 4 30 [1, 2] [20, 10]
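The core of this approach is the comparison `i <= j[:, None]`: for sets, `<=` means "is a subset of", and broadcasting a shape `(4,)` array against a shape `(4, 1)` array compares every id with every list. The sketch below isolates that step on the question's data. The last line is a simplification of mine (plain boolean indexing instead of the `np.where(...).sum(1)` list concatenation), not the exact method above, and the names `ids`, `list_ids`, `member` and `list_x` are illustrative.

```python
import numpy as np

ids = [1, 2, 3, 4]
xs = np.array([20, 10, 10, 30])
list_ids = [[2, 4], [1, 3], [1], [1, 2]]

# Object arrays whose elements are Python sets
i = np.array([{v} for v in ids], dtype=object)          # shape (4,)
j = np.array([set(l) for l in list_ids], dtype=object)  # shape (4,)

# member[r, c] is True when ids[c] is contained in list_ids[r]
member = (i <= j[:, None]).astype(bool)
print(member.astype(int))
# [[0 1 0 1]
#  [1 0 1 0]
#  [1 0 0 0]
#  [1 1 0 0]]

# Simplified selection: index the x values with each row of the matrix
list_x = [xs[row].tolist() for row in member]
print(list_x)
# [[10, 30], [20, 10], [20], [20, 10]]
```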
Timings
%timeit df.assign(list_x=[df.x[df['id'].isin(l)].values for l in df.list_id])
1000 loops, best of 3: 1.21 ms per loop
%%timeit
i = np.array([set([x]) for x in df.id.values.tolist()])
x = np.empty(i.shape, dtype=object)
x[:] = [[x] for x in df.x.values.tolist()]
y = np.empty_like(x)
y.fill([])
j = np.array([set(x) for x in df.list_id.values.tolist()])
df.assign(list_x=np.where(i <= j[:, None], x, y).sum(1))
1000 loops, best of 3: 371 µs per loop