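I have two lists (actually two DataFrame columns). They contain the same elements, but one of the lists is unordered. I want to get the indices that map the ordered list onto the unordered one. Is there a shortcut?
That is, list1[indices] == list2.
I need to obtain the indices variable.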
Answer 0 (score: 1)
Use list.index() in a list comprehension:
l1 = ['a','b','c','d']
l2 = ['c','d','b','a']
[l1.index(x) for x in l2] #[2, 3, 1, 0]
If you are trying to do this in a DataFrame, you can convert from np.array to list and back, like this:
import numpy as np
import pandas as pd
df = pd.DataFrame({'v1':np.array(l1), 'v2':np.array(l2)})
df['index_of_v2_in_v1'] = np.array([list(df['v1']).index(x) for x in list(df['v2'])])
df
# Result:
# v1 v2 index_of_v2_in_v1
# 0 a c 2
# 1 b d 3
# 2 c b 1
# 3 d a 0
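As a possible alternative that stays inside pandas, the same lookup can be sketched with pd.Index.get_indexer. This is only an illustration, not part of the original answer; it assumes the values in v1 are unique, and the variable name positions is made up for the example.

```python
import pandas as pd

df = pd.DataFrame({'v1': ['a', 'b', 'c', 'd'], 'v2': ['c', 'd', 'b', 'a']})

# Build an Index from v1, then look up the position of every v2 value in it.
# get_indexer requires the values of v1 to be unique and returns -1 for
# any value of v2 that does not occur in v1.
positions = pd.Index(df['v1']).get_indexer(df['v2'])
df['index_of_v2_in_v1'] = positions
print(df)
```

This avoids the repeated list(...).index(x) scans, which matters more as the columns grow.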
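If you are 100% sure that list 1 is already sorted (as your question suggests), you can use np.argsort(l2) on a list or array. Note that this returns the inverse mapping, that is, the positions in l2 that put it into sorted order (l2[indices] == l1), as shown below: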
np.argsort(df['v2'])
# Returns:
#0 3
#1 2
#2 0
#3 1
#Name: v2, dtype: int64
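To make the relationship between the two results explicit, here is a small sketch. It assumes l1 is sorted and that all elements are unique; the names order and indices are chosen just for this example. Applying argsort a second time inverts the permutation and gives the indices the question asks for.

```python
import numpy as np

l1 = np.array(['a', 'b', 'c', 'd'])  # assumed to be already sorted
l2 = np.array(['c', 'd', 'b', 'a'])

order = np.argsort(l2)       # positions in l2 that sort it: l2[order] == l1
indices = np.argsort(order)  # inverse permutation: l1[indices] == l2

print(order.tolist(), indices.tolist())
```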
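Answer 1 (score: 1)
In this example, using map appears to be about 3.6 times faster than the list comprehension. Note, however, that in Python 3 map returns a lazy iterator, so the timing below measures only the creation of the map object, not the index() lookups themselves: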
from timeit import timeit
l1 = ['a','b','c','d']
l2 = ['c','d','b','a']
t1 = timeit('map(lambda e: l1.index(e), l2)', globals=globals())
t2 = timeit('[l1.index(x) for x in l2]', globals=globals())
print("t1 = %s, t2 = %s, t2/t1 = %s" % (t1, t2, t2/t1))
Result:
t1 = 0.32407195774213654, t2 = 1.162188749526786, t2/t1 = 3.586205846454439
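For a like-for-like comparison under Python 3, the map call can be wrapped in list(...) so that the lookups are actually performed. The following is a sketch only; the repetition count and the names t_map and t_comp are arbitrary choices for this example, and no timings are quoted because they depend on the machine.

```python
from timeit import timeit

l1 = ['a', 'b', 'c', 'd']
l2 = ['c', 'd', 'b', 'a']
names = {'l1': l1, 'l2': l2}  # namespace handed to timeit

# list(...) forces the lazy map object to perform every lookup.
t_map = timeit('list(map(l1.index, l2))', globals=names, number=100000)
t_comp = timeit('[l1.index(x) for x in l2]', globals=names, number=100000)
print("map = %g s, comprehension = %g s" % (t_map, t_comp))
```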
Edit: further comparisons, including the solution proposed by @jbch:
from timeit import timeit
from random import shuffle
for n in range(10, 70, 10):
    l1 = list(range(n))
    l2 = l1[:]
    shuffle(l2)
    # t1: dictionary lookup, t2: list comprehension with index(), t3: map
    t1 = timeit('indices = {val: i for i, val in enumerate(l1)}; [indices[x] for x in l2]', globals=globals())
    t2 = timeit('[l1.index(x) for x in l2]', globals=globals())
    # Note: map() is lazy in Python 3, so t3 times only the creation of the map object.
    t3 = timeit('map(lambda e: l1.index(e), l2)', globals=globals())
    print("n = %d, t1 = %g, t2 = %g, t3 = %g" % (n, t1, t2, t3))
Results:
n = 10, t1 = 3.25064, t2 = 3.70473, t3 = 0.339757
n = 20, t1 = 5.01145, t2 = 9.22295, t3 = 0.341116
n = 30, t1 = 7.18546, t2 = 16.6379, t3 = 0.344537
n = 40, t1 = 8.96271, t2 = 26.0522, t3 = 0.336952
n = 50, t1 = 11.0635, t2 = 37.7291, t3 = 0.341935
n = 60, t1 = 12.6453, t2 = 51.1519, t3 = 0.350777
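Answer 2 (score: 0)
C8H10N42's answer has O(n^2) time complexity and will take a long time on large lists. Each call to index() is O(n), and it is called n times.
If you need better performance, you can use this O(n) solution: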
from timeit import timeit
from random import shuffle
for n in range(0, 50, 5):
    l1 = list(range(n))
    l2 = l1[:]
    shuffle(l2)
    t1 = timeit('indices = {val: i for i, val in enumerate(l1)}; [indices[x] for x in l2]', 'from __main__ import l1, l2')
    t2 = timeit('[l1.index(x) for x in l2]', 'from __main__ import l1, l2')
    print("n = %s, t1 = %s, t2 = %s, t2/t1 = %s" % (n, t1, t2, t2/t1))
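Building the dictionary is O(n), after which each O(n) index() call can be replaced with an O(1) dict lookup. The overall complexity is therefore O(n) + O(n) instead of O(n^2).
If you try different list sizes, you will see that the larger the list, the worse index() performs: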
n = 0, t1 = 0.410041093826, t2 = 0.0470049381256, t2/t1 = 0.114634700847
n = 5, t1 = 1.01210093498, t2 = 0.980098009109, t2/t1 = 0.96837970921
n = 10, t1 = 1.70017004013, t2 = 2.06220698357, t2/t1 = 1.21294160872
n = 15, t1 = 2.12121200562, t2 = 3.28132796288, t2/t1 = 1.54691183823
n = 20, t1 = 2.64426398277, t2 = 4.81948184967, t2/t1 = 1.82261751515
n = 25, t1 = 3.42534303665, t2 = 6.57365703583, t2/t1 = 1.9191237098
n = 30, t1 = 3.95739603043, t2 = 8.52685213089, t2/t1 = 2.15466232475
n = 35, t1 = 4.24842405319, t2 = 10.8080809116, t2/t1 = 2.54402121265
n = 40, t1 = 4.75647592545, t2 = 13.3403339386, t2/t1 = 2.80466760427
n = 45, t1 = 5.33353281021, t2 = 15.6205620766, t2/t1 = 2.92874584865
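Outside of a timing harness, the dictionary approach described above might be packaged as a small helper. This is a minimal sketch; the function name positions_in is hypothetical, and it assumes the items are hashable.

```python
def positions_in(reference, values):
    """Return, for each item of `values`, its position in `reference`."""
    # One O(n) pass builds the value -> position table. Items must be
    # hashable; if `reference` contains duplicates, the last position wins
    # (list.index() would return the first one instead).
    lookup = {val: i for i, val in enumerate(reference)}
    # Each lookup is O(1) on average; a value missing from `reference`
    # raises KeyError.
    return [lookup[x] for x in values]

l1 = ['a', 'b', 'c', 'd']
l2 = ['c', 'd', 'b', 'a']
indices = positions_in(l1, l2)
print(indices)
```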