I have two lists of times. For each point in list1, I want to find the closest subsequent (larger) time in list2.
For example:
list1 = [280,290]
list2 = [282,295]
exchange(list1,list2)= [2,5]
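I am having trouble doing this quickly. The only approach I can think of is to loop over every element of list1 and take the first element of list2 that is greater than it (the lists are sorted). My two attempts are below, one without pandas and one with pandas: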
# dictionary containing my two lists
transition_trj = {'ALA19': [270.0, 280.0, 320.0, 330.0, 440.0, 450.0,
470.0], 'ALA88': [275.0, 285.0, 325.0, 333.0, 445.0, 455.0, 478.0]}
# for example, exchange times for ('ALA19','ALA88') = [5.0, 5.0, 5.0, 3.0, 5.0, 5.0, 8.0]
#find all possible combinations
names = list(transition_trj.keys())
import itertools
name_pairs = list(itertools.combinations_with_replacement(names, 2))
# non-pandas loop, takes 1.59 s
def exchange(Xk,Yk): # for example, a = 'phiALA18', b = 'phiARG11'
    Xv = transition_trj[Xk]
    Yv = transition_trj[Yk]
    pair = tuple([Xk,Yk])
    XY_exchange = [] # one for each pair
    for x in range(len(Yv)-1): # over all transitions in Y
        ypoint = Yv[x] # y point
        greater_xpoints = []
        for mini in Xv:
            if mini > ypoint:
                greater_xpoints.append(mini) # first hit=minimum in sorted list
                break
        if len(greater_xpoints) > 0:
            exchange = greater_xpoints[0] - ypoint
            XY_exchange.append(exchange)
    ET = sum(XY_exchange) * (1/observation_t)
    return pair, ET
# pandas loop, does same thing, takes 11.58 s...I am new to pandas...
import pandas as pd
df = pd.DataFrame(data=transition_trj)
def exchange(dihx, dihy):
    pair = tuple([dihx, dihy])
    exchange_times = []
    for x in range(df.__len__()):
        xpoint = df.loc[x, dihx]
        for y in range(df.__len__()):
            ypoint = df.loc[y, dihy]
            if ypoint > xpoint:
                exchange = ypoint - xpoint
                exchange_times.append(exchange)
                break
    ET = sum(exchange_times) * (1 / observation_t)
    return pair, ET
# here's where I call the def, just for context.
exchange_times = {}
for nm in name_pairs:
    pair, ET = exchange(nm[0],nm[1])
    exchange_times[pair] = ET
    if nm[0] != nm[1]:
        pair2, ET2 = exchange(nm[1], nm[0])
        exchange_times[pair2] = ET2
Answer 0 (score: 1)
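I propose a solution based on np.searchsorted (NumPy is the backbone of pandas), which finds the positions at which the elements of one list would be inserted into another, sorted list. It is an O(N ln(N)) solution, whereas yours is O(N²), because in every iteration you restart the search for the minimum from the beginning (for mini in Xv:).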
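To make the complexity argument concrete, here is a minimal pure-Python sketch of the same binary-search idea, using the standard-library bisect module rather than NumPy. The function name next_greater_gaps is hypothetical, and the sketch assumes that the second list is sorted and that points with no later value should simply be skipped.

```python
from bisect import bisect_right

def next_greater_gaps(starts, targets):
    """Gap from each value in `starts` to the first strictly greater value in sorted `targets`."""
    gaps = []
    for s in starts:
        # Binary search: index of the first element of targets that is > s.
        i = bisect_right(targets, s)
        if i < len(targets):  # assumption: skip points with no later value
            gaps.append(targets[i] - s)
    return gaps

print(next_greater_gaps([280, 290], [282, 295]))  # [2, 5]
```

Each lookup costs O(ln(N)) instead of a scan from the start of the list, which is where the overall O(N ln(N)) comes from.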
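The np.searchsorted approach works for your example, but I don't know what you want when the two lists differ in length or are not interleaved. If the lengths are equal, though, the following solution applies.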
import numpy as np
import pandas as pd
df=pd.DataFrame(transition_trj)
pos=np.searchsorted(df['ALA88'],df['ALA19'])
print(df['ALA88'][pos].reset_index(drop=True)-df['ALA19'])
# 0 5.0
# 1 5.0
# 2 5.0
# 3 3.0
# 4 5.0
# 5 5.0
# 6 8.0
# dtype: float64
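For the cases left open above (lists of different lengths, or points with no later time), one possible variant is sketched below. It is only a sketch under stated assumptions: the helper name exchange_gaps is hypothetical, points without a later time are dropped, and side="right" is used so that only strictly greater times count, as the question asks (the default side="left" would also accept an equal time).

```python
import numpy as np

def exchange_gaps(starts, targets):
    """Gap from each start time to the first strictly later time in sorted `targets`."""
    starts = np.asarray(starts, dtype=float)
    targets = np.asarray(targets, dtype=float)
    # side="right": index of the first target strictly greater than each start.
    pos = np.searchsorted(targets, starts, side="right")
    # Assumption: starts with no later target (pos == len(targets)) are dropped.
    valid = pos < len(targets)
    return targets[pos[valid]] - starts[valid]

ala19 = [270.0, 280.0, 320.0, 330.0, 440.0, 450.0, 470.0]
ala88 = [275.0, 285.0, 325.0, 333.0, 445.0, 455.0, 478.0]
print(exchange_gaps(ala19, ala88).tolist())  # [5.0, 5.0, 5.0, 3.0, 5.0, 5.0, 8.0]
print(exchange_gaps(ala88, ala19).tolist())  # [5.0, 35.0, 5.0, 107.0, 5.0, 15.0]
```

In the reverse direction the last point (478.0) has no later time in ALA19, so the result has six entries rather than seven.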