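I have two lists of dictionaries and want to find which state has the largest difference for each pair of corresponding elements. The two lists have the same length.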
list1 = [{'NY':40, 'NJ':30, 'FL':30}, {'NY':40, 'NJ':50, 'FL':10}]
list2 = [{'NY':50, 'NJ':45, 'CT':20}, {'NY':40, 'FL':30}]
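For list1[0] and list2[0], FL has the largest difference between the two, since FL = 30, NY = 10, NJ = 15, and CT = 20 (a state missing from one dictionary counts as 0). For list1[1] and list2[1], NJ has the largest difference.
How can I get the desired output below? Thanks.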
State Diff
FL 30
NJ 50
Answer 0 (score: 4)
We want to compare the corresponding rows of two DataFrames. First, let's align them:
import pandas as pd
df1 = pd.DataFrame(list1)
df2 = pd.DataFrame(list2)
# Fill missing states with 0 and give both frames the same columns
df1, df2 = df1.fillna(0).align(df2.fillna(0), fill_value=0)
df1
   CT  FL  NJ  NY
0   0  30  30  40
1   0  10  50  40
df2
     CT    FL    NJ  NY
0  20.0   0.0  45.0  50
1   0.0  30.0   0.0  40
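To make the effect of align easier to see in isolation, here is a minimal sketch on two one-row frames. The names left and right and their values are made up for illustration and are not part of the question's data; the point is only that both results end up with the same set of columns, with 0 filled in where a state was missing.

```python
import pandas as pd

# Hypothetical toy frames: each one lacks a column that the other has
left = pd.DataFrame([{'NY': 40, 'FL': 30}])
right = pd.DataFrame([{'NY': 50, 'CT': 20}])

# Outer alignment (the default): both results get the union of the columns,
# and the columns that had to be added are filled with 0
left_aligned, right_aligned = left.align(right, fill_value=0)

print(left_aligned)
print(right_aligned)
```

Once both frames share the same columns, subtracting them lines up state by state, which is what the next step relies on.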
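Now you can use idxmax to find the state with the largest difference in each row, call lookup to get the corresponding diff value, and create a new DataFrame.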
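# Absolute difference per state for each pair of rows
u = (df1 - df2).abs()
# Column label (state) holding the largest difference in each row
idx = u.idxmax(1)
# Note: DataFrame.lookup was deprecated in pandas 1.2 and removed in 2.0
pd.DataFrame({'State': idx, 'Diff': u.lookup(u.index, idx)})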
State Diff
0 FL 30.0
1 NJ 50.0
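On pandas 2.0 and later, where DataFrame.lookup no longer exists, the same table can probably be built without it. The sketch below is a swapped-in variant, not the answer's original code: it assumes that the value at the idxmax column of each row is simply the row maximum, so u.max(axis=1) can stand in for the lookup call.

```python
import pandas as pd

list1 = [{'NY': 40, 'NJ': 30, 'FL': 30}, {'NY': 40, 'NJ': 50, 'FL': 10}]
list2 = [{'NY': 50, 'NJ': 45, 'CT': 20}, {'NY': 40, 'FL': 30}]

# Same preparation as above: fill gaps with 0 and align the columns
df1 = pd.DataFrame(list1).fillna(0)
df2 = pd.DataFrame(list2).fillna(0)
df1, df2 = df1.align(df2, fill_value=0)

# Absolute difference per state for each pair of rows
u = (df1 - df2).abs()

# idxmax gives the state with the largest difference, max gives its value
result = pd.DataFrame({'State': u.idxmax(axis=1), 'Diff': u.max(axis=1)})
print(result)
```

If several states tie for the largest difference in a row, idxmax reports the first one in column order.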
Answer 1 (score: 3)
Iterate over the two lists with zip and keep track of the largest difference in each iteration:
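res = []
for l1, l2 in zip(list1, list2):
    # (state, diff) with the largest difference seen so far
    max_diff = (0, 0)
    # Visit every state present in either dictionary
    for key in set(list(l1.keys()) + list(l2.keys())):
        # A state missing from one dictionary counts as 0
        diff = abs(l1.get(key, 0) - l2.get(key, 0))
        if diff > max_diff[1]:
            max_diff = (key, diff)
    res.append(max_diff)
print(res)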
Output:
[('FL', 30), ('NJ', 50)]
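If this is needed in more than one place, the loop could be wrapped in a small function. The following is only a sketch: the name max_diff_per_pair is made up, and it assumes two behaviours the question does not specify, namely that ties go to the alphabetically first state and that a pair of empty dictionaries yields (None, 0).

```python
def max_diff_per_pair(dicts_a, dicts_b):
    """Return one (state, diff) tuple per pair of dictionaries."""
    results = []
    for a, b in zip(dicts_a, dicts_b):
        # Sorted union of keys, so ties go to the alphabetically first state
        keys = sorted(a.keys() | b.keys())
        if not keys:
            # Assumption: a pair with no states at all yields (None, 0)
            results.append((None, 0))
            continue
        # A state missing from one dictionary counts as 0
        best = max(keys, key=lambda k: abs(a.get(k, 0) - b.get(k, 0)))
        results.append((best, abs(a.get(best, 0) - b.get(best, 0))))
    return results


list1 = [{'NY': 40, 'NJ': 30, 'FL': 30}, {'NY': 40, 'NJ': 50, 'FL': 10}]
list2 = [{'NY': 50, 'NJ': 45, 'CT': 20}, {'NY': 40, 'FL': 30}]
print(max_diff_per_pair(list1, list2))
# [('FL', 30), ('NJ', 50)]
```

Sorting the keys costs a little extra, but it makes the result independent of set iteration order when two states have the same difference.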
Answer 2 (score: 2)
I would probably go with the pandas approach, but you could also use a list comprehension here:
from operator import itemgetter

max_diffs = [
    max(
        [
            # Built-in abs keeps plain ints, so the printed output matches below
            (k, abs(a.get(k, 0) - b.get(k, 0)))
            for k in set(list(a.keys()) + list(b.keys()))
        ],
        key=itemgetter(1)
    )
    for a, b in zip(list1, list2)
]
print(max_diffs)
#[('FL', 30), ('NJ', 50)]
If you want the output in a DataFrame, you can do the following:
import pandas as pd
df = pd.DataFrame(max_diffs, columns=["State", "Diff"])
print(df)
# State Diff
#0 FL 30
#1 NJ 50