Python - Normalizing/standardizing dictionary values

Date: 2018-06-26 10:14:11

Tags: python dictionary networkx

I have three dictionaries with identical keys, but the magnitudes of their values differ widely. I want to normalize/standardize the values of these dictionaries so that I can add them together to create an overall combined score for each key (with the three inputs weighted equally).

Current:

page_score = {'andrew.lewis': 6.599, 'jack.redmond': 4.28, ..., 'geoff.storey': 2.345}
eigen_score = {'andrew.lewis': 4.97, 'jack.redmond': 2.28, ..., 'geoff.storey': 3.927}
(1 more)

Normalized:

page_score = {'andrew.lewis': 0.672, 'jack.redmond': 0.437, ..., 'geoff.storey': 0.276}
hub_score = {'andrew.lewis': 0.432, 'jack.redmond': 0.762, ..., 'geoff.storey': 0.117}
(1 more)

End Output:

overall_score = {'andrew.lewis': 2.738, ...}  # combination of values across the three standardized dictionaries

How can I achieve this? I know how to do it for lists, but I am not sure how to do it for dictionaries. I have tried the solutions offered here and here, but strangely they produce various errors. Any help would be appreciated. Code so far:

G = nx.read_weighted_edgelist('Only_50_Employees1.csv', delimiter=',', create_using = nx.DiGraph(), nodetype=str)

between_score = dict(nx.betweenness_centrality(G))
eigen_score = dict(nx.eigenvector_centrality(G))
page_score = nx.pagerank(G)

Already tried:

factor = 1.0 / sum(page_score.values())
normalised_d = {k: v * factor for k, v in page_score.items()}

def normalize(page_score, target=1.0):
    raw = sum(page_score.values())
    factor = target / raw
    return {key: value * factor for key, value in page_score.items()}

import math
import operator

def really_safe_normalise_in_place(page_score):
    factor = 1.0 / math.fsum(page_score.values())
    for k in page_score:
        page_score[k] = page_score[k] * factor
    # push any floating-point discrepancy onto the largest entry
    key_for_max = max(page_score.items(), key=operator.itemgetter(1))[0]
    diff = 1.0 - math.fsum(page_score.values())
    # print("discrepancy = " + str(diff))
    page_score[key_for_max] += diff

d = {v: v + 1.0 / v for v in range(1, 1000001)}
really_safe_normalise_in_place(d)
print(math.fsum(d.values()))
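The sum-based approach does work on plain dicts once the syntax is fixed; a minimal sketch with hypothetical score dicts (the keys and numbers below are illustrative only), combining the three normalized dicts key by key with equal weight:

```python
def normalize(scores, target=1.0):
    """Scale values so they sum to `target`."""
    factor = target / sum(scores.values())
    return {k: v * factor for k, v in scores.items()}

# Hypothetical score dicts sharing the same keys
page_score = {'a': 6.0, 'b': 3.0, 'c': 1.0}
eigen_score = {'a': 0.5, 'b': 0.3, 'c': 0.2}
between_score = {'a': 10.0, 'b': 30.0, 'c': 60.0}

# Each normalized dict sums to 1.0, so the combined scores sum to 3.0
overall = {
    k: normalize(page_score)[k] + normalize(eigen_score)[k] + normalize(between_score)[k]
    for k in page_score
}
```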

Screenshot of the page_score dictionary: [screenshot]

1 Answer:

Answer 0 (score: 1)

For anyone interested, I found a really neat way to achieve this by using a dataframe:

# Libraries
import networkx as nx
import pandas as pd
import operator

# Loading files and node metrics
G = nx.read_weighted_edgelist('Only_50_Employees1.csv', delimiter=',', create_using = nx.DiGraph(), nodetype=str)
page_score = dict(nx.pagerank(G))
eigen_score = dict(nx.eigenvector_centrality(G))
betweenness_score = dict(nx.betweenness_centrality(G))
mydicts = [page_score, betweenness_score, eigen_score]

# Creating pandas dataframe
df = pd.concat([pd.Series(d) for d in mydicts], axis=1).fillna(0).T
df.index = ['page_score', 'betweenness_score', 'eigen_score']
df = df.transpose()
del page_score, eigen_score, betweenness_score, mydicts

# Scaling (and making values positive)
df = (df - df.mean()) / (df.max() - df.min())
minus_columns = ['page_score', 'betweenness_score', 'eigen_score']
df = df[minus_columns] + 1

# Creating new column with overall score
df['score'] = df['page_score'] + df['betweenness_score'] + df['eigen_score']
del df['page_score'], df['betweenness_score'], df['eigen_score']

# Reverting df back to dict
score_dict = df['score'].to_dict()
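The same scaling can be reproduced without pandas, which may help clarify what the dataframe step is doing: each column is mean-centred, divided by its range, and shifted by 1. A sketch on hypothetical data, assuming all three dicts share the same keys (the pandas version instead pads missing keys with 0 via `fillna`):

```python
def scale(scores):
    """Mean-centre, divide by the range, then shift by 1 to keep values positive."""
    vals = scores.values()
    mean = sum(vals) / len(scores)
    spread = max(vals) - min(vals)
    return {k: (v - mean) / spread + 1 for k, v in scores.items()}

# Hypothetical centrality scores (illustrative values only)
page_score = {'a': 6.0, 'b': 3.0, 'c': 1.0}
eigen_score = {'a': 0.5, 'b': 0.3, 'c': 0.2}
betweenness_score = {'a': 10.0, 'b': 30.0, 'c': 60.0}

# Combine the three scaled dicts into one overall score per key
score_dict = {
    k: scale(page_score)[k] + scale(eigen_score)[k] + scale(betweenness_score)[k]
    for k in page_score
}
```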