How to iterate over a dictionary and a list to create a new dictionary

Time: 2019-01-18 21:58:05

Tags: python list dictionary

From a project, I have a list of dictionaries that looks like this:

  

METTS MARK = {'salary': 365788, 'to_messages': 807, 'deferral_payments': 'NaN', 'total_payments': 1061827, 'exercised_stock_options': 'NaN', 'bonus': 600000, 'restricted_stock': 585062, 'shared_receipt_with_poi': 702, 'restricted_stock_deferred': 'NaN', 'total_stock_value': 585062, 'expenses': 94299, 'loan_advances': 'NaN', 'from_messages': 29, 'other': 1740, 'from_this_person_to_poi': 1, 'poi': False, 'director_fees': 'NaN', 'deferred_income': 'NaN', 'long_term_incentive': 'NaN', 'email_address': 'mark.metts@enron.com', 'from_poi_to_this_person': 38}

What I want to do is take each value, scale it (replacing the "NaN" values with 0), and then put it back in the right place in the dictionary.

The code I have tried is below:

Load the dictionary containing the dataset

with open("final_project_dataset.pkl", "r") as data_file:
    data_dict = pickle.load(data_file)

A key named TOTAL in the dataset was creating an obvious outlier, so I deleted it

del data_dict["TOTAL"]

Selecting my features by intuition

my_features = [
    'poi',
    'salary',#
    'bonus',#
    'exercised_stock_options',#
    'total_stock_value',#
    'total_payments',
    'expenses',
    'loan_advances',#
    'deferral_payments',
    'deferred_income',
    'restricted_stock',#
    'restricted_stock_deferred',
    'long_term_incentive',#
    'shared_receipt_with_poi',#
    #'from_this_person_to_poi',
    #'director_fees',
    #'from_messages',
    #'to_messages',
    #'from_poi_to_this_person'
]


keys = data_dict.keys()
values = data_dict.values()

Replace NaN values with 0

list_of_values = []
for key in keys:
        tmp_list = []
        for feature in my_features:
            try:
                data_dict[key][feature]
            except KeyError:
                print "error: key ", feature, " not present"
            value = data_dict[key][feature]
            if value=="NaN":
                value = 0
            tmp_list.append( float(value) )
        list_of_values.append(tmp_list)

Feature scaling with the min/max scaler

from sklearn.preprocessing import MinMaxScaler
data_array = np.array(list_of_values)
scaler = MinMaxScaler()
rescaled_data = scaler.fit_transform(data_array)

So now I have a list of lists that looks like this:

  

[0. 0.32916568 0.075 0. 0.01279963 0.01025327 0.41221264 0. 0.01569801 1. 0.18366453 0.10365427 0. 0.12715088]

I want to put these rescaled values into a dictionary together with their corresponding features. This is the code I wrote:

my_data_dict = []
for key in keys:
    key = {}
    for x in range( len(rescaled_data) ):
        for count in range( len(my_features) ):
            key[ my_features[count] ] = rescaled_data[x][count]        
    my_data_dict.append(key)

But I get a long list of dictionaries that all have the same values. For example:

  

{'salary':0.24744478779905296,'deferral_payments':0.01569801010492397,'total_payments':0.01228550157492107,'loan_advances':0.0,'bonus':0.075,'restricted_stock_deferred':0.1036542684938879,'total_stock_value':: 379,664 0.550692201098954,'exercised_stock_options':0.011200759837784508,'poi':1.0,'deferred_income':1.0,'shared_receipt_with_poi':0.1583046549538127,'restricted_stock':0.17265209213492153,'long_term_incentive':0.013803111652000      

{'salary':0.24744478779905296,'deferral_payments':0.01569801010492397,'total_payments':0.01228550157492107,'loan_advances':0.0,'bonus':0.075,'restricted_stock_deferred':0.1036542684938879,'total_stock_value':: 379,664 0.550692201098954,'exercised_stock_options':0.011200759837784508,'poi':1.0,'deferred_income':1.0,'shared_receipt_with_poi':0.1583046549538127,'restricted_stock':0.17265209213492153,'long_term_incentive':0.013803111652000

How can I take the keys from data_dict (the old dictionary), attach them to their rescaled data, and put them into a new dictionary?

1 answer:

Answer 0: (score: 0)

As Joe Patten said, pandas makes this easier: you can convert the dictionary to a DataFrame, process it, and then convert it back to a dictionary if needed.

import pandas as pd
from sklearn.preprocessing import MinMaxScaler

ser = pd.Series(METTS_MARK) #I am using your METTS_MARK

ser.replace('NaN',0,inplace=True)
ser.drop(index="email_address",inplace=True) #to make everything numerical so we can scale, you can add it back later

df = pd.DataFrame(ser)

scaler = MinMaxScaler()
df[0] = scaler.fit_transform(df)

When you are done:

newDict = df[0].to_dict()
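
For the loop in the question, the inner `for x in range(len(rescaled_data))` rewrites every feature once per row, so each dictionary ends up holding the values of the last row. A minimal sketch of one way around that, assuming the rows of `rescaled_data` are in the same order as the keys used to build `list_of_values`: pair each key with its own row using `zip`. The names and numbers below are made-up stand-ins, and plain lists are used in place of the NumPy array.

```python
# Toy stand-ins (hypothetical values, not taken from the real dataset).
my_features = ['poi', 'salary', 'bonus']
keys = ['METTS MARK', 'PERSON B']      # same order used to build list_of_values
rescaled_data = [
    [0.0, 0.33, 0.075],                # row belonging to keys[0]
    [1.0, 0.25, 0.5],                  # row belonging to keys[1]
]

# Pair each person with their own row, then each feature name with its value.
my_data_dict = {}
for key, row in zip(keys, rescaled_data):
    my_data_dict[key] = dict(zip(my_features, row))
```

This produces a dictionary keyed by person rather than a list, so `my_data_dict['METTS MARK']['salary']` gives that person's rescaled salary. It relies on `keys` being captured once (for example `keys = list(data_dict.keys())`) and on `data_dict` not being modified between building `list_of_values` and this loop.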