Question

I am loading a dataset from a pickle file like this:

import pickle

# Load the dictionary containing the dataset (pickle files are opened in binary mode).
with open("final_project_dataset.pkl", "rb") as data_file:
    data_dict = pickle.load(data_file)

This works fine and loads the data correctly. Here is an example of one row:

'GLISAN JR BEN F': {'salary': 274975, 'to_messages': 873, 'deferral_payments': 'NaN', 'total_payments': 1272284, 'exercised_stock_options': 384728, 'bonus': 600000, 'restricted_stock': 393818, 'shared_receipt_with_poi': 874, 'restricted_stock_deferred': 'NaN', 'total_stock_value': 778546, 'expenses': 125978, 'loan_advances': 'NaN', 'from_messages': 16, 'other': 200308, 'from_this_person_to_poi': 6, 'poi': True, 'director_fees': 'NaN', 'deferred_income': 'NaN', 'long_term_incentive': 71023, 'email_address': 'ben.glisan@enron.com', 'from_poi_to_this_person': 52}

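Now, how can I get the number of features, e.g. (salary, to_messages, .... , from_poi_to_this_person)?

I got this row by printing the whole dataset (print data_dict); it is one of the results. I want to know how many features there are in general, i.e. across the whole dataset, without specifying a key in the dictionary.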

Thanks.

Answer 1

Try this:

no_of_features = len(next(iter(data_dict.values())))

This only works if every entry in data_dict has the same number of features.

~~or simply~~

no_of_features = len(data_dict['GLISAN JR BEN F'])

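The caveat in this answer can be checked before trusting a single record's length. This is a minimal sketch, assuming `data_dict` maps each name to a dictionary of features. The toy records and the helper name `feature_counts` are hypothetical and only stand in for the real dataset:

```python
def feature_counts(data_dict):
    """Return the set of distinct per-record feature counts."""
    return {len(features) for features in data_dict.values()}


# Toy stand-in for the real dataset (hypothetical names and values).
data_dict = {
    "PERSON A": {"salary": 100, "bonus": 10, "poi": False},
    "PERSON B": {"salary": 200, "bonus": "NaN", "poi": True},
}

counts = feature_counts(data_dict)
if len(counts) == 1:
    # Every record has the same number of features, so any record will do.
    no_of_features = next(iter(counts))
else:
    # Records disagree, so one record's length would be misleading.
    no_of_features = None

print(no_of_features)
```

If the records disagree, counting distinct feature names across the whole dataset is probably the safer measure.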

Answer 2

Apply sum to the len of each nested dictionary:

sum(len(v) for _, v in data_dict.items())

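v is the nested dictionary object.

Dictionaries naturally yield their keys when you iterate over them (or do something similar), so calling len returns the number of keys in each nested dictionary, i.e. the number of features.

If features may repeat across nested objects, collect them in a set and apply len: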

len(set(f for v in data_dict.values() for f in v.keys()))
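The two expressions in this answer measure different things, which a small example may make clearer. This is a sketch on hypothetical toy data, not the real dataset. The variable names `total_entries` and `distinct_features` are introduced here for illustration:

```python
# Toy stand-in for the real dataset (hypothetical names and values).
data_dict = {
    "PERSON A": {"salary": 100, "bonus": 10},
    "PERSON B": {"salary": 200, "bonus": "NaN", "poi": True},
}

# Total number of feature entries, summed over every record.
total_entries = sum(len(v) for v in data_dict.values())

# Number of distinct feature names across all records.
distinct_features = len(set(f for v in data_dict.values() for f in v))

print(total_entries)
print(distinct_features)
```

Here the sum counts `salary` and `bonus` once per record, while the set counts each feature name only once.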

Answer 3

import pickle

# Load the dictionary containing the dataset.
with open("final_project_dataset.pkl", "rb") as data_file:
    data_dict = pickle.load(data_file)

# Number of top-level entries in the dataset.
print(len(data_dict))

Answer 4

I think you want the size of the set of all unique field names used in the row dictionaries: gather every key from every nested dictionary into a set and take its length.
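The unique-field-name count this answer describes could be sketched as follows. This is not the answerer's original code. The helper name `unique_field_names` and the toy rows are hypothetical, and the sketch assumes each value in the outer dictionary is itself a dictionary:

```python
def unique_field_names(rows):
    """Collect every field name that appears in any row dictionary."""
    names = set()
    for row in rows.values():
        names.update(row)  # iterating a dict yields its keys
    return names


# Toy stand-in for the real dataset (hypothetical names and values).
rows = {
    "PERSON A": {"salary": 1, "email_address": "a@example.com"},
    "PERSON B": {"salary": 2, "director_fees": "NaN"},
}

names = unique_field_names(rows)
print(len(names))
print(sorted(names))
```

Returning the set rather than just its size also lets you inspect which fields are present.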

Answer 5

Here is the answer:
https://discussions.udacity.com/t/lesson-5-number-of-features/44253/4

We pick one person, SKILLING JEFFREY K, from the enron_data dataset, then print the number of keys in that person's dictionary.
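That step might look like the sketch below. The `enron_data` dictionary here is a tiny hypothetical stand-in with placeholder values, not the real record, so the printed count only reflects the toy data:

```python
# Hypothetical stand-in for the loaded dataset. The values are placeholders.
enron_data = {
    "SKILLING JEFFREY K": {
        "salary": "NaN",
        "bonus": "NaN",
        "email_address": "NaN",
    },
}

# Select one person and count the keys in that person's dictionary.
person = enron_data["SKILLING JEFFREY K"]
print(len(person))
```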


Number of features in a dictionary
