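I have been working in a Jupyter notebook that imports a Google spreadsheet and uses pandas to do some manipulation on the original spreadsheet.
I now want to "convert" my notebook into a .py file that I can later run from the terminal. As a first step, I moved the functions defined in the notebook into a separate utils.py file.
One of the functions uses the keys and values of a dictionary (which I also moved into utils.py), but when I call that function from the notebook I get a NameError telling me the dictionary is not defined: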
---------------------------------------------------------------------------
NameError Traceback (most recent call last)
<ipython-input-39-26b9f9a7246e> in <module>()
----> 1 cols_ls = columns_to_keep("Score")
~/my_CS109a/content/utils.py in columns_to_keep(column_str)
5 sections_dict["1"] = [1, 2, 3, 4, 5]
6 sections_dict["2"] = [1, 2, 3, 4, 5, 6, 7, 8, 9]
----> 7 sections_dict["3"] = [1, 2, 3, 4, 5, 6, 7, 8, 9]
8 sections_dict["4"] = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
9 sections_dict["5"] = [1, 2, 3, 4, 5, 6]
NameError: name 'sections_dict' is not defined
This is how I import the utils file and call the function in the notebook:
from utils import *
# Say I want to pass the string Score to the function
cols_ls = columns_to_keep("Score")
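One thing I have not ruled out: the traceback shows dictionary assignments as if they were inside columns_to_keep, which does not match my current utils.py, so the notebook kernel may still be holding an older copy of the module. Below is a minimal sketch of how I understand stale imports and `importlib.reload` to behave. The module name `demo_utils`, the temporary directory, and the shortened dictionary are hypothetical stand-ins, not my real file.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Hypothetical stand-in for utils.py, written to a temporary directory.
tmp_dir = Path(tempfile.mkdtemp())
module_path = tmp_dir / "demo_utils.py"

# First version: the function refers to a dictionary that is never defined.
module_path.write_text(
    "def columns_to_keep(column_str='name'):\n"
    "    return ['{}_{}'.format(column_str, k) for k in sections_dict]\n"
)

sys.dont_write_bytecode = True  # avoid cached bytecode masking the edit below
sys.path.insert(0, str(tmp_dir))
importlib.invalidate_caches()
import demo_utils

try:
    demo_utils.columns_to_keep("Score")
    first_call = "ok"
except NameError:
    first_call = "NameError"

# Second version: the dictionary is now defined at module level.
module_path.write_text(
    "sections_dict = {'1': [1, 2], '2': [1]}\n"
    "def columns_to_keep(column_str='name'):\n"
    "    return ['{}_{}'.format(column_str, k) for k in sections_dict]\n"
)

# Editing the file alone changes nothing: the already-imported module is stale.
try:
    demo_utils.columns_to_keep("Score")
    stale_call = "ok"
except NameError:
    stale_call = "NameError"

# Re-executing the module picks up the new source.
importlib.reload(demo_utils)
after_reload = demo_utils.columns_to_keep("Score")
print(first_call, stale_call, after_reload)
```

If this is what is happening in my notebook, I would presumably need to restart the kernel, or reload utils and then run `from utils import *` again so that the names in the notebook point at the new functions.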
My utils.py is pasted below:
# Create empty dictionary to map every section title to the number of questions they contain
sections_dict = dict()
# Add all section names as key and a list of the questions they contain as list
sections_dict["1"] = [1, 2, 3, 4, 5]
sections_dict["2"] = [1, 2, 3, 4, 5, 6, 7, 8, 9]
sections_dict["3"] = [1, 2, 3, 4, 5, 6, 7, 8, 9]
sections_dict["4"] = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
sections_dict["5"] = [1, 2, 3, 4, 5, 6]
sections_dict["6"] = [1, 2, 3, 4, 5, 6]
sections_dict["7"] = [1, 2, 3, 4]
sections_dict["8"] = [1, 2, 3, 4, 5]
sections_dict["9"] = [1, 2, 3, 4, 5, 6, 7, 8]
sections_dict["10"] = [1, 2, 3, 4]
# Function to create the list of columns to keep from the copy DataFrame. Takes a string as an argument and
# iteratively creates the names of the columns to keep, extracting the number of the section and the questions
# from the sections_dict object. Returns list of strings of the format ColumnName_SectionNumber_QuestionNumber.
def columns_to_keep (column_str = "name"):
    columns_to_keep_ls = []
    for key_i, value_ls in sections_dict.items():
        for question_i in range(len(value_ls)):
            columns_to_keep_ls.append("{}_{}_{}".format(column_str, key_i, value_ls[question_i]))
    return (columns_to_keep_ls)
# Create a function that takes the name of a company and the section over which to calculate the score, it should output a float as the score of the section.
def calculate_section_scores(company_name="name", section=0):
    ls = (
        scores_df.loc["{}".format(company_name)]
        # Hard code the '1' as I will always want to start from the 1st question.
        # Get length of list in questions_list corresponding to last question in that section.
        ["Score_{}_1".format(section):"Score_{}_{}".format(section, len(questions_list[section-1]))]
    ).values[:]
    score_flt = (pd.to_numeric(ls).sum())/max_scores_ls[section-1]
    return (score_flt)
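For reference, this is the output I expect from columns_to_keep once the dictionary is visible to it. The sketch below is self-contained and uses a shortened, hypothetical dictionary (`demo_sections`) and a renamed function (`demo_columns_to_keep`) so that it runs on its own without my utils.py.

```python
# Hypothetical, shortened stand-in for sections_dict (two sections only).
demo_sections = {"1": [1, 2, 3], "2": [1, 2]}


def demo_columns_to_keep(column_str="name", sections=demo_sections):
    """Build names of the form ColumnName_SectionNumber_QuestionNumber."""
    names = []
    for section_key, questions in sections.items():
        for question in questions:
            names.append("{}_{}_{}".format(column_str, section_key, question))
    return names


result = demo_columns_to_keep("Score")
print(result)
# ['Score_1_1', 'Score_1_2', 'Score_1_3', 'Score_2_1', 'Score_2_2']
```

Passing the dictionary in as a parameter here is only for the demonstration; in my actual file the function reads the module-level sections_dict.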