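I want to manipulate a DataFrame inside a user-defined function in Python. The manipulation code works fine when I run it outside a function. However, when I put it inside a function and call that function, it runs without errors but does not return any DataFrame. My code looks like this: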
def reshape(file):
    from IPython import get_ipython
    get_ipython().magic('reset -sf')
    #import packages
    import pandas as pd
    import datetime
    import calendar
    #define file path and import files
    path="X:/TEMP/"
    file_path =path+file
    df = pd.read_excel(file_path, "Sheet1", parse_dates=["Date"])
    #reshape data to panel
    melted = pd.melt(df,id_vars="Date", var_name="id", value_name="Market_Cap")
    melted["id"] = melted["id"].str.replace("id", "")
    melted.id = melted.id.astype(int)
    melted.reset_index(inplace=True, drop=True)
    id_to_string = pd.read_excel(file_path, "Sheet2")
    id_to_string = id_to_string.transpose()
    id_to_string.reset_index(level=0, inplace=True)
    id_to_string.rename(columns = {0: 'id'}, inplace=True)
    id_to_string.rename(columns = {"index": 'Ticker'}, inplace=True)
    merged = pd.merge(melted, id_to_string, how="left", on="id")
    merged = merged.sort(["Date","Market_Cap"], ascending=[1,0])
    merged["Rank"] = merged.groupby(["Date"])["Market_Cap"].rank(ascending=True)
    df = pd.read_excel(file_path, "hardcopy_return", parse_dates=["Date"])
    df = df.sort("Date", ascending=1)
    old = merged
    merged = pd.merge(old,df, on=["Date", "id"])
    merged = merged.set_index("Date")
    return merged

reshape("sample.xlsx")
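This code runs but does not return anything. Did I make a mistake in the def statement or in the way I call the function? Any help is greatly appreciated.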
Answer 0 (score: 1)
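I assume this is being run in IPython or a Jupyter notebook?
It probably worked before because the kernel remembers some state from earlier cells. Before turning a plain script into a separate function, I do a "Restart Kernel & Run All".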
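To illustrate why the result can seem to vanish: names assigned inside a function are local to it, and the returned DataFrame is thrown away unless the caller binds it to a name. A minimal sketch, where `make_frame` and its values are made up for illustration and are not from the question:

```python
import pandas as pd


def make_frame():
    # `merged` is a local name; it stops existing once the function returns.
    merged = pd.DataFrame({"id": [1, 2], "Market_Cap": [10.0, 20.0]})
    return merged


make_frame()           # the returned DataFrame is discarded here

result = make_frame()  # binding the return value keeps it
print(result.shape)    # (2, 2)
```

In a notebook, a bare call only displays its result when it is the last expression in the cell, so assigning the result is the more reliable habit.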
As for the code itself, I would split it into separate parts so that each part becomes easier to test on its own:
from IPython import get_ipython
# Reset the namespace first; a reset after the imports would remove them again.
get_ipython().run_line_magic('reset', '-sf')
import pandas as pd
import datetime
import calendar
Get the data from the first sheet and do the first processing step:
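def read_melted(file_path):
    # sheet_name was spelled sheetname before pandas 0.21
    df1 = pd.read_excel(file_path, sheet_name='Sheet1', parse_dates=["Date"])
    melted = pd.melt(df1, id_vars="Date", var_name="id", value_name="Market_Cap")
    # strip the "id" prefix, as in the question, so the cast to int works
    melted["id"] = melted["id"].str.replace("id", "")
    melted.id = melted.id.astype(int)
    melted.reset_index(inplace=True, drop=True)
    return melted
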
def read_id_to_spring(file_path):
    df2 = pd.read_excel(file_path, sheet_name='Sheet2')
    # transpose so that each ticker becomes a row
    id_to_string = df2.transpose()
    id_to_string.reset_index(level=0, inplace=True)
    id_to_string.rename(columns = {0: 'id'}, inplace=True)
    id_to_string.rename(columns = {"index": 'Ticker'}, inplace=True)
    return id_to_string

def read_hardcopy_return(file_path):
    df = pd.read_excel(file_path, sheet_name='hardcopy_return', parse_dates=["Date"])
    # sort_values replaces DataFrame.sort, which was removed in pandas 0.20
    return df.sort_values("Date", ascending=True)

def reshape(df1, df2, df_hardcopy_return):
    # sort_values replaces DataFrame.sort, which was removed in pandas 0.20
    merged = pd.merge(df1, df2, how="left", on="id").sort_values(["Date","Market_Cap"], ascending=[True, False])
    merged["Rank"] = merged.groupby(["Date"])["Market_Cap"].rank(ascending=True) # what does this line do?
    merged_all = pd.merge(merged, df_hardcopy_return, on=["Date", "id"]).set_index("Date")
    return merged_all

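path = "X:/TEMP/"
file = "sample.xlsx"  # file name taken from the question
file_path = path + file
df1 = read_melted(file_path)
df2 = read_id_to_spring(file_path)
df_hardcopy_return = read_hardcopy_return(file_path)
# assign the returned DataFrame to a name so it is not lost
result = reshape(df1, df2, df_hardcopy_return)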
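One benefit of this split is that the merging step can be checked on tiny hand-made frames without touching the Excel file. A sketch, assuming pandas 0.20 or later (where `sort_values` replaces the older `sort`); the dates, tickers and the `Return` column are made up for illustration:

```python
import pandas as pd


def reshape(df1, df2, df_hardcopy_return):
    # Same steps as the reshape function above, written with sort_values.
    merged = pd.merge(df1, df2, how="left", on="id").sort_values(
        ["Date", "Market_Cap"], ascending=[True, False]
    )
    merged["Rank"] = merged.groupby(["Date"])["Market_Cap"].rank(ascending=True)
    return pd.merge(merged, df_hardcopy_return, on=["Date", "id"]).set_index("Date")


# Hypothetical stand-ins for the three sheets.
dates = pd.to_datetime(["2015-01-31", "2015-01-31"])
melted = pd.DataFrame({"Date": dates, "id": [1, 2], "Market_Cap": [100.0, 250.0]})
id_to_string = pd.DataFrame({"Ticker": ["AAA", "BBB"], "id": [1, 2]})
returns = pd.DataFrame({"Date": dates, "id": [1, 2], "Return": [0.01, -0.02]})

result = reshape(melted, id_to_string, returns)
print(result)
```

If this prints a two-row frame with `Ticker`, `Rank` and `Return` columns, the merge logic is fine and any remaining problem is in the reading step or in how the function is called.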
The only thing that looks odd to me is the line merged["Rank"] = merged.groupby(["Date"])["Market_Cap"].rank(ascending=True)
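For reference, a small sketch of what that line computes, using made-up numbers: within each date, `rank(ascending=True)` gives rank 1 to the smallest `Market_Cap`. If the largest company is supposed to be rank 1, `ascending=False` would be needed; whether that is the intent is a guess on my part.

```python
import pandas as pd

df = pd.DataFrame({
    "Date": pd.to_datetime(["2015-01-31"] * 3 + ["2015-02-28"] * 3),
    "id": [1, 2, 3, 1, 2, 3],
    "Market_Cap": [100.0, 250.0, 50.0, 120.0, 90.0, 300.0],
})

# Rank within each date; the smallest Market_Cap gets rank 1.
df["Rank"] = df.groupby(["Date"])["Market_Cap"].rank(ascending=True)
# Alternative: the largest Market_Cap gets rank 1.
df["Rank_desc"] = df.groupby(["Date"])["Market_Cap"].rank(ascending=False)

print(df)
```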
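sheetname
pandas.read_excel also has a sheetname parameter (spelled sheet_name since pandas 0.21). It accepts a list of sheets, so you can read them all in one call and open the file only once. Reading Excel files can sometimes be slow, so this may also make the code faster.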
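A sketch of reading all three sheets while opening the workbook only once, here through `pd.ExcelFile`, which `pd.read_excel` accepts in place of a path. It assumes a pandas version that uses the `sheet_name` spelling and the sheet names from the question; `read_all_sheets` is a hypothetical helper name.

```python
import pandas as pd


def read_all_sheets(file_path):
    # Open the workbook once and reuse the handle for every sheet.
    with pd.ExcelFile(file_path) as xls:
        sheet1 = pd.read_excel(xls, sheet_name="Sheet1", parse_dates=["Date"])
        sheet2 = pd.read_excel(xls, sheet_name="Sheet2")
        hardcopy = pd.read_excel(xls, sheet_name="hardcopy_return", parse_dates=["Date"])
    return sheet1, sheet2, hardcopy
```

Usage would look like `sheet1, sheet2, hardcopy = read_all_sheets("X:/TEMP/sample.xlsx")`. Passing a list to `sheet_name` in a single `pd.read_excel` call also works and returns a dict of DataFrames keyed by sheet name, but then per-sheet options such as `parse_dates` apply to every sheet.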