I want to create a dictionary that holds everything needed to import the files (file name, date column to parse, and so on), and then import them all with a single for loop. But after the loop finishes, only the last dataset remains; it seems to overwrite the previous ones.
I run the script from the folder containing the files, so the path is not the problem.
I tried building a second dictionary and adding each import to it, but that makes the frames harder to reference later; I would like them to appear as separate dataframes in the variable explorer.
Here is the code:
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib.ticker import AutoMinorLocator # for time series visualisation
# Import data
#PATH = r"C:\Users\sherv\OneDrive\Documents\GitHub\Python-Projects\Research Project\Data"
data = {"google": ["multiTimeline.csv", "Month"],
        "RDPI": ["RealDisposableIncome-2004-1_Present-Mon-US(Grab-30-11-18).csv", "DATE"],
        "CPI": ["CPI.csv", "DATE"],
        "GDP": ["GDP.csv", "DATE"],
        "UE": ["Unemployment_2004_Present_US(Grab-5-12-18).csv", "DATE"],
        "SP500": ["S&P500.csv", "Date"],
        "IR": ["InterestRate_2004-1-1_Present_US(Grab-5-12-18).csv", "DATE"],
        "PPI": ["PPIACO.csv", "DATE"],
        "PMI": ["ISM-MAN_PMI.csv", "Date"]}

for dataset in data.keys():
    dataset = pd.read_csv("%s" %(data[dataset][0]), index_col="%s" %(data[dataset][1]), parse_dates=["%s" %(data[dataset][1])])
    dataset = dataset.loc["2004-01-01":"2018-09-01"]
# Visualise
minor_locator = AutoMinorLocator(12)
# Investigating overall trendSS
def google_v_X(Data_col, yName, title):
    fig, ax1 = plt.subplots()
    google["Top5"].plot(ax=ax1, color='b').xaxis.set_minor_locator(minor_locator)
    ax1.set_xlabel('Date')
    ax1.set_ylabel('google (%)', color='b')
    ax1.tick_params('y', colors='b')
    plt.grid()
    ax2 = ax1.twinx()
    Data_col.plot(ax=ax2, color='r')
    ax2.set_ylabel('%s' %(yName), color='r')
    ax2.tick_params('%s' %(yName), colors='r')
    plt.title("Google vs %s trends" %(title))
# Google-CPI
google_v_X(CPI["CPI"], "CPI 1982-1985=100 (%)", "CPI")
# Google-RDPI
google_v_X(RDPI["DSPIC96"], "RDPI ($)", "RDPI")
# Google-GDP
google_v_X(GDP["GDP"], "GDP (B$)", "GDP")
# Google-UE
google_v_X(UE["Value"], "Unemployed persons", "Unemployment")
# Google-SP500
google_v_X(SP500["Close"], "SP500", "SP500")
# Google-PPI
google_v_X(PPI["PPI"], "PPI", "PPI")
# Google-PMI
google_v_X(PMI["PMI"], "PMI", "PMI")
# Google-IR
google_v_X(IR["FEDFUNDS"], "Fed Funds Rate (%)", "Interest Rate")
I also tried writing a function to do the reading and parsing, and then calling it in a loop like this:
def importdata(key, path, parseCol):
    key = pd.read_csv("%s" %(path), index_col="%s" %(parseCol), parse_dates=["%s" %(parseCol)])
    key = key.loc["2004-01-01":"2018-09-01"]

for dataset in data.keys():
    importdata(dataset, data[dataset][0], data[dataset][0])
But I get an error because it does not recognize the path as a string and says it is undefined.
How can I keep the datasets from overwriting each other, or how can I get Python to treat the function's input as a string? Any help is appreciated, thanks.
Answer 0 (score: 1)
The for loop reassigns the same `dataset` variable, so on each pass it is replaced by the newly imported dataset. You need to store each result somewhere, either in a new variable every time or in a dictionary. Try something like this:
googleObj = None
RDPIObj = None
CPIObj = None

data = {"google": [googleObj, "multiTimeline.csv", "Month"],
        "RDPI": [RDPIObj, "RealDisposableIncome-2004-1_Present-Mon-US(Grab-30-11-18).csv", "DATE"],
        "CPI": [CPIObj, "CPI.csv", "DATE"]}

for dataset in data.keys():
    obj = pd.read_csv("%s" %(data[dataset][1]), index_col="%s" %(data[dataset][2]), parse_dates=["%s" %(data[dataset][2])])
    data[dataset][0] = obj.loc["2004-01-01":"2018-09-01"]
This way you have a dataframe object for each dataset. The downside is that you have to define every variable yourself.
Another option is to build a second dictionary, as you mentioned, like this:
data = {"google": ["multiTimeline.csv", "Month"],
        "RDPI": ["RealDisposableIncome-2004-1_Present-Mon-US(Grab-30-11-18).csv", "DATE"],
        "CPI": ["CPI.csv", "DATE"]}

output_data = {}
for dataset_key in data.keys():
    dataset = pd.read_csv("%s" %(data[dataset_key][0]), index_col="%s" %(data[dataset_key][1]), parse_dates=["%s" %(data[dataset_key][1])])
    output_data[dataset_key] = dataset.loc["2004-01-01":"2018-09-01"]
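To make the dictionary approach concrete, here is a self-contained sketch of the same loop; the two small CSV files and their column names (`Top5`, `CPI`) are made up for illustration and written to a temporary folder, standing in for the real files from the question:

```python
import os
import tempfile
import pandas as pd

# Create two small CSV files so the example runs anywhere
# (file names and columns are placeholders, not the real data).
tmp = tempfile.mkdtemp()
pd.DataFrame({"Month": ["2004-01-01", "2018-09-01"],
              "Top5": [10, 20]}).to_csv(os.path.join(tmp, "google.csv"), index=False)
pd.DataFrame({"DATE": ["2004-01-01", "2018-09-01"],
              "CPI": [1.0, 2.0]}).to_csv(os.path.join(tmp, "CPI.csv"), index=False)

data = {"google": ["google.csv", "Month"],
        "CPI": ["CPI.csv", "DATE"]}

# One loop, results kept in a dict instead of one reused variable
output_data = {}
for key, (fname, date_col) in data.items():
    df = pd.read_csv(os.path.join(tmp, fname),
                     index_col=date_col, parse_dates=[date_col])
    output_data[key] = df.loc["2004-01-01":"2018-09-01"]

print(output_data["google"]["Top5"].iloc[0])  # access by key: prints 10
```

Each frame is then reached by its key, e.g. `output_data["google"]["Top5"]`, instead of by a separate variable name.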
Answer 1 (score: 0)
A reproducible example (be careful when using `exec`, though):
# Generating data
import os
import pandas as pd
os.chdir(r'C:\Windows\Temp')
df1 = pd.DataFrame([['a',1],['b',2]], index=[0,1], columns=['col1','col2'])
df2 = pd.DataFrame([['c',3],['d',4]], index=[2,3], columns=['col1','col2'])
# Exporting data
df1.to_csv('df1.csv', index_label='Month')
df2.to_csv('df2.csv', index_label='DATE')
# Definition of Loading metadata
loading_metadata = {
    'df1_loaded': ['df1.csv', 'Month'],
    'df2_loaded': ['df2.csv', 'DATE'],
}

# Importing in accordance with loading_metadata (watch the indentation)
for dataset in loading_metadata.keys():
    print(dataset, loading_metadata[dataset][0], loading_metadata[dataset][1])
    exec(
        """
{0} = pd.read_csv('{1}', index_col='{2}').rename_axis('')
""".format(dataset, loading_metadata[dataset][0], loading_metadata[dataset][1])
    )
Exported data (df1.csv):
Month,col1,col2
0,a,1
1,b,2
Exported data (df2.csv):
DATE,col1,col2
2,c,3
3,d,4
Loaded data:
df1_loaded
col1 col2
0 a 1
1 b 2
df2_loaded
col1 col2
2 c 3
3 d 4
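If the goal is really to get each frame as its own top-level variable (e.g. for the variable explorer) without `exec`, one possible alternative is assigning into `globals()`. This is a sketch under the assumption that you already have a dict of dataframes; the frame contents below are made up for illustration:

```python
import pandas as pd

# A dict of already-imported dataframes (contents are illustrative)
frames = {
    "df1_loaded": pd.DataFrame({"col1": ["a", "b"], "col2": [1, 2]}),
    "df2_loaded": pd.DataFrame({"col1": ["c", "d"], "col2": [3, 4]}),
}

# Promote each dict entry to a top-level variable without exec
for name, frame in frames.items():
    globals()[name] = frame  # df1_loaded / df2_loaded now exist as variables

print(df1_loaded["col2"].sum())  # prints 3
```

Like `exec`, this pollutes the global namespace, so keeping the dict and indexing by key is usually the cleaner design; `globals()` is mainly a convenience for interactive exploration.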