How do I import files in Python using a for loop with path names stored in a dictionary?

Time: 2019-01-18 16:44:07

Tags: python-3.x pandas dictionary for-loop

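I want to create a dictionary that holds all the information needed to import the files, parse the dates, and so on. Then I want to use a for loop to import all of these files. But after the for loop finishes, I am left with only the last dataset in the dictionary. It looks as if each import overwrites the previous one.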

I run the script from inside the PATH folder, so locating the files is not the problem.

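I tried creating a new dictionary and adding each import to it, but that made the datasets harder to reference later on. I want them as separate DataFrames in the variable explorer.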

The code is as follows:

import pandas as pd
import matplotlib.pyplot as plt
from matplotlib.ticker import AutoMinorLocator # for time series visualisation
# Import data
#PATH = r"C:\Users\sherv\OneDrive\Documents\GitHub\Python-Projects\Research Project\Data"    
data = {"google":["multiTimeline.csv", "Month"], 
    "RDPI":  ["RealDisposableIncome-2004-1_Present-Mon-US(Grab-30-11-18).csv", "DATE"], 
    "CPI":   ["CPI.csv", "DATE"],
    "GDP":   ["GDP.csv", "DATE"], 
    "UE":    ["Unemployment_2004_Present_US(Grab-5-12-18).csv", "DATE"], 
    "SP500": ["S&P500.csv", "Date"], 
    "IR":    ["InterestRate_2004-1-1_Present_US(Grab-5-12-18).csv", "DATE"], 
    "PPI":   ["PPIACO.csv", "DATE"],
    "PMI":   ["ISM-MAN_PMI.csv", "Date"]}

for dataset in data.keys():
    dataset = pd.read_csv("%s" %(data[dataset][0]), index_col="%s" %(data[dataset][1]), parse_dates=["%s" %(data[dataset][1])])
    dataset = dataset.loc["2004-01-01":"2018-09-01"]
# Visualise
minor_locator = AutoMinorLocator(12)
# Investigating overall trends
def google_v_X(Data_col, yName, title):
    fig, ax1 = plt.subplots()
    google["Top5"].plot(ax=ax1,color='b').xaxis.set_minor_locator(minor_locator)
    ax1.set_xlabel('Date')
    ax1.set_ylabel('google (%)', color='b')
    ax1.tick_params('y', colors='b')
    plt.grid()
    ax2 = ax1.twinx()
    Data_col.plot(ax=ax2,color='r')
    ax2.set_ylabel('%s' %(yName), color='r')
    ax2.tick_params('%s' %(yName), colors='r')
    plt.title("Google vs %s trends" %(title))
# Google-CPI
google_v_X(CPI["CPI"], "CPI 1982-1985=100 (%)", "CPI")
# Google-RDPI
google_v_X(RDPI["DSPIC96"], "RDPI ($)", "RDPI")
# Google-GDP
google_v_X(GDP["GDP"], "GDP (B$)", "GDP")    
# Google-UE
google_v_X(UE["Value"], "Unemployed persons", "Unemployment")
# Google-SP500
google_v_X(SP500["Close"], "SP500", "SP500")
# Google-PPI
google_v_X(PPI["PPI"], "PPI")
# Google-PMI
google_v_X(PMI["PMI"], "PMI", "PMI")
# Google-IR
google_v_X(IR["FEDFUNDS"], "Fed Funds Rate (%)", "Interest Rate")

I also tried writing a function that reads and parses a file, and then calling it in a loop like this:

def importdata(key, path ,parseCol):
    key = pd.read_csv("%s" %(path), index_col="%s" %(parseCol), parse_dates=["%s" %(parseCol)])
    key = key.loc["2004-01-01":"2018-09-01"]
for dataset in data.keys():
    importdata(dataset, data[dataset][0], data[dataset][0])

But I get an error, because it does not recognize the path as a string and says it is not defined.

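How can I stop the datasets from overwriting each other, or how can I get Python to recognize the function's input as a string? Thanks for any help.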

2 Answers:

Answer 0: (score: 1)

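The for loop reuses the same dataset variable, so on every iteration that variable is replaced by the newly imported dataset. You need to store the result somewhere, either in a new variable each time or in a dictionary. Try something like this: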

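import pandas as pd

# The first slot of each list is a placeholder that will hold the loaded DataFrame
data = {"google":[None, "multiTimeline.csv", "Month"], 
    "RDPI":  [None, "RealDisposableIncome-2004-1_Present-Mon-US(Grab-30-11-18).csv", "DATE"], 
    "CPI":   [None, "CPI.csv", "DATE"]}

for dataset in data.keys():
    obj = pd.read_csv(data[dataset][1], index_col=data[dataset][2], parse_dates=[data[dataset][2]])
    obj = obj.loc["2004-01-01":"2018-09-01"]
    # Rebinding obj alone does not change the dictionary, so store the result explicitly
    data[dataset][0] = obj

This way, you have a DataFrame object for each dataset, available as data[key][0] (for example data["google"][0]). The downside is that you have to define a placeholder for every dataset.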
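
To see why rebinding a name inside a loop is not enough, here is a minimal sketch with no pandas involved. The dictionaries `sources` and `slots` are made-up stand-ins for the loaded datasets, not part of the original code. It contrasts rebinding the loop variable, rebinding a local name taken from a container, and assigning through the container.

```python
# Hypothetical stand-ins for the loaded datasets; no files are read here.
sources = {"google": [1, 2], "CPI": [3, 4]}

# Rebinding the loop variable: each pass overwrites the previous value.
for name in sources:
    name = sources[name]
last_only = name  # only the value from the final iteration survives

# Rebinding a local name taken from a container does not modify the container.
slots = {"google": [None], "CPI": [None]}
for key in slots:
    obj = slots[key][0]
    obj = sources[key]  # rebinds obj only; slots[key][0] is still None
untouched = {key: slots[key][0] for key in slots}

# Assigning through the container does store the value.
for key in slots:
    slots[key][0] = sources[key]
stored = {key: slots[key][0] for key in slots}

print(last_only)
print(untouched)
print(stored)
```

The same rule applies to function parameters: assigning to `key` inside `importdata` in the question only rebinds a local name, so the caller never sees the DataFrame unless the function returns it.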

Another option is to make a second dictionary, as you mentioned, like this:

data = {"google":["multiTimeline.csv", "Month"], 
    "RDPI":  ["RealDisposableIncome-2004-1_Present-Mon-US(Grab-30-11-18).csv", "DATE"], 
    "CPI":   ["CPI.csv", "DATE"]}

output_data = {}
for dataset_key in data.keys():
    dataset = pd.read_csv("%s" %(data[dataset_key][0]), index_col="%s" %(data[dataset_key][1]), parse_dates=["%s" %(data[dataset_key][1])])
    dataset = dataset.loc["2004-01-01":"2018-09-01"]
    output_data[dataset_key] = dataset
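
The function-based attempt from the question can be combined with this second dictionary if the function returns the DataFrame instead of assigning to its parameter. The following is only a sketch under stated assumptions: `import_data`, `frames`, and the two tiny CSV files written to a temporary folder are hypothetical and their values are made up so that the example runs anywhere; in practice `data` would point at the real files.

```python
import os
import tempfile

import pandas as pd


def import_data(path, parse_col, start="2004-01-01", end="2018-09-01"):
    """Read one CSV, index it by its parsed date column, and return the slice."""
    frame = pd.read_csv(path, index_col=parse_col, parse_dates=[parse_col])
    # Sort first so that label-based slicing on the date index is well defined.
    return frame.sort_index().loc[start:end]


# Two tiny made-up CSV files (illustrative values only).
tmp_dir = tempfile.mkdtemp()
samples = {
    "google": ("Month", "Top5", ["2003-12-01,10", "2004-01-01,11", "2004-02-01,12"]),
    "CPI": ("DATE", "CPI", ["2004-01-01,185.2", "2019-01-01,251.7"]),
}
data = {}
for key, (date_col, value_col, rows) in samples.items():
    path = os.path.join(tmp_dir, key + ".csv")
    with open(path, "w") as handle:
        handle.write(date_col + "," + value_col + "\n" + "\n".join(rows) + "\n")
    data[key] = [path, date_col]

# One DataFrame per key, built from the value the function returns.
frames = {key: import_data(path, col) for key, (path, col) in data.items()}

print(frames["google"])
print(frames["CPI"])
```

Each dataset is then referenced by key, for example `frames["CPI"]["CPI"]`, which could be passed to a plotting function in place of a standalone variable.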

Answer 1: (score: 0)

A reproducible example (however, you should be very careful when using "exec"):

# Generating data
import os
import pandas as pd
os.chdir(r'C:\Windows\Temp')
df1 = pd.DataFrame([['a',1],['b',2]], index=[0,1], columns=['col1','col2'])
df2 = pd.DataFrame([['c',3],['d',4]], index=[2,3], columns=['col1','col2'])

# Exporting data
df1.to_csv('df1.csv', index_label='Month')
df2.to_csv('df2.csv', index_label='DATE')

# Definition of Loading metadata
loading_metadata = {
    'df1_loaded':['df1.csv','Month'],
    'df2_loaded':['df2.csv','DATE'],
}

# Importing according to loading_metadata (mind the indentation)
for dataset in loading_metadata.keys():
    print(dataset, loading_metadata[dataset][0], loading_metadata[dataset][1])
    exec(
"""
{0} = pd.read_csv('{1}', index_col='{2}').rename_axis('')
""".format(dataset, loading_metadata[dataset][0], loading_metadata[dataset][1])
)

Exported data (df1.csv):

Month,col1,col2
0,a,1
1,b,2

Exported data (df2.csv):

DATE,col1,col2
2,c,3
3,d,4

Loaded data:

df1_loaded
    col1    col2
0   a   1
1   b   2

df2_loaded
    col1    col2
2   c   3
3   d   4
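
If separate top-level variables are still wanted, one possible alternative to building code strings for "exec" is to load everything into a dictionary first and then copy its entries into the module namespace with `globals().update(...)`. This is a sketch, not the method used in the answer above: `output_data` here is a hypothetical stand-in built in memory rather than read from the CSV files.

```python
import pandas as pd

# Hypothetical stand-ins for DataFrames that were already loaded into a dictionary.
output_data = {
    "df1_loaded": pd.DataFrame({"col1": ["a", "b"], "col2": [1, 2]}),
    "df2_loaded": pd.DataFrame({"col1": ["c", "d"], "col2": [3, 4]}, index=[2, 3]),
}

# Bind each key as a module-level name. This silently overwrites any
# existing global with the same name, so the keys should be chosen with care.
globals().update(output_data)

print(df1_loaded)
print(df2_loaded)
```

Because no source text is assembled and executed, file names and column names never pass through the interpreter as code. The dictionary keys must still be valid identifiers to be usable as variable names.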