Question

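I have a CSV file containing time series for two economic variables (housing starts and unemployment). I have a list of calculations and a text summary that is written from their output (essentially a paragraph describing the trends in the data). I would like feedback on how to get a for loop to iterate over each variable in the CSV file, so that the final output is a summary for each variable.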

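I tried applying the basic logic of a for loop, but I'm not sure where I went wrong. I looked at some examples on Stack Overflow, but none seemed to fit. I'm sure I'm missing something simple, but I haven't been using Python for long, so I can't tell what it is.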

raw_data = pd.read_csv('C:/Users/J042666/Desktop/2019.03 HOUST and GDP.csv')
df = pd.DataFrame(raw_data)

for i in df:

    freq = "monthly "
    units = " million "
    pos = 1
    colname = df.columns[pos]

    alltime = df.mean()
    low = df.min()
    maximum = df.max()
    today = df.iloc[720]
    one_year = df.iloc[709:721].mean()
    two_year = df.iloc[697:721].mean()
    five_year = df.iloc[661:721].mean()
    one_year_vol = df.iloc[709:721].std()
    two_year_vol = df.iloc[697:721].std()
    five_year_vol = df.iloc[661:721].std()
    today_vs_1 = ((today/one_year) -1)*100
    today_vs_2 = ((today/two_year) -1)*100 
    today_vs_5 = ((today/five_year) -1)*100
    rolling_1 = df.rolling(window=3).mean()
    rolling_2 = df.rolling(window=6).mean()
    rolling_3 = df.rolling(window=9).mean()
    today_vs_1_rolling = ((today/rolling_1.iloc[720]) -1)*100
    today_vs_2_rolling = ((today/rolling_2.iloc[720]) -1)*100 
    today_vs_3_rolling = ((today/rolling_3.iloc[720]) -1)*100
    summary = ("The " + str(freq) + str(colname) + " currently stands at " + str(today) + str(units) + " which compares to the 1,2 and 5 year averages of " + str(one_year) + str(units) + "," + str(two_year) + str(units) + "," + " and " + str(five_year) + str(units) + " respectively. " + " Based on the current " + str(colname) + " levels, that reflects a change of" + str(today_vs_1) + ", " + str(today_vs_2) + " and " + str(today_vs_5) + " respectively." " Since the metric began being tracked, the minimum, maximum and long run average total " + str(low) + str(units) + ", " + str(maximum) + str(units) + " and " + str(alltime) + str(units) + " respectively. " "The 1, 2 and 5 year standard deviation for " + str(colname) + " totals " + str(one_year_vol) + str(units) + " ," + str(two_year_vol) + str(units) + " and" + str(five_year_vol) + str(units) + " respectively." + " Based on the current " + str(colname) + " levels compared to the 3, 6 and 9 month rolling averages, the current level reflects a change of " + str(today_vs_1_rolling) + ", " + str(today_vs_2_rolling) + " and " + str(today_vs_3_rolling) + " respectively.")
print(summary)

As described above, I'd like code that produces a paragraph summary of the financial metrics computed in the for loop, one for each variable.

Answer 1

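The problem is that you are selecting the entire DataFrame rather than each column individually, so every calculation runs on both columns at once. I also extracted just the needed values from each operation instead of keeping the whole text that pandas prints.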
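To make the difference concrete, here is a minimal sketch using made-up data (the column names `A` and `B` and their values are placeholders, not the asker's file). Iterating over a DataFrame yields its column labels, and an aggregate such as `mean()` on the whole frame returns one value per column, whereas the same call on a single column returns a scalar.

```python
import pandas as pd

# Made-up data standing in for the real CSV; names and values are illustrative only.
df = pd.DataFrame({"A": [1.0, 2.0, 3.0], "B": [10.0, 20.0, 30.0]})

# Iterating over a DataFrame yields its column labels, not rows or values.
labels = [label for label in df]

# Aggregating the whole frame returns a Series with one entry per column.
frame_mean = df.mean()

# Aggregating a single column returns a plain scalar.
column_mean = df["A"].mean()

print(labels)
print(type(frame_mean).__name__, type(column_mean).__name__)
```

This is why the loop in the question prints statistics for both columns on every pass: nothing inside the loop body ever narrows `df` down to one column.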

This should work:

import pandas as pd

df = pd.read_csv('2019.03 HOUST and GDP.csv')
df = df.loc[:, ['Housing Starts', 'Unemployment Rate']]

freq = "monthly"
units = "million"

for colname in df.columns:
    # Select one column as a Series so that every statistic below is a scalar
    series = df[colname]

    alltime = series.mean()
    low = series.min()
    maximum = series.max()
    # Row positions are hard-coded: position 720 is taken as the latest observation
    today = series.iloc[720]
    one_year = series.iloc[709:721].mean()
    two_year = series.iloc[697:721].mean()
    five_year = series.iloc[661:721].mean()
    one_year_vol = series.iloc[709:721].std()
    two_year_vol = series.iloc[697:721].std()
    five_year_vol = series.iloc[661:721].std()
    today_vs_1 = ((today / one_year) - 1) * 100
    today_vs_2 = ((today / two_year) - 1) * 100
    today_vs_5 = ((today / five_year) - 1) * 100
    rolling_1 = series.rolling(window=3).mean()
    rolling_2 = series.rolling(window=6).mean()
    rolling_3 = series.rolling(window=9).mean()
    today_vs_1_rolling = ((today / rolling_1.iloc[720]) - 1) * 100
    today_vs_2_rolling = ((today / rolling_2.iloc[720]) - 1) * 100
    today_vs_3_rolling = ((today / rolling_3.iloc[720]) - 1) * 100

    # Build the paragraph from adjacent f-strings instead of one long concatenation
    summary = (
        f"The {freq} {colname} currently stands at {today} {units}, which compares to the "
        f"1, 2 and 5 year averages of {one_year} {units}, {two_year} {units} and {five_year} {units} respectively. "
        f"Based on the current {colname} levels, that reflects a change of "
        f"{today_vs_1}, {today_vs_2} and {today_vs_5} respectively. "
        f"Since the metric began being tracked, the minimum, maximum and long run average total "
        f"{low} {units}, {maximum} {units} and {alltime} {units} respectively. "
        f"The 1, 2 and 5 year standard deviation for {colname} totals "
        f"{one_year_vol} {units}, {two_year_vol} {units} and {five_year_vol} {units} respectively. "
        f"Based on the current {colname} levels compared to the 3, 6 and 9 month rolling averages, "
        f"the current level reflects a change of "
        f"{today_vs_1_rolling}, {today_vs_2_rolling} and {today_vs_3_rolling} respectively."
    )
    # Printing inside the loop gives one summary per column
    print(summary)
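
The row positions used above (720, 709:721 and so on) only line up if the file has exactly 721 rows. One possible variation, sketched here with a hypothetical helper named `window_stats` and a made-up series rather than the real CSV, counts back from the end of the column instead and rounds the numbers with format specifiers:

```python
import pandas as pd


def window_stats(series: pd.Series, months: int) -> dict:
    """Mean, std and percent gap of the latest value over the last `months` rows."""
    window = series.iloc[-months:]  # last `months` observations, whatever the file length
    latest = series.iloc[-1]        # most recent observation
    average = window.mean()
    return {
        "mean": average,
        "std": window.std(),
        "latest_vs_mean_pct": (latest / average - 1) * 100,
    }


# Made-up monthly series (values 1..24); real data would come from a CSV column.
demo = pd.Series([float(v) for v in range(1, 25)], name="demo")
stats = window_stats(demo, 12)
print(f"{demo.name}: 12-month mean {stats['mean']:.2f}, "
      f"latest vs mean {stats['latest_vs_mean_pct']:.1f}%")
```

Inside the loop, the same helper could be called with 12, 24 and 60 for the 1, 2 and 5 year windows, assuming the data is monthly and sorted oldest to newest.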

Need help refining code to run a for loop that summarizes economic variables in a CSV file?
