我有一个带有两个经济变量(房屋开始和失业)的时间序列的csv文件。我有一个计算列表和一个摘要(文本),该摘要与计算的输出一起编写(基本上以段落格式总结了数据的趋势)。我想获得有关如何获得for循环以遍历csv文件中每个变量的反馈,因此我将每个变量的摘要作为最终输出。
我尝试应用for循环的基本逻辑,但是我不确定自己的错误之处。我看了一些关于stackoverflow的示例,但似乎都不合适,我敢肯定我缺少一些简单的东西,但是没有使用python很久了,所以现在不确定。
raw_data = pd.read_csv('C:/Users/J042666/Desktop/2019.03 HOUST and GDP.csv')
df = pd.DataFrame(raw_data)
for i in df:
freq = "monthly "
units = " million "
pos = 1
colname = df.columns[pos]
alltime = df.mean()
low = df.min()
maximum = df.max()
today = df.iloc[720]
one_year = df.iloc[709:721].mean()
two_year = df.iloc[697:721].mean()
five_year = df.iloc[661:721].mean()
one_year_vol = df.iloc[709:721].std()
two_year_vol = df.iloc[697:721].std()
five_year_vol = df.iloc[661:721].std()
today_vs_1 = ((today/one_year) -1)*100
today_vs_2 = ((today/two_year) -1)*100
today_vs_5 = ((today/five_year) -1)*100
rolling_1 = df.rolling(window=3).mean()
rolling_2 = df.rolling(window=6).mean()
rolling_3 = df.rolling(window=9).mean()
today_vs_1_rolling = ((today/rolling_1.iloc[720]) -1)*100
today_vs_2_rolling = ((today/rolling_2.iloc[720]) -1)*100
today_vs_3_rolling = ((today/rolling_3.iloc[720]) -1)*100
summary = ("The " + str(freq) + str(colname) + " currently stands at " + str(today) + str(units) + " which compares to the 1,2 and 5 year averages of " + str(one_year) + str(units) + "," + str(two_year) + str(units) + "," + " and " + str(five_year) + str(units) + " respectively. " + " Based on the current " + str(colname) + " levels, that reflects a change of" + str(today_vs_1) + ", " + str(today_vs_2) + " and " + str(today_vs_5) + " respectively." " Since the metric began being tracked, the minimum, maximum and long run average total " + str(low) + str(units) + ", " + str(maximum) + str(units) + " and " + str(alltime) + str(units) + " respectively. " "The 1, 2 and 5 year standard deviation for " + str(colname) + " totals " + str(one_year_vol) + str(units) + " ," + str(two_year_vol) + str(units) + " and" + str(five_year_vol) + str(units) + " respectively." + " Based on the current " + str(colname) + " levels compared to the 3, 6 and 9 month rolling averages, the current level reflects a change of " + str(today_vs_1_rolling) + ", " + str(today_vs_2_rolling) + " and " + str(today_vs_3_rolling) + " respectively.")
print(summary)
如上所述,我希望有一些代码可以生成我在for循环中为每个变量计算的财务指标的段落摘要。
答案 0 :(得分:0)
问题是您要选择整个数据框,而不是单独选择每个列;因此,您对两个列都进行了分析。我还只是从您的操作中提取了所需的值,而不是保留从Pandas打印出来的整个文本。
这应该有效:
df = pd.read_csv('2019.03 HOUST and GDP.csv')
df = df.loc[:, ['Housing Starts', 'Unemployment Rate']]
for idx, col in enumerate(df.columns):
freq = "monthly "
units = " million "
colname = col
selectedCol = df.loc[:, [col]]
alltime = selectedCol.mean()[0]
low = selectedCol.min()[0]
maximum = selectedCol.max()[0]
today = selectedCol.iloc[720][0]
one_year = selectedCol.iloc[709:721].mean()[0]
two_year = selectedCol.iloc[697:721].mean()[0]
five_year = selectedCol.iloc[661:721].mean()[0]
one_year_vol = selectedCol.iloc[709:721].std()[0]
two_year_vol = selectedCol.iloc[697:721].std()[0]
five_year_vol = selectedCol.iloc[661:721].std()[0]
today_vs_1 = ((today/one_year) -1)*100
today_vs_2 = ((today/two_year) -1)*100
today_vs_5 = ((today/five_year) -1)*100
rolling_1 = selectedCol.rolling(window=3).mean()
rolling_2 = selectedCol.rolling(window=6).mean()
rolling_3 = selectedCol.rolling(window=9).mean()
today_vs_1_rolling = ((today/rolling_1.iloc[720]) -1)*100
today_vs_2_rolling = ((today/rolling_2.iloc[720]) -1)*100
today_vs_3_rolling = ((today/rolling_3.iloc[720]) -1)*100
summary = ("The " + str(freq) + str(colname) + " currently stands at " + str(today) + str(units) + " which compares to the 1,2 and 5 year averages of " + str(one_year) + str(units) + "," + str(two_year) + str(units) + "," + " and " + str(five_year) + str(units) + " respectively. " + " Based on the current " + str(colname) + " levels, that reflects a change of" + str(today_vs_1) + ", " + str(today_vs_2) + " and " + str(today_vs_5) + " respectively." " Since the metric began being tracked, the minimum, maximum and long run average total " + str(low) + str(units) + ", " + str(maximum) + str(units) + " and " + str(alltime) + str(units) + " respectively. " "The 1, 2 and 5 year standard deviation for " + str(colname) + " totals " + str(one_year_vol) + str(units) + " ," + str(two_year_vol) + str(units) + " and" + str(five_year_vol) + str(units) + " respectively." + " Based on the current " + str(colname) + " levels compared to the 3, 6 and 9 month rolling averages, the current level reflects a change of " + str(today_vs_1_rolling[0]) + ", " + str(today_vs_2_rolling[0]) + " and " + str(today_vs_3_rolling[0]) + " respectively.")
print(summary)