Converting numbers from a CSV file into written paragraphs?

Posted: 2019-01-11 21:20:26

Tags: python csv

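I want to turn quarterly and year-to-date (YTD) financial information into a paragraph summarizing performance. The problem I'm running into is that I want to assign a unit to each metric: for example, I want margins to be shown in "%" and sales in "million". I have listed the different metrics that need units assigned to them. I was thinking of the following approach, but it does not work. I have attached a picture of the CSV file and of the output I need to fix.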

# Metrics grouped by the unit they should be shown with
margin = ['Quarterly Gross Margin', 'YTD Gross Profit Margin', 'Quarterly Adj. EBITDA Margin', 'YTD Adj. EBITDA Margin']
dollars = ['Quarterly Revenue', 'YTD Revenue', 'Quarterly Gross Profit', 'YTD Gross Profit', 'Quarterly Adj. EBITDA', 'YTD Adj. EBITDA', 'Quarterly Free Cash Flow', 'YTD Free Cash Flow', 'Cash Balance', 'Liquidity (Cash + Undrawn)']
leverage = ['Senior Leverage', 'Total Leverage']

# Unit suffix for each group
margin_format = " %"
dollars_format = " million"
leverage_format = " x"
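One way to connect the three lists to their suffixes is to build a single dictionary from metric name to suffix. This is a minimal sketch reusing the lists and suffixes above; `unit_suffix` and `with_unit` are hypothetical names introduced for illustration.

```python
# Category lists and suffixes taken from the question.
margin = ['Quarterly Gross Margin', 'YTD Gross Profit Margin',
          'Quarterly Adj. EBITDA Margin', 'YTD Adj. EBITDA Margin']
dollars = ['Quarterly Revenue', 'YTD Revenue', 'Quarterly Gross Profit',
           'YTD Gross Profit', 'Quarterly Adj. EBITDA', 'YTD Adj. EBITDA',
           'Quarterly Free Cash Flow', 'YTD Free Cash Flow', 'Cash Balance',
           'Liquidity (Cash + Undrawn)']
leverage = ['Senior Leverage', 'Total Leverage']

# Hypothetical lookup: metric name -> unit suffix.
unit_suffix = {}
for names, suffix in ((margin, " %"), (dollars, " million"), (leverage, " x")):
    for name in names:
        unit_suffix[name] = suffix

def with_unit(metric, value):
    """Return the value followed by the unit suffix for its metric."""
    return str(value) + unit_suffix[metric]

print(with_unit('YTD Revenue', 12.5))    # 12.5 million
print(with_unit('Total Leverage', 3.2))  # 3.2 x
```

A metric that is in none of the lists raises a `KeyError`, which makes a missing category easy to spot.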

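I also need some help understanding how to integrate the code above into the code below, and how to change the paragraph at the bottom to include these changes.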

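Can someone help me assign the units efficiently, and show how and where to integrate them into the paragraph? I'm new to Python and trying to learn. Apologies for the beginner question.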

def integer(entry):  # integer turns "5" into 5.0 (despite the name, it returns a float)
    return float(entry)

def percent(entry):  # percent turns "25.00%" into 25.0
    return float(entry[0:-1])

rows_one = ['Quarterly Revenue', 'YTD Revenue', 'Quarterly Gross Profit', 'Quarterly Gross Margin', 'YTD Gross Profit', 'YTD Gross Profit Margin', 'Quarterly Adj. EBITDA', 'Quarterly Adj. EBITDA Margin', 'YTD Adj. EBITDA', 'YTD Adj. EBITDA Margin', 'Quarterly Free Cash Flow', 'YTD Free Cash Flow', 'Cash Balance', 'Senior Leverage', 'Total Leverage', 'Liquidity (Cash + Undrawn)']

convs = [integer, integer, integer, percent, integer, percent, integer, percent, integer, percent, integer, integer, integer, integer, integer, integer]
unit_percent = "%"
units = " million"

dates = []  # before/after dates as found on line 1 of the csv, at indexes 4 and 8

paragraph = ''
with open('C:/Users/J042666/Desktop/test_3333.csv') as file:  # open the file; it is closed automatically
    for line in file:  # reading through each line
        line = line.strip().split(',')  # split the csv by comma
        if line[0] != '':  # column A in the csv file
            print("---- " + line[0] + " ----")  # company separation header for each business in column A

        if dates == []:  # the first line should contain dates; if we have none yet, we are on the first line
            dates.append(line[4])  # store indexes 4 and 8 as the first and second entries of dates
            dates.append(line[8])

        if line[1] in rows_one:  # if index 1 (column B) is one of our rows of interest
            row_index = rows_one.index(line[1])  # numeric position/index of the row keyword
            conv_func = convs[row_index]
            before = conv_func(line[4])  # column for the prior-year date
            after = conv_func(line[8])  # column for the current-year date
            diff = after - before
            diff_format = round(abs(diff), 2)  # the sign is already conveyed by label
            # outputting findings:
            label = "increasing" if diff >= 0 else "decreasing"
            growth = abs((after / before) - 1) * 100  # keep as a float until formatting
            growth_format = "{:.2f}".format(growth)
            paragraph += (line[1] + " during the period went from $" + str(before) + units + " on " + dates[0]
                          + " to $" + str(after) + units + " on " + dates[1] + " " + label + " by $"
                          + str(diff_format) + units + " or " + label + " " + growth_format + unit_percent + ". ")
            if line[1] in ['YTD Revenue', 'YTD Gross Profit Margin', 'YTD Adj. EBITDA Margin', 'YTD Free Cash Flow', 'Cash Balance', 'Total Leverage', 'Liquidity (Cash + Undrawn)']:
                print(paragraph)
                paragraph = ''
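The growth calculation can be pulled into a small helper. A sketch, where `percent_change` is a hypothetical name and returning `None` for a zero starting value is an assumption (dividing by a zero `before` raises `ZeroDivisionError`). It also shows that applying the `.3` format spec to a string truncates it to three characters rather than rounding a number, which is why the value should stay a float until it is formatted.

```python
def percent_change(before, after):
    """Percentage change from before to after, e.g. 80 -> 100 gives 25.0.
    Returns None when before is 0, since the ratio is undefined (assumption)."""
    if before == 0:
        return None
    return (after / before - 1) * 100

growth = percent_change(80.0, 100.0)
print("{:.2f}%".format(growth))  # 25.00%

# A format spec of .3 on a *string* keeps only the first 3 characters.
print("{0:.3}".format("12.3456"))  # 12.
```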

1 Answer

Answer 0 (score: 0)

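You can define a format string for each unit and use a dictionary to map each row category to its format string and conversion function.

For example: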

  margin_format = '{}%'
  dollars_format = '${} million'
  leverage_format = '{} x'

  conv_mappings = {
      "Quarterly Revenue": (dollars_format, integer),
      "YTD Revenue": (dollars_format, integer),
      "Quarterly Gross Profit": (dollars_format, integer),
      "Quarterly Gross Margin": (margin_format, percent),
      ...
      "Senior Leverage": (leverage_format, integer),
      "Total Leverage": (leverage_format, integer),
      "Liquidity (Cash + Undrawn)": (dollars_format, integer)
  }
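For reference, here is a self-contained sketch of this approach with only three rows filled in (the rest follow the same pattern). `format_value` is a hypothetical helper name, and `percent` here uses `rstrip('%')`, a slightly more forgiving variant of the question's function.

```python
def integer(entry):
    """Convert a plain number string such as "5" to 5.0."""
    return float(entry)

def percent(entry):
    """Convert a percentage string such as "25.00%" to 25.0."""
    return float(entry.rstrip('%'))

margin_format = '{}%'
dollars_format = '${} million'
leverage_format = '{} x'

# Only three rows shown; the remaining rows follow the same pattern.
conv_mappings = {
    "Quarterly Revenue": (dollars_format, integer),
    "Quarterly Gross Margin": (margin_format, percent),
    "Total Leverage": (leverage_format, integer),
}

def format_value(row_name, raw):
    """Convert a raw CSV cell and attach the unit for its row."""
    fmt_string, conv_func = conv_mappings[row_name]
    return fmt_string.format(conv_func(raw))

print(format_value("Quarterly Revenue", "12.5"))         # $12.5 million
print(format_value("Quarterly Gross Margin", "25.00%"))  # 25.0%
print(format_value("Total Leverage", "3.2"))             # 3.2 x
```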

Look up the unit format and the conversion function in the dictionary, and use sequence unpacking to assign them to separate variables:

fmt_string, conv_func = conv_mappings[line[1]]
# Convert values
before = conv_func(line[4])
after = conv_func(line[8])
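Indexing the dictionary directly raises a `KeyError` for a row name that is not in it, such as a header row or a metric you don't report. A minimal sketch, assuming a hypothetical one-row mapping and a made-up CSV row, that uses `dict.get` to skip such rows before unpacking:

```python
def integer(entry):
    return float(entry)

conv_mappings = {"Total Leverage": ('{} x', integer)}  # hypothetical one-row mapping

line = ['', 'Total Leverage', '', '', '3.5', '', '', '', '3.1']  # hypothetical CSV row
entry = conv_mappings.get(line[1])  # None if the row is not in the mapping
if entry is not None:
    fmt_string, conv_func = entry  # sequence unpacking of the (format, function) tuple
    before = conv_func(line[4])
    after = conv_func(line[8])
    print(fmt_string.format(before), "->", fmt_string.format(after))  # 3.5 x -> 3.1 x
```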

If you are using Python 3.6 or later, you can use f-strings to build the paragraph:

# Format your values
before = fmt_string.format(before) 
after = fmt_string.format(after) 

# Use formatted values in your paragraph format.
para = f'{before} and {after}'
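Extending this to the full sentence from the question, here is a sketch with hypothetical values and dates (Python 3.6+). The amounts use the `${} million` format, the growth percentage uses the `:.2f` format spec directly inside the f-string, and `abs()` is applied because the direction is already carried by `label`.

```python
fmt_string = '${} million'  # the dollars format
name = 'Quarterly Revenue'  # hypothetical row name and dates
date_before, date_after = '12/31/2017', '12/31/2018'
before_value, after_value = 10.0, 12.5

diff = after_value - before_value
label = "increasing" if diff >= 0 else "decreasing"
growth = (after_value / before_value - 1) * 100

para = (f'{name} during the period went from {fmt_string.format(before_value)} on {date_before} '
        f'to {fmt_string.format(after_value)} on {date_after}, {label} by '
        f'{fmt_string.format(round(abs(diff), 2))} or {abs(growth):.2f}%. ')
print(para)
```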

If you are using an older version of Python, you can pass a dictionary to str.format instead:

# Define the paragraph format.
para_fmt = '{before} and {after}.'
# Format your values
mapping = {
    'before': fmt_string.format(before),
    'after': fmt_string.format(after)
} 

para = para_fmt.format(**mapping)
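Named placeholders can also carry a format spec, so a number can be passed unformatted and rounded by the template itself. A sketch with hypothetical leverage values:

```python
fmt_string = '{} x'  # the leverage format
para_fmt = '{name} went from {before} to {after}, a change of {growth:.1f}%.'
before_value, after_value = 4.0, 3.0  # hypothetical values

mapping = {
    'name': 'Total Leverage',
    'before': fmt_string.format(before_value),
    'after': fmt_string.format(after_value),
    'growth': (after_value / before_value - 1) * 100,  # formatted by {growth:.1f}
}
para = para_fmt.format(**mapping)
print(para)  # Total Leverage went from 4.0 x to 3.0 x, a change of -25.0%.
```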

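Building each sentence from a single format string is usually more readable than concatenating many pieces with + and +=, and it avoids creating a new intermediate string for every piece.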
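A related idiom when many sentences are accumulated into one paragraph is to collect them in a list and join them once at the end with `str.join`, instead of repeatedly appending to a growing string. A sketch with hypothetical (name, before, after) values in millions:

```python
rows = [  # hypothetical (name, before, after) values, in millions
    ('Quarterly Revenue', 10.0, 12.5),
    ('YTD Revenue', 40.0, 44.0),
]

sentences = []
for name, before, after in rows:
    sentences.append('{} went from ${} million to ${} million.'.format(name, before, after))

paragraph = ' '.join(sentences)  # one join at the end
print(paragraph)
```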

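Finally, Python ships with a csv module that parses CSV files for you, so you don't need to split lines on commas yourself, and it handles edge cases such as cells that contain embedded commas. You can use it like this: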

import csv

with open('myfile.csv', newline='') as f:  # newline='' is recommended for the csv module
    reader = csv.reader(f)
    next(reader)  # Do this *if* you want to skip the header row
    for row in reader:
        # row is a list of strings, e.g. ['Things', '1.2', '5']
        print(row)
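To see the embedded-comma case, here is a sketch that feeds a small hypothetical CSV string to `csv.reader` through `io.StringIO` (standing in for a real file) and compares it with a plain `split(',')`:

```python
import csv
import io

# Hypothetical CSV content; the quoted company name contains a comma.
data = 'Company,Metric,Q4 2017,Q4 2018\n"Acme, Inc.",Quarterly Revenue,10.0,12.5\n'

reader = csv.reader(io.StringIO(data))
header = next(reader)  # skip (and keep) the header row
rows = list(reader)
print(rows[0])  # ['Acme, Inc.', 'Quarterly Revenue', '10.0', '12.5']

# Naive splitting breaks the quoted cell into two pieces.
print(data.splitlines()[1].split(','))  # ['"Acme', ' Inc."', 'Quarterly Revenue', '10.0', '12.5']
```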