Complex CSV data mapping based on tables, using Python?

Time: 2018-04-25 14:24:39

Tags: python csv

I have CSV data like this:

id;12;12;13;13
company;Fox century;Fox century;Apple company;Apple company
ticker;fox;fox;appl;appl
industry code;2;2;3;3
indicator;  Share Price; Common Shares Outstanding;Share Price;Common Shares Outstanding
2011-11-04; 2.72;   65046.232; 2.33; 3443
2012-02-06; 2.89;   65065.558; 2.44; 4242     
2012-05-04; 3.04;   64788.687; 2.44; 2222
.........................................

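The first column of each row holds its label: id, company, ticker, industry code, indicator, and then one date per row (2011-11-04, 2012-02-06, and so on).

I want to insert this data according to my DB model. I have two tables: company and indicator.

The company table has 4 columns: id (primary key), company name, ticker, industry code. The indicator table has 4 columns: share price, common shares outstanding, date, id (foreign key). I started with the company table and managed to extract its data with the following logic: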

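import csv

with open('testss.csv', newline='') as f_input:
    # the sample data is ';' separated, so the delimiter is passed explicitly
    csv_input = csv.reader(f_input, delimiter=';', skipinitialspace=True)
    block = []
    for row in csv_input:
        if row:                      # skip blank lines
            if row[0] == 'id':       # an 'id' row starts a new block
                if block:
                    print(block)
                block = [row]
            else:
                block.append(row)

# print the four company fields (id, company, ticker, industry code) for every data column
for i in range(1, len(block[0])):
    print(block[0][0] + " : " + block[0][i])
    print(block[1][0] + " : " + block[1][i])
    print(block[2][0] + " : " + block[2][i])
    print(block[3][0] + " : " + block[3][i])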

I don't understand how to insert data into the indicator table so that it looks like this:

id|date      |SharePrice|Common Shares Outstanding
12|2011-11-04|2.72      |65046.232
13|2011-11-04|2.33      |3443

Please suggest some logic (a code example) to achieve this.

1 answer:

Answer 0: (score: 0)

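One possible approach is to read the file in two parts: first the company part (which you have already done), then the indicator part, which starts after the indicator line. While reading the company part, I suggest building a mapping whose key is the first column and whose value is the whole row. That mapping is then used to process the indicator part: for the value at index i of a date row, the id is the element at index i of the id row, and the column name is the element at index i of the indicator row.

The code could become: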
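To make the index alignment concrete, here is a minimal sketch that works on an in-memory copy of two header rows from the question. The `;` delimiter is taken from the sample data, and the helper name `column_keys` is hypothetical, introduced only for this illustration.

```python
import csv
import io

# Header rows copied from the sample in the question (';' separated).
SAMPLE = """id;12;12;13;13
indicator;  Share Price; Common Shares Outstanding;Share Price;Common Shares Outstanding
"""

def column_keys(text):
    """Return one (id, indicator name) pair per data column."""
    reader = csv.reader(io.StringIO(text), delimiter=';', skipinitialspace=True)
    # Map the label found in the first column to the whole row.
    rows = {row[0]: row for row in reader if row}
    ids, names = rows['id'], rows['indicator']
    # Column i of every date row belongs to ids[i] and names[i].
    return [(ids[i], names[i].strip()) for i in range(1, len(ids))]

keys = column_keys(SAMPLE)
print(keys)
```

Each pair says which company id and which indicator column a value at that position of a date row belongs to, which is all the indicator part needs.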

import collections
import csv

c_part = {}
with open('testss.csv', newline='') as f_input:
    # the sample data is ';' separated, so the delimiter is passed explicitly
    csv_input = csv.reader(f_input, delimiter=';', skipinitialspace=True)
    for row in csv_input:                 # first process the company part
        if not row: continue              # skip blank lines
        c_part[row[0]] = row              # key: row label, value: whole row
        if row[0] == 'indicator': break   # up to the indicator line
    # ok, we have the companies (a set, so duplicated columns collapse)
    companies = { ( c_part['id'][i], c_part['company'][i],
                    c_part['ticker'][i], c_part['industry code'][i] )
                  for i in range(1, len(c_part['id'])) }
    print(companies)
    for row in csv_input:                 # now the indicator part
        if not row: continue              # skip blank lines
        date = row[0]
        data = collections.defaultdict(dict)
        for i in range(1, len(row)):
            # same index i gives the id and the indicator name of this value
            data[c_part['id'][i]][c_part['indicator'][i]] = row[i].strip()
        for i in data:
            data[i].update({'id': i, 'date': date})  # update with the date and id part
            print(dict(data[i]))          # this is a line of the indicator table
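The code above only prints the rows. As a rough sketch of the actual insert step, the same logic can feed a database. Everything database-related here is an assumption, since the question does not show its schema or driver: the example uses the standard-library `sqlite3` module with an in-memory database, and the table and column names (`company`, `indicator`, `company_id`, `share_price`, `shares_outstanding`) and the `load` helper are hypothetical.

```python
import csv
import io
import sqlite3
from collections import defaultdict

# Sample rows copied from the question (';' separated).
SAMPLE = """id;12;12;13;13
company;Fox century;Fox century;Apple company;Apple company
ticker;fox;fox;appl;appl
industry code;2;2;3;3
indicator;  Share Price; Common Shares Outstanding;Share Price;Common Shares Outstanding
2011-11-04; 2.72;   65046.232; 2.33; 3443
2012-02-06; 2.89;   65065.558; 2.44; 4242
"""

def load(text, conn):
    """Parse the transposed CSV and fill both (hypothetical) tables."""
    reader = csv.reader(io.StringIO(text), delimiter=';', skipinitialspace=True)
    head = {}
    for row in reader:                       # company part, up to 'indicator'
        if not row:
            continue
        head[row[0]] = [cell.strip() for cell in row]
        if row[0] == 'indicator':
            break
    n = len(head['id'])
    # A set removes the duplicates caused by one column per indicator.
    companies = {(int(head['id'][i]), head['company'][i],
                  head['ticker'][i], int(head['industry code'][i]))
                 for i in range(1, n)}
    conn.executemany("INSERT INTO company VALUES (?, ?, ?, ?)", sorted(companies))
    for row in reader:                       # indicator part, one date per row
        if not row:
            continue
        per_id = defaultdict(dict)
        for i in range(1, min(n, len(row))):
            per_id[head['id'][i]][head['indicator'][i]] = float(row[i])
        for company_id, values in per_id.items():
            conn.execute(
                "INSERT INTO indicator VALUES (?, ?, ?, ?)",
                (int(company_id), row[0], values.get('Share Price'),
                 values.get('Common Shares Outstanding')))
    conn.commit()

conn = sqlite3.connect(':memory:')
conn.execute("CREATE TABLE company (id INTEGER PRIMARY KEY, name TEXT, "
             "ticker TEXT, industry_code INTEGER)")
conn.execute("CREATE TABLE indicator (company_id INTEGER REFERENCES company(id), "
             "date TEXT, share_price REAL, shares_outstanding REAL)")
load(SAMPLE, conn)
rows = conn.execute(
    "SELECT company_id, date, share_price, shares_outstanding "
    "FROM indicator ORDER BY date, company_id").fetchall()
for r in rows:
    print(r)
```

With a real file, `io.StringIO(text)` would be replaced by the opened file object, and the SQL statements would need to match the actual table definitions.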