Question

I have CSV data such as:

id;12;12;13;13
company;Fox century;Fox century;Apple company;Apple company
ticker;fox;fox;appl;appl
industry code;2;2;3;3
indicator;  Share Price; Common Shares Outstanding;Share Price;Common Shares Outstanding
2011-11-04; 2.72;   65046.232; 2.33; 3443
2012-02-06; 2.89;   65065.558; 2.44; 4242     
2012-05-04; 3.04;   64788.687; 2.44; 2222
.........................................

Each row starts with a label: id, company, ticker, industry code, indicator, and then one row per date (2011-11-04, 2012-02-06, etc.).

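I want to insert this data according to my DB model. I have 2 tables: company and indicator.

The company table has 4 columns: Id (primary_key), company name, ticker, and industry code. The indicator table has 4 columns: Share Price, Common Shares Outstanding, date, and id (foreign key). I started with the company table and successfully extracted its data using the following code logic: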

import csv

# the sample data is ';'-separated, so the delimiter is set explicitly
with open('testss.csv', newline='') as f_input:
    csv_input = csv.reader(f_input, delimiter=';', skipinitialspace=True)
    block = []
    for row in csv_input:
        if len(row):
            if row[0] == 'id':
                if block:
                    print(block)
                block = [row]
            else:
                block.append(row)

# print the four company fields (id, company, ticker, industry code) per column
for i in range(1, len(block[0])):
    print(block[0][0] + " : " + block[0][i])
    print(block[1][0] + " : " + block[1][i])
    print(block[2][0] + " : " + block[2][i])
    print(block[3][0] + " : " + block[3][i])

I don't understand how to insert the data into the indicator table, like this:

id|date      |Share Price|Common Shares Outstanding
12|2011-11-04|2.72       |65046.232
13|2011-11-04|2.33       |3443

Please suggest some logic (a code example) to achieve this.

Answer 1

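One possible way is to read the file in two parts: first the company part (what you already did), then the indicator part, which starts right after the indicator line. When reading the company part, my advice is to build a mapping where the key is the first column and the value is the full row. That mapping is then used to process the indicator part: for a value at a given column index, the id is the element of the id row at the same index, and the column name is the element of the indicator row, again at the same index.

The code could become: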

import collections
import csv

c_part = {}
with open('testss.csv', newline='') as f_input:
    # the sample data is ';'-separated, so the delimiter is set explicitly
    csv_input = csv.reader(f_input, delimiter=';', skipinitialspace=True)
    for row in csv_input:                 # first process the company part
        if not row:
            continue                      # skip blank lines
        c_part[row[0]] = row
        if row[0] == 'indicator': break   # up to the indicator line
    # ok, we have the companies (a set, so repeated columns collapse to one entry)
    companies = { ( c_part['id'][i], c_part['company'][i],
                    c_part['ticker'][i], c_part['industry code'][i] )
                  for i in range(1, len(c_part['id'])) }
    print(companies)
    for row in csv_input:                 # now the indicator part
        if not row:
            continue                      # skip blank lines
        date = row[0]
        data = collections.defaultdict(dict)
        for i in range(1, len(row)):
            # the id and the indicator name sit at the same index as the value
            data[c_part['id'][i]][c_part['indicator'][i]] = row[i].strip()
        for i in data:
            data[i].update({'id': i, 'date': date})  # update with the date and id part
            print(dict(data[i]))          # this is a line of the indicator table
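
To go one step further and actually fill the two tables, here is a minimal sketch of the same two-pass idea. It assumes SQLite through the standard sqlite3 module, since the question does not say which database is used. The table and column names (company, indicator, company_id, share_price, shares_outstanding), the parse and load helpers, and the inline SAMPLE string are all made up for illustration, so adapt them to your real schema and file.

```python
import csv
import io
import sqlite3
from collections import defaultdict

# Assumption: a small inline copy of the sample data from the question.
SAMPLE = """\
id;12;12;13;13
company;Fox century;Fox century;Apple company;Apple company
ticker;fox;fox;appl;appl
industry code;2;2;3;3
indicator;  Share Price; Common Shares Outstanding;Share Price;Common Shares Outstanding
2011-11-04; 2.72;   65046.232; 2.33; 3443
2012-02-06; 2.89;   65065.558; 2.44; 4242
2012-05-04; 3.04;   64788.687; 2.44; 2222
"""


def parse(f):
    """Return (companies, indicator_rows) from a ';'-separated file object."""
    reader = csv.reader(f, delimiter=';', skipinitialspace=True)
    header = {}
    for row in reader:                      # company part, up to 'indicator'
        if not row:
            continue
        header[row[0]] = [cell.strip() for cell in row]
        if row[0] == 'indicator':
            break
    ids = header['id']
    # a set removes the duplicates caused by one column per indicator
    companies = sorted({
        (int(ids[i]), header['company'][i],
         header['ticker'][i], int(header['industry code'][i]))
        for i in range(1, len(ids))
    })
    indicator_rows = []
    for row in reader:                      # indicator part: one row per date
        if not row:
            continue
        date = row[0].strip()
        per_company = defaultdict(dict)
        for i in range(1, len(row)):
            per_company[ids[i]][header['indicator'][i]] = float(row[i])
        for cid, values in per_company.items():
            indicator_rows.append((int(cid), date,
                                   values['Share Price'],
                                   values['Common Shares Outstanding']))
    return companies, indicator_rows


def load(conn, companies, indicator_rows):
    """Create the two (hypothetical) tables and bulk-insert the parsed rows."""
    conn.execute("CREATE TABLE company (id INTEGER PRIMARY KEY, name TEXT, "
                 "ticker TEXT, industry_code INTEGER)")
    conn.execute("CREATE TABLE indicator (company_id INTEGER REFERENCES company(id), "
                 "date TEXT, share_price REAL, shares_outstanding REAL)")
    conn.executemany("INSERT INTO company VALUES (?, ?, ?, ?)", companies)
    conn.executemany("INSERT INTO indicator VALUES (?, ?, ?, ?)", indicator_rows)
    conn.commit()


conn = sqlite3.connect(":memory:")          # assumption: throwaway in-memory DB
companies, indicator_rows = parse(io.StringIO(SAMPLE))
load(conn, companies, indicator_rows)
for db_row in conn.execute("SELECT * FROM indicator ORDER BY date, company_id"):
    print(db_row)
```

With a real file, you would pass open('testss.csv', newline='') to parse instead of the io.StringIO wrapper. Parameterized executemany calls keep the inserts short and avoid building SQL strings by hand.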
