Django: KeyError when creating a database table from a CSV file

Time: 2018-12-25 19:06:03

Tags: python django pandas

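I need to create a table from a CSV file.

I could probably do this with other libraries, but I chose pandas because I will need it for some data analysis in the near future.

I have a script, but it fails with this error: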

Traceback (most recent call last):
  File "/home/gonzales/Escritorio/virtual_envs/stickers_gallito_env/lib/python3.7/site-packages/pandas/core/indexes/base.py", line 3078, in get_loc
    return self._engine.get_loc(key)
  File "pandas/_libs/index.pyx", line 140, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/index.pyx", line 162, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/hashtable_class_helper.pxi", line 958, in pandas._libs.hashtable.Int64HashTable.get_item
  File "pandas/_libs/hashtable_class_helper.pxi", line 964, in pandas._libs.hashtable.Int64HashTable.get_item
KeyError: 1867

Data on Dropbox:

https://www.dropbox.com/s/o3iga509qi8suu9/ubigeo-peru-2018-12-25.csv?dl=0

Script

import pandas as pd
import csv
from shop.models import Peru
from django.core.management.base import BaseCommand


tmp_data=pd.read_csv('static/data/ubigeo-peru-2018-12-25.csv',sep=',', encoding="utf-8")


class Command(BaseCommand):
    def handle(self, **options):
        products = [
            Peru(
                departamento=tmp_data.ix[row]['departamento'],
                provincia=tmp_data.ix[row]['provincia'],
                distrito=tmp_data.ix[row]['distrito'],
            )
            for row in tmp_data['id']
        ]

        Peru.objects.bulk_create(products)

models.py

from django.db import models


class Peru(models.Model):
    departamento = models.CharField(max_length=100, blank=False)
    provincia = models.CharField(max_length=100, blank=False)
    distrito = models.CharField(max_length=100, blank=False)

    def __str__(self):
        return self.departamento

2 answers:

Answer 0 (score: 1)

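This does not work (and raises the error on the last object) because row is actually the value of the id column, which starts at 1, while you are using it as an index into a DataFrame whose default index starts at 0. The last id therefore points one past the final row.

Use this instead: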

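# ids start at 1 but row positions start at 0, hence row - 1.
# .iloc (positional) is used here because .ix was removed in pandas 1.0.
products = [
    Peru(
        departamento=tmp_data.iloc[row - 1]['departamento'],
        provincia=tmp_data.iloc[row - 1]['provincia'],
        distrito=tmp_data.iloc[row - 1]['distrito'],
    )
    for row in tmp_data['id']
]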

Alternatively, you can iterate over the rows of the DataFrame, as the library suggests:

products = []
for i, row in tmp_data.iterrows():
    products.append(Peru(
        departamento=row['departamento'],
        provincia=row['provincia'],
        distrito=row['distrito'],
    ))

Peru.objects.bulk_create(products)
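
To see why the id cannot double as an index label, here is a minimal sketch using a made-up three-row DataFrame (the values are illustrative and are not taken from the CSV). It assumes the default RangeIndex that read_csv produces when no index_col is given. The set_index('id') step is one further option for looking rows up by id, not something the original script does.

```python
import pandas as pd

# Tiny stand-in for the CSV: ids start at 1, the default index starts at 0.
# The place names are illustrative sample values only.
df = pd.DataFrame({
    "id": [1, 2, 3],
    "departamento": ["Amazonas", "Amazonas", "Lima"],
    "provincia": ["Chachapoyas", "Bagua", "Lima"],
    "distrito": ["Chachapoyas", "Bagua", "Miraflores"],
})

# Label lookup with the last id fails: the index labels are 0, 1, 2.
try:
    df.loc[3]
    last_id_found = True
except KeyError:
    last_id_found = False

# Label lookup with id 1 does not fail, but returns the *second* row.
row_for_id_1 = df.loc[1]

# Making id the index lets .loc look rows up by id directly.
by_id = df.set_index("id")
first = by_id.loc[1]
last = by_id.loc[3]
```

Note that the off-by-one is silent for every id except the last one: each lookup returns the following row, and only the final id runs past the end and raises KeyError.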

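Answer 1 (score: 0)

The id field looks like an index, but it starts at 1. When you build the rows, you use the id field as an index into the DataFrame, which raises the error when you try to access the 1868th row (index label 1867), which does not exist. I would try: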

import pandas as pd
from shop.models import Peru
from django.core.management.base import BaseCommand


tmp_data = pd.read_csv('static/data/ubigeo-peru-2018-12-25.csv', sep=',', encoding="utf-8")


class Command(BaseCommand):
    def handle(self, **options):
        products = [
            Peru(
                departamento=row['departamento'],
                provincia=row['provincia'],
                distrito=row['distrito'],
            )
            for index, row in tmp_data.iterrows()
        ]

        Peru.objects.bulk_create(products)
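
As a variation on the same idea, the rows can also be turned into plain dicts first and unpacked into the model constructor. The sketch below is only an illustration under stated assumptions: PeruRow is a hypothetical dataclass standing in for the Peru model so that the snippet runs without Django, and the DataFrame holds made-up sample values instead of the real CSV.

```python
from dataclasses import dataclass

import pandas as pd


@dataclass
class PeruRow:
    """Hypothetical stand-in for the Peru model (same three fields)."""
    departamento: str
    provincia: str
    distrito: str


# Made-up sample data in the same shape as the CSV columns used above.
df = pd.DataFrame({
    "id": [1, 2, 3],
    "departamento": ["Amazonas", "Amazonas", "Lima"],
    "provincia": ["Chachapoyas", "Bagua", "Lima"],
    "distrito": ["Chachapoyas", "Bagua", "Miraflores"],
})

# Keep only the model's fields, then convert each row to a dict.
columns = ["departamento", "provincia", "distrito"]
records = df[columns].to_dict(orient="records")

# Each dict unpacks straight into the constructor; the id column is never
# used as an index, so the off-by-one cannot occur.
products = [PeruRow(**record) for record in records]
```

In the management command, Peru(**record) would take the place of PeruRow(**record), and the resulting list would be passed to Peru.objects.bulk_create(products) as before.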