Question

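I need to create a table from a CSV file.

I know I could use various libraries for this, but I chose pandas because I will need it for some data analysis in the near future.

I have a script, but it raises this error: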

Traceback (most recent call last):
  File "/home/gonzales/Escritorio/virtual_envs/stickers_gallito_env/lib/python3.7/site-packages/pandas/core/indexes/base.py", line 3078, in get_loc
    return self._engine.get_loc(key)
  File "pandas/_libs/index.pyx", line 140, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/index.pyx", line 162, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/hashtable_class_helper.pxi", line 958, in pandas._libs.hashtable.Int64HashTable.get_item
  File "pandas/_libs/hashtable_class_helper.pxi", line 964, in pandas._libs.hashtable.Int64HashTable.get_item
KeyError: 1867

Data on Dropbox:

https://www.dropbox.com/s/o3iga509qi8suu9/ubigeo-peru-2018-12-25.csv?dl=0

Script:

import pandas as pd
import csv
from shop.models import Peru
from django.core.management.base import BaseCommand


tmp_data=pd.read_csv('static/data/ubigeo-peru-2018-12-25.csv',sep=',', encoding="utf-8")


class Command(BaseCommand):
    def handle(self, **options):
        products = [
            Peru(
                departamento=tmp_data.ix[row]['departamento'],
                provincia=tmp_data.ix[row]['provincia'],
                distrito=tmp_data.ix[row]['distrito'],
            )
            for row in tmp_data['id']
        ]

        Peru.objects.bulk_create(products)

models.py

class Peru(models.Model):
    departamento = models.CharField(max_length=100, blank=False)
    provincia = models.CharField(max_length=100, blank=False)
    distrito = models.CharField(max_length=100, blank=False)

    def __str__(self):
        return self.departamento

Answer 1

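This fails (with the error raised on the last object) because row is actually the value of the id column, yet you are using it as an index label. The ids start at 1 while the DataFrame's default index starts at 0, so the last id has no matching label.

Use this instead: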

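# ids start at 1 but index labels start at 0, hence row - 1.
# .loc is used because .ix was removed in pandas 1.0.
products = [
    Peru(
        departamento=tmp_data.loc[row - 1, 'departamento'],
        provincia=tmp_data.loc[row - 1, 'provincia'],
        distrito=tmp_data.loc[row - 1, 'distrito'],
    )
    for row in tmp_data['id']
]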

Or you can iterate over the DataFrame, as the library suggests:

products = []
for i, row in tmp_data.iterrows():
    products.append(Peru(
        departamento=row['departamento'],
        provincia=row['provincia'],
        distrito=row['distrito'],
    ))

Peru.objects.bulk_create(products)
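
To make the off-by-one concrete, here is a minimal sketch that reproduces the mismatch without Django. The three-row DataFrame is a hypothetical stand-in for the real CSV (only the four columns the question uses, with sample values), and the variable names are chosen for illustration. Note that with the original lookup every id except the last silently returns the following row, so the KeyError is only the visible symptom.

```python
import pandas as pd

# Hypothetical stand-in for the CSV: ids start at 1, as in the question.
df = pd.DataFrame({
    "id": [1, 2, 3],
    "departamento": ["Amazonas", "Amazonas", "Lima"],
    "provincia": ["Chachapoyas", "Bagua", "Lima"],
    "distrito": ["Chachapoyas", "Bagua", "Miraflores"],
})

# The default index has labels 0, 1, 2, one less than the ids.
index_labels = list(df.index)

# Using an id as a label returns the wrong row for id 1 ...
shifted = df.loc[1, "distrito"]

# ... and fails on the last id, which has no matching label.
try:
    df.loc[3]
    last_id_found = True
except KeyError:
    last_id_found = False

# Subtracting 1 maps each id onto the label of its own row.
by_offset = [df.loc[i - 1, "distrito"] for i in df["id"]]

# iterrows() yields (label, row) pairs, so no lookup is needed at all.
by_iteration = [row["distrito"] for _, row in df.iterrows()]
```

Both by_offset and by_iteration hold the districts in file order; the iterrows() version does not depend on how the ids are numbered.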

Answer 2

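The id field looks like an index, but it starts at 1. When you build the rows, you access the DataFrame by index using the id field as the index, which raises the error when you try to access row 1868 (which does not exist). I would try: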

import pandas as pd
import csv
from shop.models import Peru
from django.core.management.base import BaseCommand


tmp_data=pd.read_csv('static/data/ubigeo-peru-2018-12-25.csv',sep=',', encoding="utf-8")


class Command(BaseCommand):
    def handle(self, **options):
        products = [
            Peru(
                departamento=row['departamento'],
                provincia=row['provincia'],
                distrito=row['distrito'],
            )
            for index, row in tmp_data.iterrows()
        ]

        Peru.objects.bulk_create(products)
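
A different way to line the ids up with the index, not taken from either answer, is to make the id column the index when reading the file. The sketch below assumes the CSV has at least the columns id, departamento, provincia and distrito; the in-memory sample and the names sample_csv, last and records are hypothetical and stand in for the real file.

```python
import io

import pandas as pd

# Hypothetical three-row sample with the columns the question uses.
sample_csv = io.StringIO(
    "id,departamento,provincia,distrito\n"
    "1,Amazonas,Chachapoyas,Chachapoyas\n"
    "2,Amazonas,Bagua,Bagua\n"
    "3,Lima,Lima,Miraflores\n"
)

# index_col turns the id column into the row labels, so labels run 1..3.
df = pd.read_csv(sample_csv, index_col="id")

# Label-based lookup by id now works for every id, including the last one.
last = df.loc[3]

# One plain dict per row; the index (id) is not included in the dicts.
records = df.to_dict("records")
```

Because the dictionary keys match the model's field names, each entry could be passed on as Peru(**record) inside the management command, assuming the file has no extra columns.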

Django: KeyError when creating a database table from a CSV