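I need to create a table from a CSV file.
I think I could use different libraries for this, but in this case I chose pandas, because I will need it in the near future for some data analysis.
I have a script, but I get this error: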
Traceback (most recent call last):
  File "/home/gonzales/Escritorio/virtual_envs/stickers_gallito_env/lib/python3.7/site-packages/pandas/core/indexes/base.py", line 3078, in get_loc
    return self._engine.get_loc(key)
  File "pandas/_libs/index.pyx", line 140, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/index.pyx", line 162, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/hashtable_class_helper.pxi", line 958, in pandas._libs.hashtable.Int64HashTable.get_item
  File "pandas/_libs/hashtable_class_helper.pxi", line 964, in pandas._libs.hashtable.Int64HashTable.get_item
KeyError: 1867
The data, on Dropbox:
https://www.dropbox.com/s/o3iga509qi8suu9/ubigeo-peru-2018-12-25.csv?dl=0
The script:
import pandas as pd
import csv
from shop.models import Peru
from django.core.management.base import BaseCommand

tmp_data = pd.read_csv('static/data/ubigeo-peru-2018-12-25.csv', sep=',', encoding="utf-8")


class Command(BaseCommand):
    def handle(self, **options):
        products = [
            Peru(
                departamento=tmp_data.ix[row]['departamento'],
                provincia=tmp_data.ix[row]['provincia'],
                distrito=tmp_data.ix[row]['distrito'],
            )
            for row in tmp_data['id']
        ]
        Peru.objects.bulk_create(products)
models.py
from django.db import models


class Peru(models.Model):
    departamento = models.CharField(max_length=100, blank=False)
    provincia = models.CharField(max_length=100, blank=False)
    distrito = models.CharField(max_length=100, blank=False)

    def __str__(self):
        return self.departamento
Answer 0 (score: 1)
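The reason this approach fails (and raises the error on the last object) is that row is actually the id of the data, while you are using it as an index. The id values start at 1 but the DataFrame's default index starts at 0, so the last id, 1867, matches no index label.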
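The mismatch can be reproduced on a tiny frame. This is only a sketch: the three rows below are made-up stand-ins for the real CSV, and `.loc` is used for the label lookup because `.ix` is no longer available in current pandas releases.

```python
import pandas as pd

# Toy data standing in for the CSV: ids start at 1, as in the real file.
df = pd.DataFrame({
    "id": [1, 2, 3],
    "departamento": ["Amazonas", "Ancash", "Apurimac"],
})

# Like read_csv without index_col, this frame gets a default 0-based index.
labels = list(df.index)

# Using an id as an index label silently reads the *next* row...
shifted = df.loc[1, "departamento"]  # the row whose id is 2, not 1

# ...and the last id has no matching label, which raises KeyError.
try:
    df.loc[3, "departamento"]
    last_id_found = True
except KeyError:
    last_id_found = False

print(labels, shifted, last_id_found)
```

Note that besides the final KeyError, every earlier lookup is off by one row, so the first row of the file would never be read.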
Use this instead:
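products = [
    Peru(
        # ids start at 1 but the default index starts at 0, hence row - 1.
        # .loc replaces .ix, which was deprecated and later removed from pandas.
        departamento=tmp_data.loc[row - 1, 'departamento'],
        provincia=tmp_data.loc[row - 1, 'provincia'],
        distrito=tmp_data.loc[row - 1, 'distrito'],
    )
    for row in tmp_data['id']
]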
Or you can iterate over the DataFrame the way the library suggests:
products = []
# iterrows() yields (index label, row) pairs, so no manual indexing is needed.
for i, row in tmp_data.iterrows():
    products.append(Peru(
        departamento=row['departamento'],
        provincia=row['provincia'],
        distrito=row['distrito'],
    ))
Peru.objects.bulk_create(products)
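As a possible variation on the same idea, the rows can be turned into plain dicts and unpacked into the constructor, which avoids the index altogether. This is a sketch under assumptions: the `Peru` dataclass below is a hypothetical stand-in for `shop.models.Peru` so the snippet runs without Django, and the two rows are made-up sample data.

```python
from dataclasses import dataclass

import pandas as pd


@dataclass
class Peru:
    """Hypothetical stand-in for shop.models.Peru (not the Django model)."""
    departamento: str
    provincia: str
    distrito: str


# Toy rows standing in for the real CSV contents.
tmp_data = pd.DataFrame({
    "id": [1, 2],
    "departamento": ["Amazonas", "Amazonas"],
    "provincia": ["Chachapoyas", "Chachapoyas"],
    "distrito": ["Chachapoyas", "Asuncion"],
})

# Keep only the columns the model needs, then get one dict per row.
fields = ["departamento", "provincia", "distrito"]
records = tmp_data[fields].to_dict(orient="records")

# Each dict maps column name to value, so it unpacks into keyword arguments.
products = [Peru(**record) for record in records]
print(products)
```

With the real model, the resulting list could then be passed to `Peru.objects.bulk_create(products)` as in the answer above.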
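Answer 1 (score: 0)
The id field looks like an index, but it starts at 1. When you create the rows, you access the DataFrame by index using the id field as the index, which produces the error when you try to access row 1868 (which does not exist).
I would try: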
import pandas as pd
import csv
from shop.models import Peru
from django.core.management.base import BaseCommand

tmp_data = pd.read_csv('static/data/ubigeo-peru-2018-12-25.csv', sep=',', encoding="utf-8")


class Command(BaseCommand):
    def handle(self, **options):
        products = [
            Peru(
                departamento=row['departamento'],
                provincia=row['provincia'],
                distrito=row['distrito'],
            )
            for index, row in tmp_data.iterrows()
        ]
        Peru.objects.bulk_create(products)
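If lookups by id are actually wanted, rather than plain iteration, one option is to make the id column the index so that labels and ids agree. A minimal sketch, assuming made-up sample rows in place of the real file (with the real CSV, passing `index_col="id"` to `pd.read_csv` should have a similar effect):

```python
import pandas as pd

# Toy data standing in for the CSV; distrito values are placeholders.
df = pd.DataFrame({
    "id": [1, 2, 3],
    "distrito": ["A", "B", "C"],
})

# Make the id column the index so that label lookups use the ids directly.
by_id = df.set_index("id")

# The last id now resolves instead of raising KeyError.
last = by_id.loc[3, "distrito"]
print(list(by_id.index), last)
```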