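I am trying to insert data from a Pandas DataFrame into an existing Django model, Agency, that uses a SQLite backend. However, following the answers on "How to write a Pandas Dataframe to Django model" and "Saving a Pandas DataFrame to a Django Model" results in the whole SQLite table being replaced, which breaks the Django code. Specifically, the Django auto-generated id primary key column is replaced by index, which causes errors when rendering templates (no such column: agency.id).
Here are the code and the results of using Pandas to_sql on the SQLite table agency.
In models.py: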
from django.db import models

class Agency(models.Model):
    name = models.CharField(max_length=128)
In myapp/management/commands/populate.py:
import pandas as pd
from django.conf import settings
from django.core.management.base import BaseCommand
from sqlalchemy import create_engine

class Command(BaseCommand):
    def handle(self, *args, **options):
        # Open a SQLAlchemy connection to the database Django uses
        database_name = settings.DATABASES['default']['NAME']
        database_url = 'sqlite:///{}'.format(database_name)
        engine = create_engine(database_url, echo=False)
        # Insert data
        agencies = pd.DataFrame({"name": ["Agency 1", "Agency 2", "Agency 3"]})
        agencies.to_sql("agency", con=engine, if_exists="replace")
Calling "python manage.py populate" successfully adds the three agencies to the table:
index name
0 Agency 1
1 Agency 2
2 Agency 3
However, doing so changes the table's DDL from:
CREATE TABLE "agency" ("id" integer NOT NULL PRIMARY KEY AUTOINCREMENT, "name" varchar(128) NOT NULL)
to:
CREATE TABLE agency (
"index" BIGINT,
name TEXT
);
CREATE INDEX ix_agency_index ON agency ("index")
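The effect of that DDL change can be reproduced without Pandas or Django. The following is a minimal sketch, assuming an in-memory SQLite database and using only the standard-library sqlite3 module; the DROP/CREATE statements merely mimic the two DDL statements shown above and are not what Pandas literally executes, and the helper name column_names is made up for illustration.

```python
import sqlite3

# DDL as generated by Django for the Agency model (copied from above).
DJANGO_DDL = (
    'CREATE TABLE "agency" ("id" integer NOT NULL PRIMARY KEY AUTOINCREMENT, '
    '"name" varchar(128) NOT NULL)'
)
# DDL of the table after it has been replaced (copied from above).
REPLACED_DDL = 'CREATE TABLE agency ("index" BIGINT, name TEXT)'


def column_names(conn, table):
    """Return the column names SQLite currently reports for `table`."""
    # Each row of PRAGMA table_info is (cid, name, type, notnull, dflt_value, pk).
    return [row[1] for row in conn.execute(f'PRAGMA table_info("{table}")')]


conn = sqlite3.connect(":memory:")
conn.execute(DJANGO_DDL)
before = column_names(conn, "agency")

# Mimic a "replace": drop the table and recreate it from scratch.
conn.execute('DROP TABLE "agency"')
conn.execute(REPLACED_DDL)
after = column_names(conn, "agency")

print(before)  # ['id', 'name']
print(after)   # ['index', 'name']
```

Once the id column is gone, any ORM query that selects agency.id fails, which matches the "no such column: agency.id" error.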
How can I add the DataFrame to the Django-managed model while keeping the Django ORM intact?
Answer 0 (score: 5)
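Answering my own question, since I now use Pandas to import data into Django quite often. The mistake I made was trying to use the SQLAlchemy-based database layer built into Pandas, which was modifying the underlying database table definition. In the context above, you can simply use the Django ORM to connect and insert the data: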
import pandas as pd
from django.core.management.base import BaseCommand
from myapp.models import Agency

class Command(BaseCommand):
    def handle(self, *args, **options):
        # Process data with Pandas
        agencies = pd.DataFrame({"name": ["Agency 1", "Agency 2", "Agency 3"]})
        # Iterate over the DataFrame and create one object per row
        for agency in agencies.itertuples():
            agency = Agency.objects.create(name=agency.name)
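However, you may often want to import data with an external script rather than with a management command (as above) or Django's shell. In that case you must first connect to the Django ORM by calling the setup method: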
import os, sys
import django
import pandas as pd
sys.path.append('../..') # add path to project root dir
os.environ["DJANGO_SETTINGS_MODULE"] = "myproject.settings"
# for more sophisticated setups, if you need to change connection settings (e.g. when using django-environ):
#os.environ["DATABASE_URL"] = "postgres://myuser:mypassword@localhost:54324/mydb"
# Connect to Django ORM
django.setup()
# process data
from myapp.models import Agency
Agency.objects.create(name='MyAgency')
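Here I have exported the settings module myproject.settings to DJANGO_SETTINGS_MODULE so that django.setup() can pick up the project settings.
Depending on where you run the script from, you may need to add a path to the system path so that Django can find the settings module. In this case, I run my script two directories below my project root.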
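A relative entry such as '../..' is resolved against the current working directory, so it only works when the script is launched from its own folder. As a hedged alternative, the path can be derived from the script's location instead. This is a minimal sketch using only the standard library; the helper names project_root and add_to_sys_path and the example directory layout are made up for illustration.

```python
import sys
from pathlib import Path


def project_root(script_path, levels_up=2):
    """Return the directory `levels_up` levels above the script's own directory."""
    # parents[0] is the script's directory, parents[1] its parent, and so on.
    return Path(script_path).resolve().parents[levels_up]


def add_to_sys_path(root):
    """Append `root` to sys.path once, so repeated calls add no duplicates."""
    root_str = str(root)
    if root_str not in sys.path:
        sys.path.append(root_str)
    return root_str


# Hypothetical layout: the script lives two directories below the project root.
example_script = Path("myproject") / "scripts" / "imports" / "load_agencies.py"
root = project_root(example_script)
add_to_sys_path(root)
```

In a real script you would pass __file__ instead of the example path, before setting DJANGO_SETTINGS_MODULE and calling django.setup().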
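You can modify any settings before calling setup. This is useful if your script needs to connect to the database differently from what is configured in settings, for example when running the script locally against a Django/postgres Docker container.
Note that the example above uses django-environ to specify the database settings.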
Answer 1 (score: 2)
For those looking for a more performant and up-to-date solution, I would suggest using manager.bulk_create and instantiating the Django model instances without saving them individually.
model_instances = [Agency(name=agency.name) for agency in agencies.itertuples()]
Agency.objects.bulk_create(model_instances)
Note that bulk_create does not run signals or custom save methods, so if you have custom save logic or signal hooks for the Agency model, they will not be triggered. The documentation has the full list of caveats.
Docs: https://docs.djangoproject.com/en/3.0/ref/models/querysets/#bulk-create
Answer 2 (score: 1)
There was a syntax error at itertuples: the parentheses were missing.
It should be:
for agency in agencies.itertuples():
    agency = Agency.objects.create(name=agency.name)
Thanks for sharing your answer.
For reference, see pandas.DataFrame.itertuples in the pandas 0.22.0 documentation.