Question

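My plan is to collect lawyer biography data from websites in batches, convert each batch to a .csv file, then to JSON, and then load each batch into a Django database.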

Please let me know the best way to accomplish this.

Answer 1

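Load the database directly. Collect the data from the websites in batches and load it straight into SQLite3. Just write a simple batch application that uses the Django ORM: it collects data from a site and loads it into SQLite3 immediately. Don't create CSV. Don't create JSON. Don't create intermediate results. Don't do any extra work.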
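If you want to insert the rows in groups rather than one at a time, a small helper can chunk whatever the scraper yields. This is only a sketch: `batched` and the batch size are names made up for illustration, and each batch could then be passed to something like Django's `bulk_create` on your model's manager.

```python
from itertools import islice


def batched(iterable, size):
    """Yield successive lists of at most `size` items from `iterable`."""
    iterator = iter(iterable)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            # The source is exhausted; stop the generator.
            return
        yield batch


if __name__ == "__main__":
    # Five scraped rows grouped into batches of two.
    for batch in batched(range(5), 2):
        print(batch)
```

Because the helper works on any iterable, it can sit between the scraping generator and the ORM call without either side knowing about the other.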

Edit.

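from myapp.models import MyModel
import urllib2  # Python 2 only; on Python 3 use urllib.request

# Defined before the loop so the name exists when the loop runs.
def someGenerator( url ):
    # open the URL with urllib2
    # parse the data with BeautifulSoup
    yield this, that, the_other

with open("sourceListOfURLs.txt", "r" ) as source:
    for aLine in source:
        # strip the trailing newline from each URL
        for this, that, the_other in someGenerator( aLine.strip() ):
            # objects.create() builds the row and saves it, so no separate save() is needed
            MyModel.objects.create( field1=this, field2=that, field3=the_other )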
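To make the generator step a bit more concrete, here is a rough sketch of the parsing half only. It assumes a page layout I invented (a `div` with class `lawyer` containing elements with classes `name`, `firm` and `bio`), it takes HTML text rather than a URL so it runs offline, and it uses the standard library's `html.parser` in place of BeautifulSoup. The names `BioParser` and `some_generator` are hypothetical, and the sample data is made up.

```python
from html.parser import HTMLParser

# Hypothetical class names; adjust to whatever the real pages use.
FIELDS = ("name", "firm", "bio")


class BioParser(HTMLParser):
    """Collect (name, firm, bio) tuples from the assumed markup."""

    def __init__(self):
        super().__init__()
        self.rows = []       # finished tuples, one per lawyer
        self._record = None  # fields of the record currently being read
        self._field = None   # field whose text is currently being read

    def handle_starttag(self, tag, attrs):
        css = dict(attrs).get("class")
        if css == "lawyer":
            self._record = dict.fromkeys(FIELDS, "")
        elif self._record is not None and css in FIELDS:
            self._field = css

    def handle_data(self, data):
        if self._field is not None:
            self._record[self._field] += data.strip()

    def handle_endtag(self, tag):
        if self._field is not None:
            # Closing tag of a field element (no nested tags assumed).
            self._field = None
        elif tag == "div" and self._record is not None:
            # Closing tag of the record's div: emit the finished tuple.
            self.rows.append(tuple(self._record[f] for f in FIELDS))
            self._record = None


def some_generator(html):
    """Yield one (name, firm, bio) tuple per lawyer found in `html`."""
    parser = BioParser()
    parser.feed(html)
    parser.close()
    yield from parser.rows


if __name__ == "__main__":
    sample = (
        '<div class="lawyer"><span class="name">Ada Smith</span>'
        '<span class="firm">Smith LLP</span><p class="bio">Tax law.</p></div>'
    )
    for name, firm, bio in some_generator(sample):
        print(name, firm, bio)
```

Each tuple it yields has the same shape the loading loop expects, so the three values can go straight into the model's fields.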

How to populate a Django sqlite3 database
