Question

This post is more about the approach to coding a problem than about a problem itself (for a change!).

I have a number of projects coming up that require me to collect sales data from several different sources.

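The data is almost always accessed and structured differently for each vendor; the best case is a nice, valid JSON response, and the worst case is that I have to screen-scrape the data.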

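Because each vendor's source data is so different, I've decided that the way forward is a dedicated Rails-api app per vendor that serves JSON data to the master sales data app via its API. I did consider Sinatra for each vendor app, but my knowledge is with Rails, so I can get things done more quickly that way. I think a dedicated app for each vendor is the right approach because each one can be maintained independently, and if a vendor decides to start serving their data themselves, I (or another developer) can easily swap things around without having to dig into one huge monolithic sales data collection app. However, please say so if you think dedicated apps don't make much sense.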

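So, as a heavily simplified example, each vendor app is built around a class like the one below. At the moment I'm just calling the methods from the console, but eventually this will be automated with rake tasks and background workers.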

class VendorA < ActiveRecord::Base

  def self.get_report
    # Uses Mechanize Gem to fetch a report in CSV format
    # returns report
  end

  def self.save_report(report)
    # Takes the report from the get_report method and saves it, currently to the app root but eventually this will be S3
    # returns the local_report
  end

  def self.convert_report_to_json(local_report)
    # Reads the local report, iterates through it grabbing the fields required for the master sales-data app and constructs a JSON response
    # returns a valid JSON response called json_data
  end
  end

  def self.send_to_master_sales_api(json_data)
    # As you can see here I take the response from convert_report_to_json and post it to my sales data app
    require 'rest-client'
    RestClient.post('http://localhost:3000/api/v1/sales/batch', json_data)
  end

end
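
For illustration only, here is a minimal sketch of what the convert_report_to_json step might look like when the report is CSV text. The column names, the FIELD_MAP constant and the convert_csv_to_json helper are all hypothetical and are not taken from any real vendor report; it uses only Ruby's csv and json libraries.

```ruby
require 'csv'
require 'json'

# Hypothetical mapping: vendor CSV header => field name in the master sales app.
FIELD_MAP = {
  'Order ID' => 'order_ref',
  'Amount'   => 'amount',
  'Sold At'  => 'sold_at'
}.freeze

# Turns CSV text into a JSON array of objects that contain only the mapped fields.
def convert_csv_to_json(csv_text)
  rows = CSV.parse(csv_text, headers: true).map do |row|
    FIELD_MAP.each_with_object({}) do |(csv_header, field), sale|
      sale[field] = row[csv_header]
    end
  end
  JSON.generate(rows)
end

# Made-up sample report; the "Internal Note" column is dropped by the mapping.
sample_report = <<~REPORT
  Order ID,Amount,Sold At,Internal Note
  A-1,19.99,2014-01-05,ignored
  A-2,5.00,2014-01-06,ignored
REPORT

json_data = convert_csv_to_json(sample_report)
puts json_data
```

All values stay as strings here; any type conversion (amounts, dates) would be a decision for the real vendor app.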

This works, in that the send_to_master_sales_api method does what it is supposed to. I haven't yet tested it with more than about 1,000 data objects/rows.

On the receiving end, in the master sales data app, things look like this:

module Api
  module V1
    class SalesController < ApplicationController

      before_filter :restrict_access
      skip_before_filter  :verify_authenticity_token
      respond_to :json

      def batch
        client_id = params[:clientid]
        sales = JSON.parse(params[:sales])
        sales.each do |attributes|
          sale = Sale.new(attributes)
          sale.clientid = client_id # required, NOT NULL
          sale.save
        end
        render json: { message: "#{sales.size} sales lines received" }, status: 202
      end


      private
      def restrict_access
        api_key = ApiKey.find_by_api_key_and_api_secret(params[:api_key],params[:api_secret])
        render json: { message: 'Invalid Key - Access Restricted' }, status: :unauthorized unless api_key
      end

    end
  end
end

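So, my main question is how much JSON data you think this approach can handle. As I mentioned above, I've tested it with about 1,000 rows/objects and it works fine. But if I start working with one of the vendor apps where the data approaches 10,000, 100,000, or even 1,000,000 rows/objects per day per source, what should I be thinking about so that both of the apps above can cope? Is there something like find_in_batches that could be used to lighten the load on the receiving end? At the moment I plan to write the master sales data records to a Postgres database. My experience with NoSQL is limited, but would writing to MongoDB or similar on the receiving end be faster?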
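
To make the batching idea concrete: find_in_batches reads records from the database in chunks, and for an in-memory array the closest plain-Ruby equivalent is Enumerable#each_slice. The sketch below only shows how rows could be split into several smaller JSON payloads before posting; the BATCH_SIZE value, the build_batches helper and the sample rows are hypothetical, and no HTTP request is actually made.

```ruby
require 'json'

# Hypothetical batch size; it would need tuning against real payload sizes and timeouts.
BATCH_SIZE = 500

# Splits an array of sale hashes into JSON payloads of at most batch_size rows each.
def build_batches(sales, batch_size = BATCH_SIZE)
  sales.each_slice(batch_size).map do |slice|
    JSON.generate(slice)
  end
end

# Stand-in data: 1,200 made-up sales rows.
sales = (1..1200).map { |n| { 'order_ref' => "A-#{n}", 'amount' => '1.00' } }

payloads = build_batches(sales)
payloads.each_with_index do |payload, index|
  # In the vendor app, each payload would be POSTed separately at this point.
  puts "batch #{index + 1}: #{JSON.parse(payload).length} rows"
end
```

With this shape, the receiving batch action handles at most BATCH_SIZE rows per request rather than a whole day's data at once.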

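I realise this isn't a particularly direct question, but I would really appreciate opinions and advice from people who have experience with this kind of thing.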

Thanks in advance!

POSTing bulk JSON data to an API

0 answers: