Pandas - bin data and get 2 columns

Time: 2016-11-10 19:22:49

Tags: python pandas

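I have a very simple dataframe. It has 2 columns, day_created (int, could be changed to datetime) and suspended (int, could be changed to boolean). I can change the data if that makes it easier to work with.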

       Day created  Suspended
0               12          0
1                6          1
2               24          0
3                8          0
4              100          1
5               30          0
6                1          1
7                6          0

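The day_created column is an integer giving the day the account was created (counted from a start date), starting at 1 and increasing. The suspended column is 1 for a suspended account and 0 for an account that was not suspended.

What I would like to do is bin these accounts into groups of 30 days, or months, and from each bin get the total number of accounts created in that month and the number of accounts created in that month that were suspended. I then plan to create a bar chart with 2 bars for each month.

How should I go about this? I don't use pandas often. I assume I need to do some tricks with resampling and counting.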

1 Answer:

Answer 0: (score: 1)

Use

df.index = start_date + pd.to_timedelta(df['Day created'], unit='D')

to give the DataFrame an index of timestamps representing when each account was created.

Then you can use

result = df.groupby(pd.TimeGrouper(freq='M')).agg(['count', 'sum'])

to group the rows of the DataFrame by month, according to the timestamps in the index.

.agg(['count', 'sum'])

computes the number of accounts (the count) and the number of suspended accounts (the sum) for each group.

Then plot result as a bar chart, which gives two bars per month.
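For reference, here is a minimal end-to-end sketch of the same idea on the sample data from the question. Several things in it are assumptions rather than part of the answer: the start date 2016-01-01 is a placeholder, the names created, by_month and by_30_days are illustrative, and day 1 is taken to be the start date itself (hence the subtraction of 1). Because pd.TimeGrouper was removed in pandas 1.0, the sketch groups on the calendar month of the creation date instead. It also shows fixed 30-day bins as an alternative.

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    'Day created': [12, 6, 24, 8, 100, 30, 1, 6],
    'Suspended':   [0, 1, 0, 0, 1, 0, 1, 0],
})

# Assumption: day 1 corresponds to 2016-01-01; substitute the real start date.
start_date = pd.Timestamp('2016-01-01')

# Assumption: day 1 is the start date itself, so subtract 1 before
# converting the day number to an offset.
created = start_date + pd.to_timedelta(df['Day created'] - 1, unit='D')

# Group by the calendar month of creation:
# 'count' is the number of accounts, 'sum' the number of suspended accounts.
by_month = (
    df['Suspended']
    .groupby(created.dt.to_period('M').rename('month'))
    .agg(['count', 'sum'])
    .rename(columns={'count': 'accounts', 'sum': 'suspended'})
)

# Alternative: fixed 30-day bins (bin 0 = days 1-30, bin 1 = days 31-60, ...).
by_30_days = (
    df['Suspended']
    .groupby(((df['Day created'] - 1) // 30).rename('bin'))
    .agg(['count', 'sum'])
    .rename(columns={'count': 'accounts', 'sum': 'suspended'})
)

print(by_month)
print(by_30_days)
```

Calling by_month.plot(kind='bar') (which needs matplotlib) then draws two bars per month. Unlike a time-based grouper, this grouping lists only months or bins that contain at least one account, so empty periods do not appear in the result.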