Transforming a table with Pandas

Time: 2016-09-25 08:07:45

Tags: python-3.x pandas

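I don't have much experience with Pandas. I searched the existing threads but couldn't find anything similar.

I have a large table of 1 million records, structured as follows: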

+-------+-------+-------------+-----------+
|  rec  | code  | code group  | code Date |
+-------+-------+-------------+-----------+
| 10001 | X11   | High        | 20151105  |
| 10001 | X11.1 | High        | 20150205  |
| 10001 | X12   | Medium      | 20141111  |
| 10001 | X12.1 | Medium      | 20141111  |
| 10001 | X13   | Low         | 20130101  |
| 10001 | Y15   | No Interest | 20130101  |
| 10001 | Y16   | No Interest | 20141231  |
| 10002 | X11   | …           | …         |
| 10002 | X12   | …           | …         |
| 10002 | X13   | …           | …         |
+-------+-------+-------------+-----------+

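I would like to restructure it into a table in which each unique rec appears only once, in the following format:

Header: rec | High (max date) | Medium (max date) | Low (max date) | Code (only the High code with the latest date) | High codes (count)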

1 answer:

Answer 0: (score: 0)

Here are some hints.

How to get "rec, High (max date), Medium (max date), Low (max date)": start with some test data and parse the dates.

# Test data
import pandas as pd

df = pd.DataFrame({'rec': [10001, 10001, 10002, 10002],
 'code': ['X11', 'X12', 'X11.1', 'X12'],
 'code group': ['High', 'High', 'High', 'Medium'],
 'code Date': ['20151105', '20141111', '20151004', '20151004']
}, columns = ['rec', 'code', 'code group', 'code Date'])

# Converting dates
df['code Date'] = pd.to_datetime(df['code Date'])

#      rec   code code group  code Date
# 0  10001    X11       High 2015-11-05
# 1  10001    X12       High 2014-11-11
# 2  10002  X11.1       High 2015-10-04
# 3  10002    X12     Medium 2015-10-04

Pivoting on code group with max as the aggregate gives the latest date per group for each rec.

pivot = pd.pivot_table(df, 
               index = 'rec', 
               columns='code group', 
               values='code Date', 
               aggfunc='max')

# code group       High     Medium
# rec                             
# 10001      2015-11-05        NaT
# 10002      2015-10-04 2015-10-04
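Note that the pivot only contains the code groups that actually occur in the data, so with this test data there is no Low column. A minimal sketch of one way to force the three columns the question asks for, assuming the group labels are exactly 'High', 'Medium' and 'Low' (any other group, such as 'No Interest', is dropped by this step):

```python
import pandas as pd

# Same test data as in the answer
df = pd.DataFrame({
    'rec': [10001, 10001, 10002, 10002],
    'code': ['X11', 'X12', 'X11.1', 'X12'],
    'code group': ['High', 'High', 'High', 'Medium'],
    'code Date': ['20151105', '20141111', '20151004', '20151004'],
})
df['code Date'] = pd.to_datetime(df['code Date'], format='%Y%m%d')

# Latest date per code group for each rec
pivot = pd.pivot_table(df, index='rec', columns='code group',
                       values='code Date', aggfunc='max')

# reindex fixes the column order and adds any group that never
# occurred as a column of missing values
pivot = pivot.reindex(columns=['High', 'Medium', 'Low'])
```

Passing the column list explicitly also keeps the output layout stable when a given extract happens to contain no rows for one of the groups.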

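To get the High code with the latest date and the number of High rows, filter, sort, and aggregate. The result can then be combined with the pivot to get the final table.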

# Filtering and sorting the values so that the latest dates come first
filt = df[df['code group'] == 'High'].sort_values(['rec', 'code Date'], ascending=[True, False])
# Keeping the first value for code (the one with the latest date), and counting the rows
filt = filt.groupby('rec').agg({'code': 'first', 'code Date': 'size'})

#         code  code Date
# rec                    
# 10001    X11          2
# 10002  X11.1          1
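The answer does not show the final combining step. Since both intermediate tables are indexed by rec, one possible way to put them together is an index join. The sketch below is an assumption about what that step could look like, not the answerer's own code; the column names high_code and high_count are made up for illustration, and it uses named aggregation instead of the dict form so the count column gets a meaningful name.

```python
import pandas as pd

# Same test data as in the answer
df = pd.DataFrame({
    'rec': [10001, 10001, 10002, 10002],
    'code': ['X11', 'X12', 'X11.1', 'X12'],
    'code group': ['High', 'High', 'High', 'Medium'],
    'code Date': ['20151105', '20141111', '20151004', '20151004'],
})
df['code Date'] = pd.to_datetime(df['code Date'], format='%Y%m%d')

# Latest date per code group for each rec
pivot = pd.pivot_table(df, index='rec', columns='code group',
                       values='code Date', aggfunc='max')

# High rows only, latest date first within each rec
high = df[df['code group'] == 'High'].sort_values(
    ['rec', 'code Date'], ascending=[True, False])

# First code per rec (the latest one) and number of High rows
high = high.groupby('rec').agg(
    high_code=('code', 'first'),   # hypothetical column name
    high_count=('code', 'size'),   # hypothetical column name
)

# Both frames are indexed by rec, so join aligns them on that index;
# reset_index turns rec back into an ordinary column
result = pivot.join(high).reset_index()
print(result)
```

Because join defaults to a left join, a rec with no High rows would still appear in the result, with missing values in high_code and high_count.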