I have a large table of 1 million records with the following structure:
+-------+-------+-------------+-----------+
| rec | code | code group | code Date |
+-------+-------+-------------+-----------+
| 10001 | X11 | High | 20151105 |
| 10001 | X11.1 | High | 20150205 |
| 10001 | X12 | Medium | 20141111 |
| 10001 | X12.1 | Medium | 20141111 |
| 10001 | X13 | Low | 20130101 |
| 10001 | Y15 | No Interest | 20130101 |
| 10001 | Y16 | No Interest | 20141231 |
| 10002 | X11 | … | … |
| 10002 | X12 | … | … |
| 10002 | X13 | … | … |
+-------+-------+-------------+-----------+
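I want to restructure it into a table where each rec appears only once, in the following format:
Headers: rec | High (max date) | Medium (max date) | Low (max date) | Code (only the High code with the latest date) | High code (count)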
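To pin down the requested layout, here is a minimal sketch. It loads only the rows for rec 10001 from the sample table, because the rec 10002 rows are elided in the question, and builds the matching summary row. Two readings are my own assumptions. "Code" is taken to mean the High-group code with the most recent date. The "No Interest" rows are ignored because the layout has no column for them. The sketch picks the latest High code with `groupby(...).idxmax()` instead of sorting first. That swap may be cheaper on 1 million rows because it avoids a full sort.

```python
import pandas as pd

# Rows for rec 10001 copied from the sample table above
# (the rec 10002 rows are elided with "…" in the question, so they are left out).
df = pd.DataFrame({
    'rec': [10001] * 7,
    'code': ['X11', 'X11.1', 'X12', 'X12.1', 'X13', 'Y15', 'Y16'],
    'code group': ['High', 'High', 'Medium', 'Medium', 'Low',
                   'No Interest', 'No Interest'],
    'code Date': ['20151105', '20150205', '20141111', '20141111',
                  '20130101', '20130101', '20141231'],
})
df['code Date'] = pd.to_datetime(df['code Date'], format='%Y%m%d')

# Latest date per (rec, code group); "No Interest" is dropped because the
# requested layout only has High, Medium and Low columns (assumption).
wanted = ['High', 'Medium', 'Low']
max_dates = (df[df['code group'].isin(wanted)]
             .groupby(['rec', 'code group'])['code Date'].max()
             .unstack()
             .reindex(columns=wanted))

# High code with the latest date, located with idxmax instead of a sort,
# plus the number of High rows per rec.
high = df[df['code group'] == 'High']
latest_high = high.loc[high.groupby('rec')['code Date'].idxmax(), ['rec', 'code']]
summary = (max_dates
           .join(latest_high.set_index('rec'))
           .join(high.groupby('rec').size().rename('High count')))
print(summary)
```

For rec 10001 this gives High = 2015-11-05, Medium = 2014-11-11, Low = 2013-01-01, code X11 and a High count of 2.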
Answer 0 (score: 0)
Here are some pointers.
How to get "rec, High (max date), Medium (max date), Low (max date)":
import pandas as pd

# Test data
df = pd.DataFrame({'rec': [10001, 10001, 10002, 10002],
                   'code': ['X11', 'X12', 'X11.1', 'X12'],
                   'code group': ['High', 'High', 'High', 'Medium'],
                   'code Date': ['20151105', '20141111', '20151004', '20151004']
                   }, columns=['rec', 'code', 'code group', 'code Date'])
# Converting dates (the source strings are in YYYYMMDD form)
df['code Date'] = pd.to_datetime(df['code Date'], format='%Y%m%d')
# rec code code group code Date
# 0 10001 X11 High 2015-11-05
# 1 10001 X12 High 2014-11-11
# 2 10002 X11.1 High 2015-10-04
# 3 10002 X12 Medium 2015-10-04
A pivot table gives the latest date of each code group for every rec:
pivot = pd.pivot_table(df,
index = 'rec',
columns='code group',
values='code Date',
aggfunc='max')
# code group High Medium
# rec
# 10001 2015-11-05 NaT
# 10002 2015-10-04 2015-10-04
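Here is how to get the High code with the latest date and the number of High rows. Joining this with the pivot gives the final result.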
# Filtering and sorting the values so that the latest dates come first
filt = df[df['code group'] == 'High'].sort_values(['rec', 'code Date'], ascending=[True, False])
# Keeping the first code per rec (the one with the latest date) and counting the rows
filt = filt.groupby('rec').agg(code=('code', 'first'), high_count=('code Date', 'size'))
#         code  high_count
# rec
# 10001    X11           2
# 10002  X11.1           1
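The final join is not shown in the original answer. Here is a minimal sketch of it. It rebuilds the test data, `pivot` and `filt` so it runs on its own. Both frames are indexed by `rec`, so `DataFrame.join` lines them up. Two parts are my additions, not the original answer's: the `reindex` that forces High, Medium and Low columns (Low has no rows in the test data), and the `high_count` column name.

```python
import pandas as pd

# Same test data, pivot and High-code summary as in the steps above.
df = pd.DataFrame({'rec': [10001, 10001, 10002, 10002],
                   'code': ['X11', 'X12', 'X11.1', 'X12'],
                   'code group': ['High', 'High', 'High', 'Medium'],
                   'code Date': ['20151105', '20141111', '20151004', '20151004']})
df['code Date'] = pd.to_datetime(df['code Date'], format='%Y%m%d')

pivot = pd.pivot_table(df, index='rec', columns='code group',
                       values='code Date', aggfunc='max')

filt = (df[df['code group'] == 'High']
        .sort_values(['rec', 'code Date'], ascending=[True, False])
        .groupby('rec')
        .agg(code=('code', 'first'), high_count=('code Date', 'size')))

# Make sure all three groups appear as columns; a group with no rows at all
# (here: Low) comes back as an all-missing column.
pivot = pivot.reindex(columns=['High', 'Medium', 'Low'])

# Both frames are indexed by rec, so join aligns them row by row.
result = pivot.join(filt)
print(result)
```

With this test data, rec 10001 gets a missing Medium date because it has no Medium rows. Both recs get an empty Low column. The code and count columns come from `filt` unchanged.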