Collapsing a list to unique IDs with a range of dates

Time: 2016-07-25 18:48:07

Tags: python pandas dataframe

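I have a large list of IDs that repeat with different date ranges. I need to create a list of unique IDs with a single date range each, running from the earliest start date to the latest end date found in the uncollapsed list.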

Here is an example of what I have:

    id  start_date  end_date
    1   9/25/2015   10/12/2015
    1   9/16/2015   11/1/2015
    1   8/25/2015   9/21/2015
    2   9/2/2015    10/29/2015
    3   9/18/2015   10/15/2015
    3   9/19/2015   9/30/2015
    4   8/27/2015   9/15/2015

What I need is the same table collapsed to one row per id.

I have been trying to do this in Python, without much luck. Thanks!

1 answer:

Answer 0 (score: 2)

Use groupby/aggregate:

In [12]: df.groupby('id').agg({'start_date': 'min', 'end_date': 'max'})
Out[12]: 
   start_date   end_date
id                      
1  2015-08-25 2015-11-01
2  2015-09-02 2015-10-29
3  2015-09-18 2015-10-15
4  2015-08-27 2015-09-15
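
For a version that can be run from scratch, here is a minimal sketch that rebuilds the question's sample rows and applies the same groupby/aggregate. The variable name `collapsed` and the explicit `format="%m/%d/%Y"` are assumptions made for this illustration, not part of the original answer.

```python
import pandas as pd

# Sample rows copied from the question; the dates start out as plain strings.
df = pd.DataFrame({
    "id": [1, 1, 1, 2, 3, 3, 4],
    "start_date": ["9/25/2015", "9/16/2015", "8/25/2015", "9/2/2015",
                   "9/18/2015", "9/19/2015", "8/27/2015"],
    "end_date": ["10/12/2015", "11/1/2015", "9/21/2015", "10/29/2015",
                 "10/15/2015", "9/30/2015", "9/15/2015"],
})

# Parse both columns as dates so min/max compare dates rather than text.
for col in ["start_date", "end_date"]:
    df[col] = pd.to_datetime(df[col], format="%m/%d/%Y")

# One row per id: earliest start_date, latest end_date.
collapsed = df.groupby("id").agg({"start_date": "min", "end_date": "max"})
print(collapsed)
```

Calling `collapsed.reset_index()` afterwards turns `id` back into an ordinary column, if a flat table is preferred.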

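Note that it is important for start_date and end_date to be parsed as dates, so that min and max return the minimum and maximum dates for each id. If the values are merely string representations of dates, then min and max give the string min or max, which depends on the strings' lexicographic order. If the date strings are in zero-padded YYYY/MM/DD format, lexicographic order corresponds to parsed date order, but date strings in MM/DD/YYYY format do not have this property.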
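
To make the string-ordering pitfall concrete, the sketch below compares the end dates of id 1 from the question as strings and as parsed dates. The variable names are made up for this illustration.

```python
import pandas as pd

# End dates for id 1 in the question, kept as M/D/YYYY strings.
ends = pd.Series(["10/12/2015", "11/1/2015", "9/21/2015"])

# String comparison goes character by character, and "9" sorts after "1".
string_max = ends.max()  # "9/21/2015", not the latest date

# After parsing, max compares actual dates.
date_max = pd.to_datetime(ends, format="%m/%d/%Y").max()  # 2015-11-01

# Zero-padded YYYY/MM/DD strings happen to sort in date order.
iso_max = pd.Series(["2015/10/12", "2015/11/01", "2015/09/21"]).max()

print(string_max, date_max.date(), iso_max)
```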

If start_date and end_date hold string values, then

for col in ['start_date', 'end_date']:
    df[col] = pd.to_datetime(df[col])

will convert the strings to dates.
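
One way to confirm that the conversion took effect is to inspect the column dtypes. The sketch below uses a small made-up frame and, as an alternative to the loop, converts both columns in a single `DataFrame.apply` call; the explicit `format` argument is an assumption about how the dates are written.

```python
import pandas as pd

# Hypothetical two-row frame with dates stored as strings.
df = pd.DataFrame({
    "id": [1, 2],
    "start_date": ["9/25/2015", "9/2/2015"],
    "end_date": ["10/12/2015", "10/29/2015"],
})

date_cols = ["start_date", "end_date"]

# apply() calls pd.to_datetime once per column and forwards the format keyword.
df[date_cols] = df[date_cols].apply(pd.to_datetime, format="%m/%d/%Y")

# Both columns should now report a datetime64 dtype instead of object.
print(df.dtypes)
```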

If you are loading the DataFrame from a file with pd.read_table (or pd.read_csv), then

df = pd.read_table(filename, ..., parse_dates=[1, 2])

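will parse the strings in the second and third columns of the file as dates. [1, 2] corresponds to the second and third columns because Python uses 0-based indexing.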
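
As a rough end-to-end sketch of the file-loading route, the example below reads the question's rows from an in-memory buffer standing in for a real file. The tab-separated layout and the header row are assumptions, since the question does not show the file itself.

```python
import io

import pandas as pd

# Stand-in for a tab-separated file on disk (layout assumed for illustration).
raw = io.StringIO(
    "id\tstart_date\tend_date\n"
    "1\t9/25/2015\t10/12/2015\n"
    "1\t9/16/2015\t11/1/2015\n"
    "1\t8/25/2015\t9/21/2015\n"
    "2\t9/2/2015\t10/29/2015\n"
    "3\t9/18/2015\t10/15/2015\n"
    "3\t9/19/2015\t9/30/2015\n"
    "4\t8/27/2015\t9/15/2015\n"
)

# parse_dates=[1, 2] selects the second and third columns by 0-based position.
df = pd.read_csv(raw, sep="\t", parse_dates=[1, 2])

# With real dates in place, the aggregation can run directly on the loaded frame.
result = df.groupby("id").agg({"start_date": "min", "end_date": "max"})
print(result)
```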