Selecting the last value in time after grouping by multiple keys

Date: 2019-05-18 03:09:25

Tags: pandas

I want to group by 'name' first, then aggregate by day, and then select the last value of each 'name' for each day.

I got some ideas from here: pandas - how to organised dataframe based on date and assign new values to column

I tried the following, but it did not work. Is there a good way to do this?

df = df.groupby(df['name']).resample('D',on='Timestamp').apply(['last'])

For example:

import pandas as pd

N = 9
rng = pd.date_range('2011-01-01', periods=N, freq='15S')
df = pd.DataFrame({'Timestamp': rng, 'name': ['A','A', 'B','B','B','B','C','C','C'],
                  'value': [1, 2, 3, 2, 3, 1, 3, 4, 3],'Temp': range(N)}) 
[out]:
    Timestamp           name    value   Temp
0   2011-01-01 00:00:00   A     1       0
1   2011-01-01 00:00:15   A     2       1
2   2011-01-01 00:00:30   B     3       2
3   2011-01-01 00:00:45   B     2       3
4   2011-01-01 00:01:00   B     3       4
5   2011-01-01 00:01:15   B     1       5
6   2011-01-01 00:01:30   C     3       6
7   2011-01-01 00:01:45   C     4       7
8   2011-01-01 00:02:00   C     3       8

I want to get this:

[out]:
           Timestamp    name    value   Temp
1   2011-01-01 00:00:15   A     2       1
5   2011-01-01 00:01:15   B     1       5
8   2011-01-01 00:02:00   C     3       8

2 Answers:

Answer 0: (score: 2)

IIUC


Answer 1: (score: 1)

If you need the last row for each day and each name, use GroupBy.tail together with Grouper:

df1 = df.groupby([pd.Grouper(freq='D', key='Timestamp'), 'name']).tail(1)
print (df1)
            Timestamp name  value  Temp
1 2011-01-01 00:00:15    A      2     1
5 2011-01-01 00:01:15    B      1     5
8 2011-01-01 00:02:00    C      3     8
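
Because the sample data covers a single day, the per-day part of the grouping is not visible in the output above. The following is a minimal sketch on a hypothetical two-day frame (the name `df_multi` and its values are made up for illustration, not taken from the question) showing that one row per day and name is kept:

```python
import pandas as pd

# Hypothetical two-day sample: two readings per day for each name.
df_multi = pd.DataFrame({
    'Timestamp': pd.to_datetime([
        '2011-01-01 00:00', '2011-01-01 12:00',
        '2011-01-02 00:00', '2011-01-02 12:00',
        '2011-01-01 06:00', '2011-01-01 18:00',
        '2011-01-02 06:00', '2011-01-02 18:00',
    ]),
    'name': ['A'] * 4 + ['B'] * 4,
    'value': [1, 2, 3, 4, 5, 6, 7, 8],
})

# Group by calendar day and name, then keep the last row of each group.
last_per_day = df_multi.groupby(
    [pd.Grouper(freq='D', key='Timestamp'), 'name']
).tail(1)
print(last_per_day)
```

Here the rows with index 1, 3, 5 and 7 should remain, i.e. the later of the two readings for each name on each day.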

Or convert the Timestamp values to dates with Series.dt.date:

df2 = df.groupby([df['Timestamp'].dt.date, 'name']).tail(1)
print (df2)
            Timestamp name  value  Temp
1 2011-01-01 00:00:15    A      2     1
5 2011-01-01 00:01:15    B      1     5
8 2011-01-01 00:02:00    C      3     8

Another option is Series.dt.normalize:

df2 = df.groupby([df['Timestamp'].dt.normalize(), 'name']).tail(1)

Or Series.dt.floor:

df2 = df.groupby([df['Timestamp'].dt.floor('D'), 'name']).tail(1)
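
One caveat worth noting: tail(1) returns the last row of each group in the frame's current row order, not necessarily the latest in time. The sketch below assumes the input might arrive unsorted (the shuffling step is only there to simulate that), sorts by Timestamp first, and then checks that the four grouping keys shown in this answer select the same rows:

```python
import pandas as pd

N = 9
# A Timedelta is used for the 15-second step; it is equivalent to the
# frequency string in the question.
rng = pd.date_range('2011-01-01', periods=N, freq=pd.Timedelta(seconds=15))
df = pd.DataFrame({'Timestamp': rng,
                   'name': ['A', 'A', 'B', 'B', 'B', 'B', 'C', 'C', 'C'],
                   'value': [1, 2, 3, 2, 3, 1, 3, 4, 3],
                   'Temp': range(N)})

# Shuffle the rows to simulate unsorted input (assumption for illustration).
shuffled = df.sample(frac=1, random_state=0)

# Sort by time so that "last row in each group" means "latest in time".
ordered = shuffled.sort_values('Timestamp')

# The four day-level keys used in this answer.
keys = {
    'grouper': pd.Grouper(freq='D', key='Timestamp'),
    'date': ordered['Timestamp'].dt.date,
    'normalize': ordered['Timestamp'].dt.normalize(),
    'floor': ordered['Timestamp'].dt.floor('D'),
}
results = {label: ordered.groupby([key, 'name']).tail(1)
           for label, key in keys.items()}

# All variants should pick the same rows.
same = all(r.equals(results['grouper']) for r in results.values())
print(same)
print(results['grouper'])
```

If the data is already in chronological order, as in the question, the sort_values call is redundant but harmless.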