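I want to group by "name" first, then resample by day, and then select the last value of each "name" for each day.
I got some ideas from here: pandas - how to organised dataframe based on date and assign new values to column
I tried the following, but could not get it to work. Is there a good way to do this?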
df = df.groupby(df['name']).resample('D',on='Timestamp').apply(['last'])
For example:
import pandas as pd
N = 9
rng = pd.date_range('2011-01-01', periods=N, freq='15S')
df = pd.DataFrame({'Timestamp': rng, 'name': ['A','A', 'B','B','B','B','C','C','C'],
'value': [1, 2, 3, 2, 3, 1, 3, 4, 3],'Temp': range(N)})
[out]:
            Timestamp name  value  Temp
0 2011-01-01 00:00:00    A      1     0
1 2011-01-01 00:00:15    A      2     1
2 2011-01-01 00:00:30    B      3     2
3 2011-01-01 00:00:45    B      2     3
4 2011-01-01 00:01:00    B      3     4
5 2011-01-01 00:01:15    B      1     5
6 2011-01-01 00:01:30    C      3     6
7 2011-01-01 00:01:45    C      4     7
8 2011-01-01 00:02:00    C      3     8
I would like to get this:
[out]:
            Timestamp name  value  Temp
1 2011-01-01 00:00:15    A      2     1
5 2011-01-01 00:01:15    B      1     5
8 2011-01-01 00:02:00    C      3     8
Answer 0 (score: 2)
IIUC
Answer 1 (score: 1)
If you need the last value per day and per name, use GroupBy.tail together with Grouper:
df1 = df.groupby([pd.Grouper(freq='D', key='Timestamp'), 'name']).tail(1)
print(df1)
            Timestamp name  value  Temp
1 2011-01-01 00:00:15    A      2     1
5 2011-01-01 00:01:15    B      1     5
8 2011-01-01 00:02:00    C      3     8
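One thing worth keeping in mind: tail(1) returns the last row of each group in the frame's current row order, not necessarily the row with the latest timestamp. The sample data is already sorted, so both coincide. The sketch below is only an illustration under the assumption that the rows might arrive unsorted; the shuffled frame and the use of freq='15s' (lowercase alias) are my additions, not part of the question.

```python
import pandas as pd

# Rebuild the sample frame from the question.
N = 9
rng = pd.date_range('2011-01-01', periods=N, freq='15s')
df = pd.DataFrame({'Timestamp': rng,
                   'name': ['A', 'A', 'B', 'B', 'B', 'B', 'C', 'C', 'C'],
                   'value': [1, 2, 3, 2, 3, 1, 3, 4, 3],
                   'Temp': range(N)})

# Hypothetical situation: the rows are not in chronological order.
shuffled = df.sample(frac=1, random_state=0)

# Sort by time first so that "last row per group" means "latest row per group".
df1 = (shuffled.sort_values('Timestamp')
               .groupby([pd.Grouper(freq='D', key='Timestamp'), 'name'])
               .tail(1))
print(df1)
```

If the frame is known to be sorted by Timestamp already, the sort_values call can be dropped.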
Or convert the Timestamp values to dates with Series.dt.date:
df2 = df.groupby([df['Timestamp'].dt.date, 'name']).tail(1)
print(df2)
            Timestamp name  value  Temp
1 2011-01-01 00:00:15    A      2     1
5 2011-01-01 00:01:15    B      1     5
8 2011-01-01 00:02:00    C      3     8
Other alternatives use Series.dt.normalize or Series.dt.floor:
df2 = df.groupby([df['Timestamp'].dt.normalize(), 'name']).tail(1)
df2 = df.groupby([df['Timestamp'].dt.floor('D'), 'name']).tail(1)
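Since the sample data in the question covers only one day, the per-day part of the grouping is not really exercised there. Below is a small sketch with made-up data spanning two days (the timestamps and values are hypothetical, chosen just for illustration) that runs the four day keys shown above and compares which rows each one keeps.

```python
import pandas as pd

# Hypothetical data covering two days, already sorted by time within each name.
df = pd.DataFrame({
    'Timestamp': pd.to_datetime(['2011-01-01 00:00:00', '2011-01-01 12:00:00',
                                 '2011-01-02 00:00:15', '2011-01-01 06:00:00',
                                 '2011-01-02 08:00:00', '2011-01-02 09:00:00']),
    'name': ['A', 'A', 'A', 'B', 'B', 'B'],
    'value': [1, 2, 3, 4, 5, 6],
})

# The four ways of building a "day" key used in this answer.
keys = {
    'grouper': pd.Grouper(freq='D', key='Timestamp'),
    'date': df['Timestamp'].dt.date,
    'normalize': df['Timestamp'].dt.normalize(),
    'floor': df['Timestamp'].dt.floor('D'),
}

# Keep the last row per (day, name) for each variant.
results = {label: df.groupby([key, 'name']).tail(1) for label, key in keys.items()}

for label, res in results.items():
    print(label, res.index.tolist())
```

On this data all four variants should keep the same rows, one per (day, name) pair, so the choice between them is mostly a matter of taste.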