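I'm not very experienced with pandas. I need to compute the time spent per person and per location, and drop rows whose date has no matching pair. My data looks like this: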
Unit Name Location Date Time
0 K1 Somebody1 LOC1 2020-05-12 07:00
1 K1 Somebody1 LOC1 2020-05-12 20:10
2 K1 Somebody1 LOC1 2020-05-13 06:00
3 K1 Somebody1 LOC1 2020-05-13 20:00
4 K1 Somebody1 LOC1 2020-05-14 06:37
5 K1 Somebody1 LOC2 2020-05-15 07:00
6 K1 Somebody1 LOC2 2020-05-15 20:10
7 K1 Somebody1 LOC2 2020-05-16 06:00
8 K1 Somebody1 LOC2 2020-05-16 20:00
9 K1 Somebody1 LOC2 2020-05-17 06:37
10 K1 Somebody2 LOC2 2020-05-13 07:00
11 K1 Somebody2 LOC2 2020-05-14 10:10
12 K1 Somebody2 LOC2 2020-05-14 16:50
13 K1 Somebody2 LOC2 2020-05-15 05:36
14 K1 Somebody3 LOC1 2020-05-13 07:00
15 K1 Somebody3 LOC1 2020-05-14 10:10
16 K1 Somebody3 LOC1 2020-05-14 16:50
17 K1 Somebody3 LOC1 2020-05-15 05:36
The only thing I came up with was converting the time to a datetime object:
df['Time'] = df['Time'].apply(lambda x: datetime.strptime(x,'%H:%M').time())
I tried pivot tables, groupby, and for loops, and I've run out of ideas. I'd like the output to look like this:
LOC1
Somebody1 2020-05-12 13h 10m
2020-05-13 14h 00m
TOTAL 27h 00m
Somebody2 date hours
date hours
TOTAL sum for somebody2
Somebody3 date hours
date hours
TOTAL sum for somebody3
LOC2
Somebody1 date hours
date hours
TOTAL sum for somebody1
Somebody2 date hours
date hours
TOTAL sum for somebody2
or something similar.
Answer 0 (score: 1)
IIUC, you can use groupby and combine_first:
import numpy as np
import pandas as pd

# Combine the Date and Time strings into a single datetime column.
df['datetime'] = pd.to_datetime(df['Date'] + ' ' + df['Time'])
# First and last record per person, location and calendar day.
df1 = df.groupby(['Name', 'Location', df['datetime'].dt.normalize()])\
        .agg(start=('datetime', 'first'),
             end=('datetime', 'last'))
# Hours between the first and last record of each day.
df1['timespent'] = (df1['end'] - df1['start']) / np.timedelta64(1, 'h')
# Create the TOTAL row.
m = df1.unstack(['Name', 'Location'])['timespent'].sum().unstack()
m = m.assign(TOTAL=m.sum(1)).stack().to_frame('timespent')
final = df1.drop(['start', 'end'], axis=1).combine_first(m)
# If you want to remove single-entry days:
final[final['timespent'] > 0]
timespent
Name Location datetime
Somebody1 LOC1 2020-05-12 13.166667
2020-05-13 14.000000
TOTAL NaT 27.166667
Somebody2 LOC2 2020-05-14 6.666667
TOTAL NaT 6.666667
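For reference, here is a minimal self-contained sketch of the same idea on a subset of the question's sample data. It assumes that the first and last record of each day are the start and end of that day's stay, and that a day with a single record should be dropped. The `rows` literal and the helper name `fmt_hm` are illustrative and not part of the answer above; the helper formats durations as `13h 10m`, as in the desired output.

```python
import pandas as pd

# Sample rows copied from the question: (Name, Location, Date, Time).
rows = [
    ("Somebody1", "LOC1", "2020-05-12", "07:00"),
    ("Somebody1", "LOC1", "2020-05-12", "20:10"),
    ("Somebody1", "LOC1", "2020-05-13", "06:00"),
    ("Somebody1", "LOC1", "2020-05-13", "20:00"),
    ("Somebody1", "LOC1", "2020-05-14", "06:37"),
    ("Somebody2", "LOC2", "2020-05-13", "07:00"),
    ("Somebody2", "LOC2", "2020-05-14", "10:10"),
    ("Somebody2", "LOC2", "2020-05-14", "16:50"),
    ("Somebody2", "LOC2", "2020-05-15", "05:36"),
]
df = pd.DataFrame(rows, columns=["Name", "Location", "Date", "Time"])
df["datetime"] = pd.to_datetime(df["Date"] + " " + df["Time"])

keys = ["Location", "Name", "Date"]
# Keep only days that have at least two records (a start and an end).
paired = df[df.groupby(keys)["datetime"].transform("size") >= 2]

# Time spent per day = last record minus first record.
g = paired.groupby(keys)["datetime"]
daily = (g.max() - g.min()).rename("timespent")

# Total per location and person.
total = daily.groupby(level=["Location", "Name"]).sum()


def fmt_hm(td):
    """Format a Timedelta as e.g. '13h 10m' (hypothetical helper)."""
    minutes = int(td.total_seconds() // 60)
    return f"{minutes // 60}h {minutes % 60:02d}m"


print(daily.map(fmt_hm))
print(total.map(fmt_hm))
```

Keeping the values as Timedelta until the final formatting step means the totals are summed exactly, with the `h`/`m` string produced only for display.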
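Answer 1 (score: 0)
You can start by using grep to collect the times from each pair of rows and then compute the time difference. For example, put the people's names into a list and then run grep: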
for i in $(cat list-names); do grep "$i" a.csv | awk '{print $6}'; done
where a.csv contains:
0 K1 Somebody1 LOC1 2020-05-12 17:00
1 K1 Somebody1 LOC1 2020-05-12 20:10
Also, to get the difference in hours, do the following:
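awk '
{split($6, t, ":"); cur = t[1] * 60 + t[2]}   # time of day in minutes
NR == 1 {old = cur; next}
{d = cur - old; printf "%dh %02dm\n", d / 60, d % 60; old = cur}
' a.csv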
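The same kind of difference can also be sketched in Python using only the standard library. This assumes both times fall on the same day; the function name `hm_diff` is illustrative.

```python
from datetime import datetime


def hm_diff(start, end):
    """Return (hours, minutes) between two 'HH:MM' strings on the same day."""
    delta = datetime.strptime(end, "%H:%M") - datetime.strptime(start, "%H:%M")
    minutes = int(delta.total_seconds() // 60)
    return divmod(minutes, 60)


# The two times from the a.csv example above.
hours, minutes = hm_diff("17:00", "20:10")
print(f"{hours}h {minutes:02d}m")
```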