计算字符串列的总和

时间:2015-02-03 08:53:50

标签: python pandas

如何计算pandas中字符串列的总数?

myl=[('2012-11-07 19:16:07', ' 2012-11-07 19:21:07', ' 0h 05m 00s'),
 ('2012-11-13 06:16:07', ' 2012-11-13 06:21:07', ' 0h 05m 00s'),
 ('2012-11-15 09:56:07', ' 2012-11-15 11:41:07', ' 1h 45m 00s'),
 ('2012-11-15 22:26:07', ' 2012-11-16 07:01:07', ' 8h 35m 00s')]

import pandas as pd
df = pd.DataFrame(myl, columns=['from', 'to', 'downtime'])

上述代码将在单个列中返回“停机时间”。如何获取该列中的整数值总数?

In [5]: df
Out[5]:
                  from                    to     downtime
0  2012-11-07 19:16:07   2012-11-07 19:21:07   0h 05m 00s
1  2012-11-13 06:16:07   2012-11-13 06:21:07   0h 05m 00s
2  2012-11-15 09:56:07   2012-11-15 11:41:07   1h 45m 00s
3  2012-11-15 22:26:07   2012-11-16 07:01:07   8h 35m 00s

例如在上述输出中,预计停机时间列总数为9h 90m 00s


更新

我如何计算每天的停机时间?

预期结果:

2012-11-07 0h 05m 00s
2012-11-13 0h 05m 00s
2012-11-15 10h 20m 00s

这是按预期工作的:

df['downtime_t'] = pd.to_timedelta(df['downtime'])

df['year'] = pd.DatetimeIndex(pd.to_datetime(df['from'])).year
df['month'] = pd.DatetimeIndex(pd.to_datetime(df['from'])).month
df['day'] = pd.DatetimeIndex(pd.to_datetime(df['from'])).day

df.groupby(['year', 'month', 'day'])['downtime_t'].sum()

这也适用于年度分组:

df['from_d2'] = pd.to_datetime(df['from'])
df.groupby(df['from_d2'].map(lambda x:  x.year ))['downtime_t'].sum()

但这不起作用:

df.groupby(df['from_d2'].map(lambda x:  x.year, x.month, x.day))['downtime_t'].sum()

还有其他方法可以实现分组总数吗?

1 个答案:

答案 0 :(得分:2)

您可以使用pandas'to_timedelta功能。

http://pandas.pydata.org/pandas-docs/stable/generated/pandas.to_timedelta.html

pd.to_timedelta(df['downtime']).sum()