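How do I reshape usage data into a per-minute format?

Time: 2019-04-26 14:14:18

Tags: python pandas

How can I reshape the following raw usage data into a "minute data frame"? Is there a dedicated pandas function that splits the raw data into minutes?

Example of raw usage data: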

  | Video-ID | UsageStart          | Duration in sec
0 | 260581   | 2019-04-25 00:00:00 | 10
1 | 316288   | 2019-04-25 00:01:05 | 20
2 | 791714   | 2019-04-25 00:01:30 | 10
3 | 790503   | 2019-04-25 00:02:30 | 90
4 | 646034   | 2019-04-25 00:03:10 | 100

Desired output, in minute format:

Minute | StartTime           | UsageAmount in sec
1      | 2019-04-25 00:00:00 | 10
2      | 2019-04-25 00:01:00 | 30
3      | 2019-04-25 00:02:00 | 30
4      | 2019-04-25 00:03:00 | 110
5      | 2019-04-25 00:04:00 | 50

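To be honest, I don't know how to do this. Maybe it has to be done second by second first and then resampled into the minute format.

Thanks for your help.

3 answers:

Answer 0: (score: 1)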

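import pandas as pd

# convert UsageStart to a datetime column
df['UsageStart'] = pd.to_datetime(df['UsageStart'])
# index by start time and sum per minute ('1min' replaces the deprecated '1T' alias);
# each row is counted in full in the minute in which it starts
df = df.set_index('UsageStart').resample('1min').sum()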
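
Note that a per-minute resample sum adds each row's full duration to the minute in which the row starts; it does not spread a duration over the minutes that follow. Below is a minimal sketch of one way to get the split the question asks for, assuming it is acceptable to expand every row into one timestamp per second first. The shortened column name `Duration` and the variable names `stamps` and `per_minute` are assumptions made for this sketch.

```python
import pandas as pd

# Sample rows from the question; the column name "Duration" is shortened here.
df = pd.DataFrame({
    "UsageStart": pd.to_datetime([
        "2019-04-25 00:00:00",
        "2019-04-25 00:01:05",
        "2019-04-25 00:01:30",
        "2019-04-25 00:02:30",
        "2019-04-25 00:03:10",
    ]),
    "Duration": [10, 20, 10, 90, 100],
})

# One timestamp for every second of usage, across all rows.
stamps = [
    start + pd.Timedelta(seconds=i)
    for start, duration in zip(df["UsageStart"], df["Duration"])
    for i in range(duration)
]

# Each entry stands for one second of usage; sum the entries per minute.
per_second = pd.Series(1, index=pd.DatetimeIndex(stamps)).sort_index()
per_minute = per_second.resample("1min").sum()
print(per_minute)
```

On the question's sample data this gives 10, 30, 30, 110 and 50 seconds for the minutes 00:00 through 00:04. Expanding to seconds is simple, but it can use a lot of memory when durations are long.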

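Answer 1: (score: 0)

Hi, once the datetime column is set as the index, you can use the pandas.DataFrame.resample method, like this:

df["UsageStart"] = pd.to_datetime(df["UsageStart"])
df = df.set_index("UsageStart")

df = df.resample("1min").mean()

But I don't know whether the mean gives you the output you want.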

Answer 2: (score: 0)

This is not a pure pandas solution. I'm fairly sure there are plenty of clever one-liners for this, but I'm still a basic pandas user.

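I use a recursive function that consumes the given duration by adding its seconds to successive minutes, starting from startime, and stores the totals in the dictionary d, keyed by minute: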

def cumsec(startime, duration, d):
    # Base case: nothing left to distribute.
    if duration == 0:
        return d
    # Seconds left until the next minute boundary (between 1 and 60).
    to_minute = 60 - startime.second
    # Consume the rest of this minute, or whatever duration remains.
    to_add = min(to_minute, duration)
    d[startime.replace(second=0)] += to_add
    # Continue at the start of the next minute.
    startime = (startime + dt.timedelta(minutes=1)).replace(second=0)
    return cumsec(startime, duration - to_add, d)
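
Because cumsec recurses once per minute, a very long duration (on the order of a thousand minutes with Python's default recursion limit) would raise a RecursionError. A loop-based sketch of the same idea is shown here. The name `split_by_minute` and its parameter names are hypothetical and not part of the answer.

```python
import datetime as dt
from collections import defaultdict


def split_by_minute(start, duration, totals):
    """Iterative counterpart of cumsec (hypothetical name, same idea)."""
    # Key for the minute in which the usage starts.
    minute = start.replace(second=0, microsecond=0)
    # Seconds available in the first, possibly partial, minute.
    room = 60 - start.second
    while duration > 0:
        used = min(room, duration)
        totals[minute] += used
        duration -= used
        # Every following minute offers a full 60 seconds.
        minute += dt.timedelta(minutes=1)
        room = 60
    return totals


# Row 3 of the question: starts at 00:02:30 and lasts 90 seconds.
totals = split_by_minute(dt.datetime(2019, 4, 25, 0, 2, 30), 90, defaultdict(int))
print(dict(totals))
```

For that row, 30 seconds land in minute 00:02 and the remaining 60 in minute 00:03.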



Then simply apply this function to each row:

from collections import defaultdict
import datetime as dt
import pandas as pd

# small df arrangements
df.columns = ["VideoId", "UsageStart", "Duration"]
df["UsageStart"] = pd.to_datetime(df["UsageStart"])


d = defaultdict(int)
for r in df.itertuples():
    cumsec(r.UsageStart, r.Duration, d)


To add any empty minutes, you can do the following, though I'm sure pandas has a dedicated method for this (skip this part if you don't need that behavior):

first = min(d.keys())
last = max(d.keys())

d = {
    first + dt.timedelta(minutes=i): d.get(first + dt.timedelta(minutes=i), 0) 
    for i in range(int((last - first).total_seconds()//60) + 1)
}
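
One pandas candidate for that gap-filling step is `reindex` over a `date_range` of whole minutes. The following is only a sketch on made-up totals; the names `usage`, `full_index` and `filled` are assumptions, not part of the answer.

```python
import pandas as pd

# Hypothetical per-minute totals with a gap between 00:04 and 00:09.
usage = pd.Series(
    [50, 40],
    index=pd.to_datetime(["2019-04-25 00:04:00", "2019-04-25 00:09:00"]),
)

# Every minute from the first key to the last one, inclusive.
full_index = pd.date_range(usage.index.min(), usage.index.max(), freq="min")

# Minutes missing from the original index get a usage of 0.
filled = usage.reindex(full_index, fill_value=0)
print(filled)
```

The dictionary d from the loop above could be turned into such a Series with `pd.Series(d)` before reindexing.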

Finally, create a new DataFrame:

cumdf = pd.DataFrame({"StartTime": list(d.keys()), "UsageAmount": list(d.values())})
cumdf = cumdf.sort_values("StartTime").reset_index(drop=True)
cumdf["Minute"] = range(1, len(d) + 1)
print(cumdf)

So if your input is:

Video-ID |        UsageStart |  Duration
459224 |    2019-04-24 23:59:59 |   2
260581 |    2019-04-25 00:00:00 |   10
316288 |    2019-04-25 00:01:05 |   20
791714 |    2019-04-25 00:01:30 |   10
790503 |    2019-04-25 00:02:30 |   90
646034 |    2019-04-25 00:03:10 |   100
934784 |    2019-04-25 00:09:10 |   40

The output is:


             StartTime  UsageAmount  Minute
0  2019-04-24 23:59:00            1       1
1  2019-04-25 00:00:00           11       2
2  2019-04-25 00:01:00           30       3
3  2019-04-25 00:02:00           30       4
4  2019-04-25 00:03:00          110       5
5  2019-04-25 00:04:00           50       6
6  2019-04-25 00:05:00            0       7
7  2019-04-25 00:06:00            0       8
8  2019-04-25 00:07:00            0       9
9  2019-04-25 00:08:00            0      10
10 2019-04-25 00:09:00           40      11