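I have a dataset of half-hourly energy consumption. I am trying to get a list of the indices of rows that fall within long periods without any energy consumption. In other words, I am trying to get lists of indices where a particular column contains consecutive 0 values. I am using the code below, which seems to work for a while, but then it starts adding lists of indices whose values are not 0.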
import more_itertools as mit
indices = df.loc[df[df.columns[2]] == df[df.columns[2]].isnull()].index.values.tolist()
outages_indices = [list(group) for group in mit.consecutive_groups(indices)]
long_outages_indices = []
for i in outages_indices:
    if len(i) >= 8:
        long_outages_indices.append(i)
For example, at row 849246 the value is indeed 0, but at row 1543677 the value is 0.105 and that row is still part of a list.
The first few rows of the DataFrame:
LCLid tstp energy(kWh/hh)
MAC000002 2012-10-12 00:30:00.0000000 0.0
MAC000002 2012-10-12 01:00:00.0000000 0.0
MAC000002 2012-10-12 01:30:00.0000000 0.0
MAC000002 2012-10-12 02:00:00.0000000 0.0
MAC000002 2012-10-12 02:30:00.0000000 0.0
Desired output (I already get this, but it is incorrect):
[[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, ...],
[861958, 861959, 861960, 861961 ...],
[862015, 862016, 862017, 862018, ...], ...]
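Edit: Solved. When I concatenated multiple CSV files into one Pandas DataFrame, the index numbering restarted from 0 each time a new file was appended. I reset the index, which solved my problem.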
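The fix described in the edit can be sketched as follows. This is a minimal illustration, assuming the per-file frames are combined with `pd.concat`; the two small frames and their values are made-up stand-ins for the CSV files, not the asker's data.

```python
import pandas as pd

# Hypothetical stand-ins for two CSV files read separately
part_a = pd.DataFrame({"energy": [0.0, 0.0, 0.3]})
part_b = pd.DataFrame({"energy": [0.105, 0.0, 0.0]})

# A plain concat keeps each file's own index, so labels repeat
stacked = pd.concat([part_a, part_b])
duplicated_labels = stacked.index.tolist()  # [0, 1, 2, 0, 1, 2]

# ignore_index=True assigns a fresh 0..n-1 index while concatenating
combined = pd.concat([part_a, part_b], ignore_index=True)

# Equivalent fix for a frame that has already been concatenated
reset = stacked.reset_index(drop=True)

# With unique labels, each index now refers to exactly one row
zero_rows = combined.index[combined["energy"] == 0].tolist()
```

With repeated labels, an index taken from a zero row in one file also matches a nonzero row in another file, which is consistent with the symptom described in the question.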
Answer 0 (score: 0)
You want cumsum together with groupby:
import pandas as pd
df = pd.DataFrame({'energy':[1,0,0,0,1,1,0,0,0]})
# mark the non-zero
s = df.energy.ne(0)
# groupby
new_df = df.groupby([s, s.cumsum()]).apply(lambda x: list(x.index))
which gives you
energy energy
False 1 [1, 2, 3]
3 [6, 7, 8]
True 1 [0]
2 [4]
3 [5]
dtype: object
The indices of interest are those whose level-0 index is False, that is:
new_df.loc[False]
which gives you
energy
1 [1, 2, 3]
3 [6, 7, 8]
dtype: object
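The question also asks for only the long runs, which the grouping above does not filter. A possible sketch, reusing the same cumsum-of-nonzero idea as the run identifier, is shown below. The example data, the `threshold` value, and the variable names are assumptions for illustration; note that NaN counts as nonzero here because `ne(0)` is True for NaN.

```python
import pandas as pd

# Made-up data: one zero run of length 3 and one of length 4
df = pd.DataFrame({"energy": [1, 0, 0, 0, 1, 1, 0, 0, 0, 0]})
threshold = 4  # hypothetical minimum run length

# Mark the non-zero rows; the running count of non-zero rows is
# constant within a zero run and differs between separate zero runs
nonzero = df["energy"].ne(0)
run_id = nonzero.cumsum()
is_zero = ~nonzero

# Group the labels of the zero rows by run and keep the long runs
zero_labels = df.index[is_zero].to_series()
long_runs = [
    labels.tolist()
    for _, labels in zero_labels.groupby(run_id[is_zero])
    if len(labels) >= threshold
]
```

For the half-hourly data in the question, a threshold of 8 would correspond to four hours without consumption.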
Answer 1 (score: 0)
Your solution is close, but I think there is a mistake in the condition used to extract the zero-energy indices. You have:
. . .
indices = df.loc[df[df.columns[2]] == df[df.columns[2]].isnull()].index.values.tolist()
. . .
This is an odd way to find the indices of zero-energy rows.
The following worked for me:
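To see what that condition actually selects, here is a small check on made-up values, assuming a numeric column. It compares each value with the boolean mask from `isnull()`; since `False` compares equal to `0`, rows holding 0 match, while NaN rows (compared against `True`) do not.

```python
import pandas as pd

# Made-up column with zeros, a nonzero value, and a missing value
col = pd.Series([0.0, 0.105, None, 0.0, 1.0])

# The condition from the question: value == isnull mask
original_mask = col == col.isnull()

# The more direct condition
simple_mask = col == 0

same_result = original_mask.equals(simple_mask)  # True for this sample
```

On this sample the two masks agree, so the odd condition on its own would not pull in nonzero rows; comparing with 0 directly is still clearer.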
import pandas as pd
import more_itertools as mit
df = pd.DataFrame({'energy': [0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 1]})
# find the indices with zero energy
indices = df.loc[df['energy'] == 0].index.values.tolist()
# extract long outages
threshold = 4 # minimum length for an outage to be considered "long"
outages_indices = [list(group) for group in mit.consecutive_groups(indices)]
long_outages_indices = [l for l in outages_indices if len(l) >= threshold]
If you also want to include energy values of None, you can do the following:
import pandas as pd
import more_itertools as mit
df = pd.DataFrame({'energy': [0, None, 0, 0, 1, 0, 0, 1, 0, None, 0, None, 1]})
df = df.fillna(value=0)
# find the indices with zero energy
indices = df.loc[df['energy'] == 0].index.values.tolist()
# extract long outages
threshold = 4 # minimum length for an outage to be considered "long"
outages_indices = [list(group) for group in mit.consecutive_groups(indices)]
long_outages_indices = [l for l in outages_indices if len(l) >= threshold]