I would like to know whether there is a better way in pandas to achieve the same result:
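import numpy as np
import pandas as pd

x = [1, 1, 1, 2, 2, 2, 3, 3, 3, 5, 5, 1, 1, 2, 2]
x = np.asarray(x)
df = pd.DataFrame(columns=['id', 'start', 'end'])
if len(x) > 1:
    i = 0  # start index of the current run
    for j in range(1, len(x)):
        if x[j] == x[j-1]:
            continue
        else:
            # the run ended at j-1; record it and start a new run at j
            df.loc[len(df)] = [x[i], i, j-1]
            i = j
    # record the final run, which ends at the last index
    df.loc[len(df)] = [x[i], i, j]
else:
    df.loc[len(df)] = [x[0], 0, 0]
print(x)
print(df)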
The output looks like this:
[1 1 1 2 2 2 3 3 3 5 5 1 1 2 2]
   id  start  end
0   1      0    2
1   2      3    5
2   3      6    8
3   5      9   10
4   1     11   12
5   2     13   14
Thanks for any helpful hints.
Answer 0 (score: 3)
Here is one way to do it with numpy:
import numpy as np
import pandas as pd

x = np.array([1, 1, 1, 2, 2, 2, 3, 3, 3, 5, 5, 1, 1, 2, 2])
# Keep every value that differs from its predecessor; the first element always starts a run
vals = x[np.r_[True, x[1:] != x[:-1]]]
# array([1, 2, 3, 5, 1, 2])
# Indices where changes in x occur (the last index of every run except the final one)
d = np.flatnonzero(np.diff(x) != 0)
# array([ 2,  5,  8, 10, 12])
start = np.hstack([0, d + 1])
# array([ 0,  3,  6,  9, 11, 13])
end = np.hstack([d, len(x) - 1])
# array([ 2,  5,  8, 10, 12, 14])
pd.DataFrame({'id': vals, 'start': start, 'end': end})
   id  start  end
0   1      0    2
1   2      3    5
2   3      6    8
3   5      9   10
4   1     11   12
5   2     13   14
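The numpy steps in this answer can be wrapped in a reusable function. The following is only a minimal sketch: the helper name `run_boundaries` is hypothetical and not part of the answer. It compares neighbours by slicing, so inputs whose first and last values are equal are handled, and it returns an empty frame for empty input.

```python
import numpy as np
import pandas as pd


def run_boundaries(values):
    """Return one row (id, start, end) per run of equal consecutive values.

    Hypothetical helper name, used here for illustration only.
    """
    x = np.asarray(values)
    if x.size == 0:
        # Nothing to group: return an empty frame with the expected columns
        return pd.DataFrame({'id': [], 'start': [], 'end': []})
    # Positions i where x[i] != x[i + 1]: the last index of every run but the final one
    d = np.flatnonzero(x[1:] != x[:-1])
    start = np.concatenate(([0], d + 1))
    end = np.concatenate((d, [x.size - 1]))
    return pd.DataFrame({'id': x[start], 'start': start, 'end': end})


result = run_boundaries([1, 1, 2, 1])
print(result)
```

Taking the ids as `x[start]` means the values and the start indices cannot disagree, since both come from the same change positions.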
Answer 1 (score: 3)
You can do this with pandas alone, as follows:
import numpy as np
import pandas as pd
x = [1, 1, 1, 2, 2, 2, 3, 3, 3, 5, 5, 1, 1, 2, 2]
s = pd.Series(x)
# store group-by to avoid repetition
groups = s.groupby((s != s.shift()).cumsum())
# get id and size for each group
ids, size = groups.first(), groups.size()
# get start
start = size.cumsum().shift().fillna(0).astype(np.int32)
# get end
end = (start + size - 1)
df = pd.DataFrame({'id': ids, 'start': start, 'end': end}, columns=['id', 'start', 'end'])
print(df)
Output:
   id  start  end
1   1      0    2
2   2      3    5
3   3      6    8
4   5      9   10
5   1     11   12
6   2     13   14
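Note that the index in this output starts at 1, because it carries the group labels produced by `cumsum()`. If a 0-based index like the one in the question is wanted, one possible variant is sketched below. It is an alternative formulation rather than the answer's own code: it takes the minimum and maximum position within each run instead of using cumulative sizes, and the names `run` and `pos` are chosen here for illustration.

```python
import pandas as pd

x = [1, 1, 1, 2, 2, 2, 3, 3, 3, 5, 5, 1, 1, 2, 2]
s = pd.Series(x)

# Label each run of equal consecutive values with an increasing integer
run = (s != s.shift()).cumsum()
# Positional index of every element, aligned with s
pos = pd.Series(range(len(s)), index=s.index)
grouped_pos = pos.groupby(run)

df = pd.DataFrame({
    'id': s.groupby(run).first(),   # value shared by the run
    'start': grouped_pos.min(),     # first position in the run
    'end': grouped_pos.max(),       # last position in the run
}).reset_index(drop=True)           # replace the group labels with 0..n-1
print(df)
```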
Answer 2 (score: 3)
Another solution:
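import pandas as pd

df = pd.DataFrame(data=[1, 1, 1, 2, 2, 2, 3, 3, 3, 5, 5, 1, 1, 2, 2], columns=['id'])
# Label each run of equal consecutive values with its own group number
g = df.groupby((df.id != df.id.shift()).cumsum())['id']
# Within a run, duplicated(keep='last') is True for every row except the last,
# so idxmax() gives the run's first index and idxmin() its last index
# (for a run of length one, both return its only index)
df_new = pd.concat([g.first(),
                    g.apply(lambda x: x.duplicated(keep='last').idxmax()),
                    g.apply(lambda x: x.duplicated(keep='last').idxmin())], axis=1)
df_new.columns = ['id', 'start', 'end']
print(df_new)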
    id  start  end
id
1    1      0    2
2    2      3    5
3    3      6    8
4    5      9   10
5    1     11   12
6    2     13   14
Answer 3 (score: 0)
Using itertools.groupby:
import pandas as pd
from itertools import groupby

x = [1, 1, 1, 2, 2, 2, 3, 3, 3, 5, 5, 1, 1, 2, 2]
l = []
# Group (index, value) pairs by value; each group is one run
for i in [list(g) for _, g in groupby(enumerate(x), lambda pair: pair[1])]:
    # (value, first index, last index) of the run
    l.append((i[0][1], i[0][0], i[-1][0]))
print(pd.DataFrame(l, columns=['id', 'start', 'end']))
Output:
   id  start  end
0   1      0    2
1   2      3    5
2   3      6    8
3   5      9   10
4   1     11   12
5   2     13   14
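For long inputs, the same `itertools.groupby` idea can be written as a generator that never builds a list for each run. This is a sketch under that assumption, not part of the answer; the name `iter_runs` is hypothetical.

```python
from itertools import groupby
from operator import itemgetter

import pandas as pd


def iter_runs(values):
    """Yield (value, first_index, last_index) for each run of equal values.

    Hypothetical helper name, used here for illustration only.
    """
    for value, group in groupby(enumerate(values), key=itemgetter(1)):
        # The first (index, value) pair marks where the run starts
        first_index, _ = next(group)
        last_index = first_index
        # Exhaust the rest of the group, remembering only the final index
        for last_index, _ in group:
            pass
        yield value, first_index, last_index


x = [1, 1, 1, 2, 2, 2, 3, 3, 3, 5, 5, 1, 1, 2, 2]
runs = list(iter_runs(x))
print(pd.DataFrame(runs, columns=['id', 'start', 'end']))
```

Because it only iterates, `iter_runs` also accepts iterables that are not sequences, such as another generator.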