I have a dataframe with a column time_bin, which is a binning of hours:
df=  unique_id  time_bin
     s_001      2-3
     s_002      5-8
     s_003      3-6
     s_004      2-7
     s_005      5-9
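I want to create one column per hour of the day, covering 0 to 24 (0-1, 1-2, 2-3, ..., 23-24), and set a column's flag to 1 if it falls within the row's time_bin range; all other columns should be 0. For example: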
new_df= unique_id time_bin 0-1 1-2 2-3 3-4 4-5 5-6 6-7 7-8 8-9 9-10.............. 23-24
s_001 2-5 0 1 1 1 1 0 0 0 0 0 ................. 0
s_002 6-8 0 0 0 0 0 0 1 0 0 0 ................. 0
s_003 8-10 0 0 0 0 0 0 0 0 1 1 ................. 0
s_004 2-7 0 0 1 1 1 1 1 0 0 0 ................. 0
..... ......
..... ......
Answer 0 (score: 0)
Try this:
import pandas as pd
df = pd.DataFrame({"unique_id": ["s_001", "s_002", "s_003", "s_004", "s_005"],
                   "time_bin": ["2-3", "5-8", "3-6", "2-7", "5-9"]})
# Create the 24 hourly columns, all initialised to 0.
for el in range(24):
    df[str(el) + "-" + str(el + 1)] = 0
# For each row, set 1 for every hour in the half-open range [start, end).
df2 = df["time_bin"].apply(
    lambda x: pd.Series({str(el) + "-" + str(el + 1): 1
                         for el in range(int(x.split("-")[0]), int(x.split("-")[1]))})
).fillna(0).astype("int")
df[df2.columns] = df2
print(df)
Output:
  unique_id time_bin  0-1  1-2  ...  20-21  21-22  22-23  23-24
0     s_001      2-3    0    0  ...      0      0      0      0
1     s_002      5-8    0    0  ...      0      0      0      0
2     s_003      3-6    0    0  ...      0      0      0      0
3     s_004      2-7    0    0  ...      0      0      0      0
4     s_005      5-9    0    0  ...      0      0      0      0
[5 rows x 26 columns]
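The printed output above is truncated, so none of the columns that actually contain a 1 are visible. As a rough sketch (not part of the original answer), the same flags can be built with NumPy broadcasting, assuming the same half-open convention as range(start, end), and only the interesting columns printed. The names bounds, flags, labels and result are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "unique_id": ["s_001", "s_002", "s_003", "s_004", "s_005"],
    "time_bin": ["2-3", "5-8", "3-6", "2-7", "5-9"],
})

# Parse "start-end" into two integer columns (0 = start, 1 = end).
bounds = df["time_bin"].str.split("-", expand=True).astype(int)
start = bounds[0].to_numpy()[:, None]   # shape (n_rows, 1)
end = bounds[1].to_numpy()[:, None]     # shape (n_rows, 1)
hours = np.arange(24)[None, :]          # shape (1, 24)

# Assumption: hour h is flagged when start <= h < end (half-open bin).
flags = ((start <= hours) & (hours < end)).astype(int)

labels = [f"{h}-{h + 1}" for h in range(24)]
result = df.join(pd.DataFrame(flags, columns=labels, index=df.index))

# Show only the columns where flags can occur for this data.
print(result.loc[:, "2-3":"8-9"])
```

Slicing the columns by label avoids the "..." truncation for this small example.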
Answer 1 (score: 0)
This works well:
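import pandas as pd

df = pd.DataFrame({
    'unique_id': ['s_001', 's_002', 's_003', 's_004', 's_005'],
    'time_bin': ['2-3', '5-8', '3-6', '2-7', '5-9']
})

def hour_in_interval(interval, hour):
    # Split on '-' so that two-digit hours such as '10-12' parse correctly.
    first, last = (int(part) for part in interval.split('-'))
    # Half-open check: the bin 'a-b' covers hours a, a+1, ..., b-1.
    return 1 if first <= hour < last else 0

# One column per hour, labelled '0-1', '1-2', ..., '23-24'.
hours = pd.DataFrame(
    {'{}-{}'.format(i, i+1): df.time_bin.apply(hour_in_interval, hour=i) for i in range(24)}
)
df = pd.concat([df, hours], axis=1)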
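The function above has to read both hours out of the time_bin string. A small sketch of why splitting on the hyphen is safer than picking characters by position: the bin "10-12" is an assumed example that does not appear in the question's data, and parse_bin is a hypothetical helper introduced only for illustration.

```python
def parse_bin(interval):
    """Hypothetical helper: split 'start-end' into two integers."""
    first, last = interval.split("-")
    return int(first), int(last)


def hour_in_interval(interval, hour):
    """Return 1 if hour lies in the half-open bin [start, end), else 0."""
    first, last = parse_bin(interval)
    return int(first <= hour < last)


bin_label = "10-12"  # assumed two-digit example, not from the question

# Character positions 0 and 2 do not hold the two hours here.
by_position = (bin_label[0], bin_label[2])
by_split = parse_bin(bin_label)

print(by_position)  # ('1', '-')
print(by_split)     # (10, 12)
print([hour_in_interval(bin_label, h) for h in range(9, 13)])  # [0, 1, 1, 0]
```

For the question's data every hour is a single digit, so both ways of parsing agree; the difference only matters for bins that reach 10 or later.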
Answer 2 (score: 0)
You can do this with pd.arrays.IntervalArray and a list comprehension:
s = df.time_bin.str.split('-')
ia_bins = pd.arrays.IntervalArray.from_arrays(s.str[0].astype(int),
                                              s.str[1].astype(int), closed='both')
ia_cols = pd.arrays.IntervalArray.from_breaks(range(0, 25), closed='both')
ia_arr = [ia_cols.overlaps(x).astype(int) for x in ia_bins]
new_df = df.join(pd.DataFrame(ia_arr, columns=ia_cols)
                 .rename(lambda x: f'{x.left}-{x.right}', axis=1))
  unique_id time_bin  0-1  1-2  2-3  3-4  4-5  5-6  6-7  7-8  8-9  9-10  \
0     s_001      2-3    0    1    1    1    0    0    0    0    0     0
1     s_002      5-8    0    0    0    0    1    1    1    1    1     0
2     s_003      3-6    0    0    1    1    1    1    1    0    0     0
3     s_004      2-7    0    1    1    1    1    1    1    1    0     0
4     s_005      5-9    0    0    0    0    1    1    1    1    1     1

   10-11  11-12  12-13  13-14  14-15  15-16  16-17  17-18  18-19  19-20  \
0      0      0      0      0      0      0      0      0      0      0
1      0      0      0      0      0      0      0      0      0      0
2      0      0      0      0      0      0      0      0      0      0
3      0      0      0      0      0      0      0      0      0      0
4      0      0      0      0      0      0      0      0      0      0

   20-21  21-22  22-23  23-24
0      0      0      0      0
1      0      0      0      0
2      0      0      0      0
3      0      0      0      0
4      0      0      0      0
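With closed='both', a bin such as 2-3 also overlaps the neighbouring columns 1-2 and 3-4, because closed intervals that share an endpoint count as overlapping; this is visible in the output above. If only the hours inside the bin should be flagged, one option is to close the intervals on the left only. A sketch under that assumption (the column labels are built directly from the hour numbers here, which is a simplification of the rename step):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "unique_id": ["s_001", "s_002", "s_003", "s_004", "s_005"],
    "time_bin": ["2-3", "5-8", "3-6", "2-7", "5-9"],
})

s = df["time_bin"].str.split("-")

# Assumption: bins are half-open, [start, end), so 2-3 covers hour 2 only.
ia_bins = pd.arrays.IntervalArray.from_arrays(
    s.str[0].astype(int).to_numpy(),
    s.str[1].astype(int).to_numpy(),
    closed="left",
)
ia_cols = pd.arrays.IntervalArray.from_breaks(np.arange(25), closed="left")

# One boolean row per bin: which hourly columns does it overlap?
ia_arr = [ia_cols.overlaps(x).astype(int) for x in ia_bins]

labels = [f"{h}-{h + 1}" for h in range(24)]
new_df = df.join(pd.DataFrame(ia_arr, columns=labels, index=df.index))

print(new_df.loc[:, "0-1":"9-10"])
```

With this setting, s_001 (2-3) is flagged only in column 2-3, which matches the half-open behaviour of the other two answers.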
Note: if you prefer, you can use pd.IntervalIndex instead of pd.arrays.IntervalArray.
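A sketch of the same approach written with pd.IntervalIndex, keeping closed='both' as in this answer. The variable names ii_bins, ii_cols and labels are illustrative, and the labels are again built from the hour numbers rather than via rename.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "unique_id": ["s_001", "s_002", "s_003", "s_004", "s_005"],
    "time_bin": ["2-3", "5-8", "3-6", "2-7", "5-9"],
})

s = df["time_bin"].str.split("-")

# Same construction as with IntervalArray, using the index class instead.
ii_bins = pd.IntervalIndex.from_arrays(
    s.str[0].astype(int).to_numpy(),
    s.str[1].astype(int).to_numpy(),
    closed="both",
)
ii_cols = pd.IntervalIndex.from_breaks(np.arange(25), closed="both")

# IntervalIndex.overlaps takes a single Interval and returns a boolean array.
flags = [ii_cols.overlaps(b).astype(int) for b in ii_bins]

labels = [f"{h}-{h + 1}" for h in range(24)]
new_df = df.join(pd.DataFrame(flags, columns=labels, index=df.index))

print(new_df.loc[:, "0-1":"9-10"])
```

Because the intervals are closed on both sides, the neighbouring hourly columns are flagged as well, as in the output shown for this answer.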