Assign a column to a pandas df

Time: 2018-07-20 04:20:48

Tags: python pandas sorting

I am trying to assign a series to an existing column in a df. Specifically, some timestamps get sorted, but the current export is a separate series. I'd like to attach this to the df.

import pandas as pd

d = ({           
    'time' : ['08:00:00 am','12:00:00 pm','16:00:00 pm','20:00:00 pm','2:00:00 am','13:00:00 pm','3:00:00 am'], 
    'code' : ['A','B','C','A','B','C','A'], 
    })

df = pd.DataFrame(data=d)

df['time'] = pd.to_timedelta(df['time'])

cutoff, day = pd.to_timedelta(['3.5H', '24H'])
df.time.apply(lambda x: x if x > cutoff else x + day).sort_values().reset_index(drop=True)
x = df.time.apply(lambda x: x if x > cutoff else x + day).sort_values().reset_index(drop=True).dt.components
x = x.apply(lambda x: '{:02d}:{:02d}:{:02d}'.format(x.days*24+x.hours, x.minutes, x.seconds), axis=1)

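This outputs:

0    08:00:00
1    12:00:00
2    13:00:00
3    16:00:00
4    20:00:00
5    26:00:00
6    27:00:00

I changed the last line so that the result is assigned back to the df:

df['time'] = x.apply(lambda x: '{:02d}:{:02d}:{:02d}'.format(x.days*24+x.hours, x.minutes, x.seconds), axis=1)

But after sorting, the timestamps are not aligned with their respective values.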

The expected output is:

       time code
0  08:00:00    A
1  12:00:00    B
2  13:00:00    C
3  16:00:00    A
4  20:00:00    B
5  26:00:00    C
6  27:00:00    A

2 Answers:

Answer 0: (score: 0)

I hope this is what you want:

import pandas as pd

d = ({           
    'time' : ['08:00:00 am','12:00:00 pm','16:00:00 pm','20:00:00 pm','2:00:00 am','13:00:00 pm','3:00:00 am'], 
    'code' : ['A','B','C','A','B','C','A'], 
    })

df = pd.DataFrame(data=d)

df['time'] = pd.to_timedelta(df['time'])

cutoff, day = pd.to_timedelta(['3.5H', '24H'])
df.time.apply(lambda x: x if x > cutoff else x + day).sort_values().reset_index(drop=True)
print(df)
x = df.time.apply(lambda x: x if x > cutoff else x + day).sort_values().reset_index(drop=True).dt.components
df['time'] = x.apply(lambda x: '{:02d}:{:02d}:{:02d}'.format(x.days*24+x.hours, x.minutes, x.seconds), axis=1)

print(df)

Answer 1: (score: 0)

Removing reset_index(drop=True) from your code and sorting afterwards may work for you.

import pandas as pd

d = ({           
    'time' : ['08:00:00 am','12:00:00 pm','16:00:00 pm','20:00:00 pm','2:00:00 am','13:00:00 pm','3:00:00 am'], 
    'code' : ['A','B','C','A','B','C','A'], 
    })

df = pd.DataFrame(data=d)

df['time'] = pd.to_timedelta(df['time'])

cutoff, day = pd.to_timedelta(['3.5H', '24H'])

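# Shift early-morning times past midnight; the original index is kept
x = df.time.apply(lambda x: x if x > cutoff else x + day).dt.components
# Assignment aligns on the index, so each row keeps its own code
df['time'] = x.apply(lambda x: '{:02d}:{:02d}:{:02d}'.format(x.days*24+x.hours, x.minutes, x.seconds), axis=1)
# Sort whole rows; zero-padded HH:MM:SS strings sort correctly as text
df = df.sort_values('time')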

print(df)
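If the goal is to keep each code paired with its own timestamp and also end up with a clean 0..n-1 index, one possible variant is to sort the whole frame on the shifted timedeltas and reset the index of the frame rather than of the series. This is only a sketch under assumptions: the input strings are plain "HH:MM:SS" without the am/pm suffixes, and the names `shifted`, `fmt` and `out` are made up for illustration.

```python
import pandas as pd

# Assumption: clean "HH:MM:SS" strings, am/pm suffixes dropped.
df = pd.DataFrame({
    "time": ["08:00:00", "12:00:00", "16:00:00", "20:00:00",
             "02:00:00", "13:00:00", "03:00:00"],
    "code": ["A", "B", "C", "A", "B", "C", "A"],
})

td = pd.to_timedelta(df["time"])
cutoff = pd.Timedelta(hours=3, minutes=30)
day = pd.Timedelta(days=1)

# Keep times after the cutoff, push earlier ones into the next day.
shifted = td.where(td > cutoff, td + day)


def fmt(t):
    """Format a Timedelta as HH:MM:SS, letting hours run past 24."""
    total = int(t.total_seconds())
    return "{:02d}:{:02d}:{:02d}".format(
        total // 3600, total % 3600 // 60, total % 60
    )


# Sort entire rows so every code travels with its time,
# then renumber the rows of the frame itself.
out = df.assign(time=shifted).sort_values("time").reset_index(drop=True)
out["time"] = out["time"].map(fmt)
print(out)
```

Because the sort happens on the timedelta values and not on the formatted strings, it does not depend on the strings being zero-padded.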

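pandas aligns on the index. reset_index(drop=True) destroys the original index, so the sorted time column is assigned in its sorted order instead of row by row. That is probably why you did not get the result you wanted.

The original time column.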

0   08:00:00
1   12:00:00
2   16:00:00
3   20:00:00
4   02:00:00
5   13:00:00
6   03:00:00

After sort_values().

4   02:00:00
6   03:00:00
0   08:00:00
1   12:00:00
5   13:00:00
2   16:00:00
3   20:00:00

After reset_index(drop=True).

0   02:00:00
1   03:00:00
2   08:00:00
3   12:00:00
4   13:00:00
5   16:00:00
6   20:00:00
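
The three stages above can be reproduced on a toy frame. The following is a minimal sketch with made-up data and column names (`aligned`, `positional`), meant only to show how assignment behaves with and without the original index.

```python
import pandas as pd

df = pd.DataFrame({"time": [3, 1, 2], "code": ["x", "y", "z"]})

# Sorting reorders the values but keeps their index labels: [1, 2, 0].
sorted_time = df["time"].sort_values()

# Column assignment matches on index labels, so every value
# returns to the row it came from and the sort is undone.
df["aligned"] = sorted_time

# After reset_index(drop=True) the labels are 0, 1, 2 in sorted order,
# so the values land in sorted order next to unrelated codes.
df["positional"] = sorted_time.reset_index(drop=True)

print(df)
```

To move whole rows together, sort the DataFrame itself (for example with `df.sort_values("time")`) instead of sorting one column and assigning it back.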