我正在尝试开发效率更高的loop
来解决问题。目前,如果下面的代码与特定值对齐,则会应用string
。但是,这些值的顺序相同,因此loop
可以使此过程更有效。
以下面的df
为例,使用integers
代表时间段,每个整数增加等于15分钟。 1 == 8:00:00
和2 == 8:15:00
等。此刻,我将重复此过程,直到最后一个时间段。如果达到80
,效率可能会非常低。可以在这里合并loop
吗?
import pandas as pd
d = ({
'Time' : [1,1,1,2,2,2,3,3,3,4,4,4,5,5,5,6,6,6],
})
df = pd.DataFrame(data = d)
def time_period(row) :
if row['Time'] == 1 :
return '8:00:00'
if row['Time'] == 2 :
return '8:15:00'
if row['Time'] == 3 :
return '8:30:00'
if row['Time'] == 4 :
return '8:45:00'
if row['Time'] == 5 :
return '9:00:00'
if row['Time'] == 6 :
return '9:15:00'
.....
if row['Time'] == 80 :
return '4:00:00'
df['24Hr Time'] = df.apply(lambda row: time_period(row), axis=1)
print(df)
出局:
Time 24Hr Time
0 1 8:00:00
1 1 8:00:00
2 1 8:00:00
3 2 8:15:00
4 2 8:15:00
5 2 8:15:00
6 3 8:30:00
7 3 8:30:00
8 3 8:30:00
9 4 8:45:00
10 4 8:45:00
11 4 8:45:00
12 5 9:00:00
13 5 9:00:00
14 5 9:00:00
15 6 9:15:00
16 6 9:15:00
17 6 9:15:00
答案 0 :(得分:3)
这可以通过一些简单的timdelta算法实现:
df['24Hr Time'] = (
pd.to_timedelta((df['Time'] - 1) * 15, unit='m') + pd.Timedelta(hours=8))
df.head()
Time 24Hr Time
0 1 08:00:00
1 1 08:00:00
2 1 08:00:00
3 2 08:15:00
4 2 08:15:00
df.dtypes
Time int64
24Hr Time timedelta64[ns]
dtype: object
如果您需要一个字符串,请将pd.to_datetime
与单位和来源一起使用:
df['24Hr Time'] = (
pd.to_datetime((df['Time']-1) * 15, unit='m', origin='8:00:00')
.dt.strftime('%H:%M:%S'))
df.head()
Time 24Hr Time
0 1 08:00:00
1 1 08:00:00
2 1 08:00:00
3 2 08:15:00
4 2 08:15:00
df.dtypes
Time int64
24Hr Time object
dtype: object
答案 1 :(得分:3)
我最终使用了
# change factor levels to order they occur
# you could also custom-specify an order, with, e.g., `levels = c("Li", "Ce", "Pr", ...)`
dataMGSREE$Element = factor(dataMGSREE$Element, levels = unique(dataMGSREE$Element))
# plot with changes explained above
ggplot(data = dataMGSREE,
mapping = aes(x = Element, y = Concentration, color = Analysis, group = Analysis)) +
geom_point(show.legend = FALSE) +
geom_line() +
scale_y_log10()
答案 2 :(得分:2)
通常,您要制作字典并申请
my_dict = {'old_val1': 'new_val1',...}
df['24Hr Time'] = df['Time'].map(my_dict)
但是,在这种情况下,您可以使用时间增量:
df['24Hr Time'] = pd.to_timedelta(df['Time']*15, unit='T') + pd.to_timedelta('7:45:00')
输出(请注意,新列的类型为timedelta
,而不是字符串)
Time 24Hr Time
0 1 08:00:00
1 1 08:00:00
2 1 08:00:00
3 2 08:15:00
4 2 08:15:00
5 2 08:15:00
6 3 08:30:00
7 3 08:30:00
8 3 08:30:00
9 4 08:45:00
10 4 08:45:00
11 4 08:45:00
12 5 09:00:00
13 5 09:00:00
14 5 09:00:00
15 6 09:15:00
16 6 09:15:00
17 6 09:15:00
答案 3 :(得分:1)
一种有趣的方式是使用pd.timedelta_range
和index.repeat
n = df.Time.nunique()
c = df.groupby('Time').size()
df['24_hr'] = pd.timedelta_range(start='8 hours', periods=n, freq='15T').repeat(c)
Out[380]:
Time 24_hr
0 1 08:00:00
1 1 08:00:00
2 1 08:00:00
3 2 08:15:00
4 2 08:15:00
5 2 08:15:00
6 3 08:30:00
7 3 08:30:00
8 3 08:30:00
9 4 08:45:00
10 4 08:45:00
11 4 08:45:00
12 5 09:00:00
13 5 09:00:00
14 5 09:00:00
15 6 09:15:00
16 6 09:15:00
17 6 09:15:00