Question

Assign labels to the original data instead of getting new indicator columns from get_dummies. I want something like this:

json_input:

[{"id":100, "vehicle_type":"Car", "time":"2017-04-06 01:39:43", "zone":"A", "type":"Checked"},
 {"id":101, "vehicle_type":"Truck", "time":"2017-04-06 02:35:45", "zone":"B", "type":"Unchecked"},
 {"id":102, "vehicle_type":"Truck", "time":"2017-04-05 03:20:12", "zone":"A", "type":"Checked"},
 {"id":103, "vehicle_type":"Car", "time":"2017-04-04 10:05:04", "zone":"C", "type":"Unchecked"}]

Result:

id,vehicle_type,time_range,zone,type
100,0,1,1,1
101,1,1,2,0
102,1,2,1,1
103,0,3,3,0

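Timestamp: the TS column. vehicle_type and type are binary; time_range maps 1 -> (TS1-TS2), 2 -> (TS3-TS4), 3 -> (TS5-TS6); zone -> categorical (1, 2 or 3). I want these labels to be assigned automatically when I feed the flattened JSON into a pandas DataFrame. Is this possible? (I don't want the zone_1, type_1, vehicle_type_3 indicator columns that get_dummies produces in pandas.) If it isn't possible with pandas, please suggest a Python library for this automation.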
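A minimal sketch of the labelling described above, using explicit dictionaries with Series.map for vehicle_type, zone and type, and pd.cut for time_range. The question does not give the TS boundaries, so the hour bins here ([0, 3), [3, 6), [6, 24)) are hypothetical, picked only so the output matches the result table; the names VEHICLE_TYPE, TYPE, ZONE, HOUR_BINS and label() are likewise illustrative.

```python
import json

import pandas as pd

# Sample records from the question, as valid JSON.
json_input = '''[
 {"id": 100, "vehicle_type": "Car", "time": "2017-04-06 01:39:43", "zone": "A", "type": "Checked"},
 {"id": 101, "vehicle_type": "Truck", "time": "2017-04-06 02:35:45", "zone": "B", "type": "Unchecked"},
 {"id": 102, "vehicle_type": "Truck", "time": "2017-04-05 03:20:12", "zone": "A", "type": "Checked"},
 {"id": 103, "vehicle_type": "Car", "time": "2017-04-04 10:05:04", "zone": "C", "type": "Unchecked"}
]'''

# Hypothetical label mappings, chosen to reproduce the question's result table.
VEHICLE_TYPE = {"Car": 0, "Truck": 1}
TYPE = {"Unchecked": 0, "Checked": 1}
ZONE = {"A": 1, "B": 2, "C": 3}

# Hypothetical time-of-day boundaries in hours: [0, 3) -> 1, [3, 6) -> 2, [6, 24) -> 3.
HOUR_BINS = [0, 3, 6, 24]


def label(records):
    """Return a DataFrame with the raw values replaced by integer labels."""
    df = pd.DataFrame(records)
    out = pd.DataFrame({"id": df["id"]})
    out["vehicle_type"] = df["vehicle_type"].map(VEHICLE_TYPE)
    # Bin the hour of day into the numbered time ranges.
    hours = pd.to_datetime(df["time"]).dt.hour
    out["time_range"] = pd.cut(
        hours, bins=HOUR_BINS, right=False, labels=[1, 2, 3]
    ).astype(int)
    out["zone"] = df["zone"].map(ZONE)
    out["type"] = df["type"].map(TYPE)
    return out


print(label(json.loads(json_input)).to_csv(index=False))
```

A value that is missing from one of the dictionaries comes out of map as NaN, so unexpected categories are easy to spot.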

Answer 1

This is what I could come up with. I'm not sure what time ranges you're looking for.

import io
import pandas as pd

df_string='[{"id":100,"vehicle_type":"Car","time":"2017-04-06 01:39:43","zone":"A","type":"Checked"},{"id":101,"vehicle_type":"Truck","time":"2017-04-06 02:35:45","zone":"B","type":"Unchecked"},{"id":102,"vehicle_type":"Truck","time":"2017-04-05 03:20:12","zone":"A","type":"Checked"},{"id":103,"vehicle_type":"Car","time":"2017-04-04 10:05:04","zone":"C","type":"Unchecked"}]'
# Read the JSON records into a DataFrame
df = pd.read_json(io.StringIO(df_string))
# Convert the string columns to categoricals
df['zone'] = pd.Categorical(df['zone'])
df['vehicle_type'] = pd.Categorical(df['vehicle_type'])
df['type'] = pd.Categorical(df['type'])
# .cat.codes numbers the categories 0, 1, 2, ... in sorted order
df['zone_int'] = df['zone'].cat.codes
df['vehicle_type_int'] = df['vehicle_type'].cat.codes
df['type_int'] = df['type'].cat.codes
df.head()
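Because pd.Categorical sorts the categories and numbers them from 0, the codes above give Car=0, Truck=1 (as in the question) but Checked=0, Unchecked=1 and zone A/B/C = 0/1/2, which differ from the requested result. A minimal sketch, assuming the orderings from the question's result table, of passing categories= explicitly so the codes line up; the variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "zone": ["A", "B", "A", "C"],
    "type": ["Checked", "Unchecked", "Checked", "Unchecked"],
})

# Default behaviour: categories are sorted, codes start at 0 (Checked=0, Unchecked=1).
default_codes = pd.Categorical(df["type"]).codes

# Explicit order: a value's position in `categories` becomes its code.
type_codes = pd.Categorical(df["type"], categories=["Unchecked", "Checked"]).codes

# Shift by one to get the 1-based zone labels the question asks for.
zone_codes = pd.Categorical(df["zone"], categories=["A", "B", "C"]).codes + 1

print(default_codes.tolist())
print(type_codes.tolist())
print(zone_codes.tolist())
```

Any value not listed in categories gets the code -1 (before the +1 shift), which doubles as a check for unexpected input.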

EDIT: here is what I came up with:

import datetime
import io
import math
import pandas as pd

# Taken from http://stackoverflow.com/questions/13071384/python-ceil-a-datetime-to-next-quarter-of-an-hour
# Rounds dt up to the next multiple of num_seconds within the hour (900 s = 15 min)
def ceil_dt(dt, num_seconds=900):
    nsecs = dt.minute*60 + dt.second + dt.microsecond*1e-6
    delta = math.ceil(nsecs / num_seconds) * num_seconds - nsecs
    return dt + datetime.timedelta(seconds=delta)

df_string='[{"id":100,"vehicle_type":"Car","time":"2017-04-06 01:39:43","zone":"A","type":"Checked"},{"id":101,"vehicle_type":"Truck","time":"2017-04-06 02:35:45","zone":"B","type":"Unchecked"},{"id":102,"vehicle_type":"Truck","time":"2017-04-05 03:20:12","zone":"A","type":"Checked"},{"id":103,"vehicle_type":"Car","time":"2017-04-04 10:05:04","zone":"C","type":"Unchecked"}]'
df = pd.read_json(io.StringIO(df_string))

# Integer codes for the categorical columns (0-based, sorted order)
df['zone'] = pd.Categorical(df['zone'])
df['vehicle_type'] = pd.Categorical(df['vehicle_type'])
df['type'] = pd.Categorical(df['type'])
df['zone_int'] = df['zone'].cat.codes
df['vehicle_type_int'] = df['vehicle_type'].cat.codes
df['type_int'] = df['type'].cat.codes

# Parse the timestamps and extract calendar features
df['time'] = pd.to_datetime(df['time'])
df['dayofweek'] = df['time'].dt.dayofweek  # Monday=0 ... Sunday=6
df['month_int'] = df['time'].dt.month
df['year_int'] = df['time'].dt.year
df['day'] = df['time'].dt.day
df['date'] = df['time'].dt.date
# First day of the month / year, as datetime.date objects
df['month'] = df['date'].apply(lambda x: datetime.date(x.year, x.month, 1))
df['year'] = df['date'].apply(lambda x: datetime.date(x.year, 1, 1))
df['hour'] = df['time'].dt.hour
df['mins'] = df['time'].dt.minute
df['seconds'] = df['time'].dt.second

# 1-based time-of-day buckets, e.g. for 3 hours: hours 0-2 -> 1, 3-5 -> 2, ...
df['time_interval_3hour'] = df['hour'] // 3 + 1
df['time_interval_6hour'] = df['hour'] // 6 + 1
df['time_interval_12hour'] = df['hour'] // 12 + 1
# True for Saturday (5) and Sunday (6)
df['weekend'] = df['dayofweek'] > 4

df['ceil_quarter_an_hour'] = df['time'].apply(ceil_dt)
df['ceil_half_an_hour'] = df['time'].apply(lambda x: ceil_dt(x, num_seconds=1800))
df.head()
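pandas also provides a vectorized Series.dt.ceil that rounds datetimes up to a fixed frequency, which could stand in for the ceil_dt helper. A small check on the four sample timestamps that the two agree for 15- and 30-minute rounding. One difference worth knowing: ceil_dt only looks at the minutes and seconds, so it assumes num_seconds divides an hour evenly, whereas dt.ceil rounds the full timestamp.

```python
import datetime
import math

import pandas as pd


def ceil_dt(dt, num_seconds=900):
    # Same helper as in the answer: round up within the hour.
    nsecs = dt.minute * 60 + dt.second + dt.microsecond * 1e-6
    delta = math.ceil(nsecs / num_seconds) * num_seconds - nsecs
    return dt + datetime.timedelta(seconds=delta)


times = pd.to_datetime(pd.Series([
    "2017-04-06 01:39:43",
    "2017-04-06 02:35:45",
    "2017-04-05 03:20:12",
    "2017-04-04 10:05:04",
]))

# Vectorized rounding up to the next quarter / half hour.
quarter = times.dt.ceil("15min")
half = times.dt.ceil("30min")

# Row-by-row results from the helper, for comparison.
quarter_helper = times.apply(ceil_dt)
half_helper = times.apply(lambda x: ceil_dt(x, num_seconds=1800))

print(quarter.tolist() == quarter_helper.tolist())
print(half.tolist() == half_helper.tolist())
```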

Assign custom categories to JSON data - pandas
