I have a table pulled from sqlite3 using SQLAlchemy. The table contains the date and time of each car showing:
Id Car Code ShowTime
1 Honda A 10/18/2017 14:45
1 Honda A 10/18/2017 17:10
3 Honda C 10/18/2017 19:35
4 Toyota B 10/18/2017 12:20
4 Toyota B 10/18/2017 14:45
The desired output separates the date from the time and collects the times for each entry into a list:
"data":{
'id': '1',
'schedule': {
'car': 'Honda',
'show_date': '10/18/2017',
'time_available': [
'14:45',
'17:10',
],
'code': 'A'
}
},{
'id': '3',
'schedule': {
'car': 'Honda',
'show_date': '10/18/2017',
'time_available': [
'19:35'
],
'code': 'C'
}
},{
'id': '4',
'schedule': {
'car': 'Toyota',
'show_date': '10/18/2017',
'time_available': [
'12:20',
'14:45'
],
'code': 'B'
}
}
Any help would be greatly appreciated!
Answer 0 (score: 1)
You can use pandas to split the ShowTime column:
In [22]: import pandas as pd
In [68]: df = pd.read_csv('test.csv')
In [69]: df.rename(columns={'Id':'id','Car':'car', 'Code':'code'}, inplace=True)
In [70]: df[['show_date', 'time_available']] = df.ShowTime.str.split(' ', expand=True)
In [71]: df.drop('ShowTime', axis=1, inplace=True)
In [72]: df
Out[72]:
id car code show_date time_available
0 1 Honda A 10/18/2017 14:45
1 1 Honda A 10/18/2017 17:10
2 3 Honda C 10/18/2017 19:35
3 4 Toyota B 10/18/2017 12:20
4 4 Toyota B 10/18/2017 14:45
Group by the columns that hold categorical values, then convert the time_available column of the grouped DataFrame into lists:
In [134]: df_grp = df.groupby(['id', 'car','code', 'show_date'])
In [136]: df_grp_time_stacked = df_grp['time_available'].apply(list).reset_index()
In [138]: df_grp_time_stacked
Out[138]:
id car code show_date time_available
0 1 Honda A 10/18/2017 [14:45, 17:10]
1 3 Honda C 10/18/2017 [19:35]
2 4 Toyota B 10/18/2017 [12:20, 14:45]
In [139]: df_grp_time_stacked['time_available'] = df_grp_time_stacked['time_available'].apply(lambda x: x[0] if len(x) == 1 else x)
In [140]: df_grp_time_stacked
Out[140]:
id car code show_date time_available
0 1 Honda A 10/18/2017 [14:45, 17:10]
1 3 Honda C 10/18/2017 19:35
2 4 Toyota B 10/18/2017 [12:20, 14:45]
Now convert the DataFrame to a dictionary:
In [165]: raw_dict = df_grp_time_stacked.to_dict(orient='records')
In [166]: data = {'data':raw_dict}
In [167]: data
Out[167]:
{'data': [{'id': 1,
'car': 'Honda',
'code': 'A',
'show_date': '10/18/2017',
'time_available': ['14:45', '17:10']},
{'id': 3,
'car': 'Honda',
'code': 'C',
'show_date': '10/18/2017',
'time_available': '19:35'},
{'id': 4,
'car': 'Toyota',
'code': 'B',
'show_date': '10/18/2017',
'time_available': ['12:20', '14:45']}]}
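The records above are flat, while the question asks for car, show_date, time_available and code to sit under a schedule key, with id as a string and time_available always a list. A minimal sketch of that reshaping step, assuming records shaped like raw_dict above; the helper name to_nested is hypothetical and not part of the answer:

```python
def to_nested(record):
    """Reshape one flat record into the nested layout from the question."""
    times = record['time_available']
    # A single time may have been unwrapped to a bare string; wrap it again.
    if isinstance(times, str):
        times = [times]
    return {
        'id': str(record['id']),
        'schedule': {
            'car': record['car'],
            'show_date': record['show_date'],
            'time_available': list(times),
            'code': record['code'],
        },
    }


# Same shape as the raw_dict produced by to_dict(orient='records') above.
raw_dict = [
    {'id': 1, 'car': 'Honda', 'code': 'A', 'show_date': '10/18/2017',
     'time_available': ['14:45', '17:10']},
    {'id': 3, 'car': 'Honda', 'code': 'C', 'show_date': '10/18/2017',
     'time_available': '19:35'},
    {'id': 4, 'car': 'Toyota', 'code': 'B', 'show_date': '10/18/2017',
     'time_available': ['12:20', '14:45']},
]

data = {'data': [to_nested(r) for r in raw_dict]}
print(data)
```

Skipping the step that unwraps single-element lists would make the isinstance check unnecessary, since every time_available value would then already be a list.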
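Answer 1 (score: 0)
You could also try the json library. It is a bit hacky, because you have to do some string replacements. Edited because of an error in the first version.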
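import json
data = """your string"""
# NOTE: path is never defined in this answer; set it to any writable file path first
# strip line breaks and tabs
data = data.replace("\n", "").replace("\t", "")
# escape the quotes, drop spaces and trailing commas, remove the "data": prefix,
# and double the closing brace so that splitting on "}," keeps each record balanced
data = data.replace(r"'",r'\"').replace(" ", "").replace(",]", "]").replace('"data":', "").replace("},", r"}},")
outlist = list()
for helper in data.split(r"},"):
    # wrap the record in quotes so it becomes a JSON string literal
    helper = '"'+helper+'"'
    with open(path, 'w') as f:
        f.write(helper)
    with open(path, 'r') as f:
        json_file = json.load(f)  # yields the record as a plain string
    out_dict = json.loads(json_file)  # parse that string into a dict
    outlist.append(out_dict)
print(outlist)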
This produces a list of dictionaries:
[{'id': '1', 'schedule': {'car': 'Honda', 'show_date': '10/18/2017', 'time_available': ['14:45', '17:10'], 'code': 'A'}}, {'id': '3', 'schedule': {'car': 'Honda', 'show_date': '10/18/2017', 'time_available': ['19:35'], 'code': 'C'}}, {'id': '4', 'schedule': {'car': 'Toyota', 'show_date': '10/18/2017', 'time_available': ['12:20', '14:45'], 'code': 'B'}}]
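If the input really is text in the layout shown in the question, a different technique avoids the replacements and the temporary file: the text is already a valid sequence of Python literals once the "data": prefix is removed, so ast.literal_eval can parse it. This is a sketch with a swapped-in approach, not the method of the answer above, and the text variable is a shortened stand-in for the real input:

```python
import ast

# Shortened stand-in for the text from the question (assumed input format).
text = """
"data":{
    'id': '1',
    'schedule': {
        'car': 'Honda',
        'show_date': '10/18/2017',
        'time_available': [
            '14:45',
            '17:10',
        ],
        'code': 'A'
    }
},{
    'id': '3',
    'schedule': {
        'car': 'Honda',
        'show_date': '10/18/2017',
        'time_available': [
            '19:35'
        ],
        'code': 'C'
    }
}
"""

body = text.strip()
prefix = '"data":'
if body.startswith(prefix):
    body = body[len(prefix):]

# Wrapping the comma-separated dicts in brackets gives a Python list literal;
# literal_eval only accepts literals, so no arbitrary code is executed.
records = ast.literal_eval('[' + body + ']')
print(records)
```

Unlike the json module, ast.literal_eval accepts single-quoted strings and trailing commas, which is why no cleanup of the text is needed here.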
Answer 2 (score: 0)
Here you go:
import pandas as pd
from collections import defaultdict
data = {'Id': [1,1,3,4,4], 'Car': ['Honda','Honda','Honda','Toyota','Toyota'], 'Code': ['A','A','C','B','B'],
        'ShowTime': ['10/18/2017 14:45', '10/18/2017 17:10', '10/18/2017 19:35', '10/18/2017 12:20', '10/18/2017 14:45']}
df = pd.DataFrame(data)
# split time data into 2 columns (expand=True returns one column per part)
df[['Date', 'Time']] = df['ShowTime'].str.split(' ', n=1, expand=True)
# drop unneeded column
df = df.drop(['ShowTime'], axis=1)
def create_dictionary(i):
    # select the rows that belong to this id
    selected_data = df.loc[df['Id'] == i]
    # get the unique values of each column
    ids = selected_data['Id'].unique()
    car = selected_data['Car'].unique()
    code = selected_data['Code'].unique()
    date = selected_data['Date'].unique()
    time = selected_data['Time'].unique()
    # create dictionary
    dictionary_data = {'id': ids[0], 'schedule': {'car': car[0], 'show_date': date[0],
                                                  'time_available': list(time), 'code': code[0]}}
    return dictionary_data
# get id list
id_list = list(df['Id'].unique())
# create data dictionary
out_data = defaultdict(list)
for i in id_list:
    one = create_dictionary(i)
    out_data["data"].append(one)
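The per-id filtering above can also be expressed as a single pass over a groupby. The following is a sketch under the same sample data, not the answer's own code; it additionally converts the id to a string, since the question shows ids as strings:

```python
import pandas as pd

df = pd.DataFrame({
    'Id': [1, 1, 3, 4, 4],
    'Car': ['Honda', 'Honda', 'Honda', 'Toyota', 'Toyota'],
    'Code': ['A', 'A', 'C', 'B', 'B'],
    'ShowTime': ['10/18/2017 14:45', '10/18/2017 17:10', '10/18/2017 19:35',
                 '10/18/2017 12:20', '10/18/2017 14:45'],
})
# Split "date time" into two columns.
df[['Date', 'Time']] = df['ShowTime'].str.split(' ', n=1, expand=True)

records = []
# sort=False keeps the groups in order of first appearance.
for (id_, car, code, date), group in df.groupby(['Id', 'Car', 'Code', 'Date'], sort=False):
    records.append({
        'id': str(id_),
        'schedule': {
            'car': car,
            'show_date': date,
            'time_available': group['Time'].tolist(),
            'code': code,
        },
    })

out_data = {'data': records}
print(out_data)
```

Grouping on all four columns means a car shown on two different dates would produce two entries, one per date.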
Answer 3 (score: 0)
You can also use the simple dictionary method setdefault():
tbl=['1 Honda A 10/18/2017 14:45',
'1 Honda A 10/18/2017 17:10',
'3 Honda C 10/18/2017 19:35',
'4 Toyota B 10/18/2017 12:20',
'4 Toyota B 10/18/2017 14:45']
data={}
for line in tbl:
    iden, car, code, show_date, time_available = line.split()
    data.setdefault((iden, car, code), {'id': iden, 'schedule': {'car': car, 'show_date': show_date, 'time_available': [], 'code': code}})['schedule']['time_available'].append(time_available)
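We use the (iden, car, code) tuple as the dictionary key. setdefault returns the value stored under the key if the key already exists in the dictionary; otherwise it inserts the key with the given default value and returns that. The default is the full structure with an empty time_available list, and because setdefault returns either the existing or the newly created value, we can index straight into that list and append the time to it.
Result: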
data.values()
dict_values([{'id': '1', 'schedule': {'car': 'Honda', 'show_date': '10/18/2017', 'time_available': ['14:45', '17:10'], 'code': 'A'}}, {'id': '3', 'schedule': {'car': 'Honda', 'show_date': '10/18/2017', 'time_available': ['19:35'], 'code': 'C'}}, {'id': '4', 'schedule': {'car': 'Toyota', 'show_date': '10/18/2017', 'time_available': ['12:20', '14:45'], 'code': 'B'}}])
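Since the question starts from a database table rather than from text lines, the same setdefault pattern can be applied to rows as they are fetched. The sketch below uses the standard-library sqlite3 module with an in-memory database as a stand-in for the SQLAlchemy query; the table name shows is hypothetical, and the column names are taken from the question:

```python
import sqlite3

# In-memory stand-in for the real database; the table name is an assumption.
conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE shows (Id INTEGER, Car TEXT, Code TEXT, ShowTime TEXT)')
conn.executemany('INSERT INTO shows VALUES (?, ?, ?, ?)', [
    (1, 'Honda', 'A', '10/18/2017 14:45'),
    (1, 'Honda', 'A', '10/18/2017 17:10'),
    (3, 'Honda', 'C', '10/18/2017 19:35'),
    (4, 'Toyota', 'B', '10/18/2017 12:20'),
    (4, 'Toyota', 'B', '10/18/2017 14:45'),
])

grouped = {}
query = 'SELECT Id, Car, Code, ShowTime FROM shows ORDER BY rowid'
for iden, car, code, show_time in conn.execute(query):
    # ShowTime holds "date time" in one column, so split it once.
    show_date, time_available = show_time.split(' ', 1)
    entry = grouped.setdefault(
        (iden, car, code, show_date),
        {'id': str(iden),
         'schedule': {'car': car, 'show_date': show_date,
                      'time_available': [], 'code': code}},
    )
    entry['schedule']['time_available'].append(time_available)
conn.close()

# Dictionaries keep insertion order, so entries appear in first-seen order.
data = {'data': list(grouped.values())}
print(data)
```

Including show_date in the key is a design choice of this sketch: it keeps showings on different dates in separate entries instead of mixing their times into one list.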