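I have two data frames, each with a column named time that holds a datetime representation of the time, plus a variable column. I want to merge these two data frames, but for some reason the merge messes up the datetime format of nn.
I create the individual data frames with this code: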
## ECG load
nn = pd.read_csv('D:\\path\\Nn.csv',delimiter=";",decimal=',',header=None,names=["time","ibi"])
fsEcg = 1024 # Sample frequency
tsEcg = mkdatMovis('2016-10-31T12:16:15.015') #datetime rep of Start time string
nn.loc[:,'time'] = nn.time/fsEcg # convert sample number to seconds
ecgTime = zip(tsEcg + datetime.timedelta(seconds=float(cmt)) for cmt in nn.time)
nn.loc[:,'time'] = ecgTime
## EDA load
eda = pd.read_csv('D:\\path\\eda.csv',\
delimiter=";",decimal=',',header=None,names=["eda"])
fsEda = 32
tsEda = mkdatMovis('2016-10-31T12:17:08.363')
cumEda = np.arange(len(eda),dtype=np.float64)/fsEda # create time array in seconds
cumEda = pd.Series(cumEda)
edadat = pd.DataFrame()
edadat.loc[:,'time'] = zip(tsEda + datetime.timedelta(seconds=float(cmt)) for cmt in cumEda)
edadat.loc[:,'eda'] = eda
The data frames look like this:
>>> nn
time nn
0 2016-10-31 12:16:26.409531 972.656250
1 2016-10-31 12:16:27.394883 985.351562
2 2016-10-31 12:16:28.379258 984.375000
3 2016-10-31 12:16:29.360703 981.445312
4 2016-10-31 12:16:30.407578 1046.875000
...
1448 2016-10-31 12:39:37.910508 845.703125
>>> edadat
time eda
0 (2016-10-31 12:17:08.363000,) 2.0
1 (2016-10-31 12:17:08.363000,) 5.0
2 (2016-10-31 12:17:08.363000,) 5.0
3 (2016-10-31 12:17:08.363000,) 4.0
4 (2016-10-31 12:17:08.363000,) 4.0
....
41582 (2016-10-31 12:38:47.363000,) 36.0
After merging the data frames with df = edadat.merge(nn,on="time",how="outer"), the data looks like this:
time eda nn
0 (2016-10-31 12:17:08.363000,) 2.0 NaN
1 (2016-10-31 12:17:08.363000,) 5.0 NaN
2 (2016-10-31 12:17:08.363000,) 5.0 NaN
3 (2016-10-31 12:17:08.363000,) 4.0 NaN
4 (2016-10-31 12:17:08.363000,) 4.0 NaN
...
43027 1477917574356797000 NaN 928.710938
43028 1477917575276719000 NaN 919.921875
43029 1477917576178086000 NaN 901.367188
43030 1477917577064805000 NaN 886.718750
43031 1477917577910508000 NaN 845.703125
Why are the datetime values from nn converted to Unix time after the merge? I use exactly the same code to create both time series.
Answer 0: (score: 1)
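I think the problem is that the time column of edadat contains tuples, so you need to remove the tuples with str[0], which selects the first element of the tuple in each row of the DataFrame: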
edadat.time = edadat.time.str[0]
print (edadat)
time eda
0 2016-10-31 12:17:08.363000 2.0
1 2016-10-31 12:17:08.363000 5.0
2 2016-10-31 12:17:08.363000 5.0
3 2016-10-31 12:17:08.363000 4.0
4 2016-10-31 12:17:08.363000 4.0
41582 2016-10-31 12:38:47.363000 36.0
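Where the tuples come from can be reproduced in a few lines. This is a minimal sketch with made-up sample values (the names start, offsets, wrapped and demo are illustrative, not from the question): calling zip() on a single iterable wraps every element in a one-element tuple, which matches the (2016-10-31 12:17:08.363000,) entries shown in edadat. The extra pd.to_datetime call is an assumption on my part to make sure the column ends up with a datetime64 dtype rather than object.

```python
import datetime

import pandas as pd

# Illustrative values only: a start time and three offsets in seconds (32 Hz).
start = datetime.datetime(2016, 10, 31, 12, 17, 8, 363000)
offsets = [0.0, 0.03125, 0.0625]

# zip() over a single iterable yields 1-tuples, e.g. (datetime(...),)
wrapped = list(zip(start + datetime.timedelta(seconds=s) for s in offsets))

demo = pd.DataFrame({"time": wrapped, "eda": [2.0, 5.0, 5.0]})

# .str[0] selects the first element of each tuple; pd.to_datetime then
# converts the resulting object column to a proper datetime64 column.
demo["time"] = pd.to_datetime(demo["time"].str[0])
print(demo.dtypes)
```

Dropping the zip() call and building a plain list (or a pandas datetime column) in the first place would avoid the tuples entirely.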
Then use:
df = edadat.merge(nn,on="time",how="outer")
print (df)
time eda nn
0 2016-10-31 12:17:08.363000 2.0 NaN
1 2016-10-31 12:17:08.363000 5.0 NaN
2 2016-10-31 12:17:08.363000 5.0 NaN
3 2016-10-31 12:17:08.363000 4.0 NaN
4 2016-10-31 12:17:08.363000 4.0 NaN
5 2016-10-31 12:38:47.363000 36.0 NaN
6 2016-10-31 12:16:26.409531 NaN 972.656250
7 2016-10-31 12:16:27.394883 NaN 985.351562
8 2016-10-31 12:16:28.379258 NaN 984.375000
9 2016-10-31 12:16:29.360703 NaN 981.445312
10 2016-10-31 12:16:30.407578 NaN 1046.875000
11 2016-10-31 12:39:37.910508 NaN 845.703125
But I think it is better to use merge_ordered, which returns the rows sorted by time:
df1 = pd.merge_ordered(edadat, nn,on="time",how="outer")
print (df1)
time eda nn
0 2016-10-31 12:16:26.409531 NaN 972.656250
1 2016-10-31 12:16:27.394883 NaN 985.351562
2 2016-10-31 12:16:28.379258 NaN 984.375000
3 2016-10-31 12:16:29.360703 NaN 981.445312
4 2016-10-31 12:16:30.407578 NaN 1046.875000
5 2016-10-31 12:17:08.363000 2.0 NaN
6 2016-10-31 12:17:08.363000 5.0 NaN
7 2016-10-31 12:17:08.363000 5.0 NaN
8 2016-10-31 12:17:08.363000 4.0 NaN
9 2016-10-31 12:17:08.363000 4.0 NaN
10 2016-10-31 12:38:47.363000 36.0 NaN
11 2016-10-31 12:39:37.910508 NaN 845.703125
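As an alternative to repairing the column afterwards, the time columns can be built without zip() at all. The following is a sketch under assumptions, not the asker's actual pipeline: mkdatMovis is replaced by pd.Timestamp, the helper make_time_column is hypothetical, and the sample numbers and EDA values are invented stand-ins for the CSV contents. Both columns are built the same way, so they share one datetime64 dtype before merge_ordered is called.

```python
import numpy as np
import pandas as pd


def make_time_column(start, seconds):
    """Hypothetical helper: start timestamp plus offsets in seconds."""
    return pd.Timestamp(start) + pd.to_timedelta(seconds, unit="s")


# EDA: one value every 1/32 s (values invented for the example).
fs_eda = 32
eda_values = [2.0, 5.0, 5.0, 4.0]
eda_seconds = np.arange(len(eda_values), dtype=np.float64) / fs_eda
edadat = pd.DataFrame({
    "time": make_time_column("2016-10-31T12:17:08.363", eda_seconds),
    "eda": eda_values,
})

# ECG: sample numbers converted to seconds at 1024 Hz (numbers invented).
fs_ecg = 1024
samples = np.array([11668, 12677, 13685], dtype=np.float64)
nn = pd.DataFrame({
    "time": make_time_column("2016-10-31T12:16:15.015", samples / fs_ecg),
    "nn": [972.656250, 985.351562, 984.375000],
})

# Both time columns are datetime64, so the merged column stays datetime64
# and the rows come back ordered by time.
merged = pd.merge_ordered(edadat, nn, on="time", how="outer")
print(merged)
```

Because no tuples are ever created, the str[0] step is not needed with this construction.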