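I am learning Python (2.7) and trying to left join two pandas DataFrames. One DataFrame has the dates and the corresponding sales of a product, while the other has the dates and the corresponding day of the week.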
print type(weekdaytrain)
print weekdaytrain.head(5)
<class 'pandas.core.frame.DataFrame'>
data giorno_settimana
0 2014-09-01 0
1 2014-09-02 1
2 2014-09-03 2
3 2014-09-04 3
4 2014-09-05 4
print type(train)
print train.head(5)
<class 'pandas.core.frame.DataFrame'>
data pezzi
1078 2014-09-01 1743
1086 2014-09-02 1483
1094 2014-09-03 1510
1102 2014-09-04 1276
1110 2014-09-05 1741
When I do this:
new_train = pd.merge(train,weekdaytrain, on='data',how='left')
or
new_train = pd.merge(train,weekdaytrain, left_on='data',right_on='data',how='left')
I get:
data pezzi giorno_settimana
0 2014-09-01 1743 NaN
1 2014-09-02 1483 NaN
2 2014-09-03 1510 NaN
3 2014-09-04 1276 NaN
4 2014-09-05 1741 NaN
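The giorno_settimana column is all NaN even though the dates do correspond. I searched for an answer but found nothing that fits my problem. Can you help me?
Thanks!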
Answer 0 (score: 1)
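I think you need to convert the data column to datetime in both DataFrames, because the two columns seem to have different dtypes: one is datetime and the other is object (evidently string):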
weekdaytrain.data = pd.to_datetime(weekdaytrain.data)
train.data = pd.to_datetime(train.data)
print (weekdaytrain.dtypes)
data datetime64[ns]
giorno_settimana int64
dtype: object
print (train.dtypes)
data object
pezzi int64
dtype: object
new_train = pd.merge(train,weekdaytrain, on='data',how='left')
print (new_train)
data pezzi giorno_settimana
0 2014-09-01 1743 NaN
1 2014-09-02 1483 NaN
2 2014-09-03 1510 NaN
3 2014-09-04 1276 NaN
4 2014-09-05 1741 NaN
# the data column in train is not datetime, so it needs converting
train.data = pd.to_datetime(train.data)
new_train = pd.merge(train,weekdaytrain, on='data',how='left')
print (new_train)
data pezzi giorno_settimana
0 2014-09-01 1743 0
1 2014-09-02 1483 1
2 2014-09-03 1510 2
3 2014-09-04 1276 3
4 2014-09-05 1741 4
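For anyone who wants to try this without the original data, here is a minimal self-contained sketch of the same fix. The sample values are copied from the question, but the way the two frames are built here is an assumption (in particular, that the dates in train are stored as strings). Depending on the pandas version, merging on mismatched key dtypes may return NaN as in the question or raise an error, so the sketch only performs the merge after converting both keys.

```python
import pandas as pd

# Hypothetical sample data mirroring the two frames in the question.
weekdaytrain = pd.DataFrame({
    "data": pd.to_datetime(["2014-09-01", "2014-09-02", "2014-09-03"]),
    "giorno_settimana": [0, 1, 2],
})
train = pd.DataFrame({
    "data": ["2014-09-01", "2014-09-02", "2014-09-03"],  # plain strings, not datetime64
    "pezzi": [1743, 1483, 1510],
})

# Inspect the key dtypes first: they differ, which is why the keys never match.
print(train["data"].dtype)
print(weekdaytrain["data"].dtype)

# Convert the key on both sides so the merge compares like with like.
# Converting a column that is already datetime is harmless.
train["data"] = pd.to_datetime(train["data"])
weekdaytrain["data"] = pd.to_datetime(weekdaytrain["data"])

new_train = pd.merge(train, weekdaytrain, on="data", how="left")
print(new_train)
```

Checking the dtypes of the key columns before a merge is a cheap way to catch this kind of silent mismatch.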