我有两个数据集。我想使用索引进行合并。
第一个数据集:
index A B C
01/01/2010 15 20 30
15/01/2010 12 15 25
17/02/2010 14 13 35
19/02/2010 11 10 22
2nt数据集:
index year month price
0 2010 january 70
1 2010 february 80
我希望他们像这样加入:
index A B C price
01/01/2010 15 20 30 70
15/01/2010 12 15 25 70
17/02/2010 14 13 35 80
19/02/2010 11 10 22 80
问题在于如何使用两列(第二数据集的year
和month
)创建临时日期时间index
。
答案 0 :(得分:2)
尝试此操作,方法是从.dt.year
中提取.month_name()和year(df1
)并将其与df2
合并
>>> df1
index A B C
0 01/01/2010 15 20 30
1 15/01/2010 12 15 25
2 17/02/2010 14 13 35
3 19/02/2010 11 10 22
>>> df2
index year month price
0 0 2010 january 70
1 1 2010 february 80
# merging df1 and df2 by month and year.
>>> df1.merge(df2,
left_on = [pd.to_datetime(df1['index']).dt.year,
pd.to_datetime(df1['index']).dt.month_name().str.lower()],
right_on = ['year', 'month'])
输出:
index_x A B C index_y year month price
0 01/01/2010 15 20 30 0 2010 january 70
1 15/01/2010 12 15 25 0 2010 january 70
2 17/02/2010 14 13 35 1 2010 february 80
3 19/02/2010 11 10 22 1 2010 february 80
答案 1 :(得分:0)
这是愚蠢的答案!我敢肯定,您可以做得更好:)但这可行,因为您的表是一个字典列表(您可以轻松地以这种格式转换SQL表)。我知道这不是一个干净的解决方案,但是您要求一个简单的解决方案,可能是最容易理解的方法:)
months = {'january': "01",
'february': "02",
'march': "03",
'april':"04",
'may': "05",
'june': "06",
'july': "07",
'august': "08",
'september': "09",
'october': "10",
'november': "11",
'december': "12"}
table1 = [{'index': '01/01/2010', 'A': 15, 'B': 20, 'C': 30},
{'index': '15/01/2010', 'A': 12, 'B': 15, 'C': 25},
{'index': '17/02/2010', 'A': 14, 'B': 13, 'C': 35},
{'index': '19/02/2010', 'A': 11, 'B': 10, 'C': 22}]
table2 = [{'index': 0, 'year': 2010, 'month': 'january', 'price':70},
{'index': 1, 'year': 2010, 'month': 'february', 'price':80}]
def joiner(table1, table2):
for row in table2:
row['tempDate'] = "{0}/{1}".format(months[row['month']], str(row['year']))
for row in table1:
row['tempDate'] = row['index'][3:]
table3 = []
for row1 in table1:
row3 = row1.copy()
for row2 in table2:
if row2['tempDate'] == row1['tempDate']:
row3['price'] = row2['price']
break
table3.append(row3)
return(table3)
table3 = joiner(table1, table2)
print(table3)