I have two dataframes and want to compare them and remove the days in df2 that are not in df1. I tried:
df2[~df2.Date.isin(df1.Date)]
but this does not work and I get an empty dataframe. df2 should end up looking like df1. The dataframes look like this:
df1
Date
0 20-12-16
1 21-12-16
2 22-12-16
3 23-12-16
4 27-12-16
5 28-12-16
6 29-12-16
7 30-12-16
8 02-01-17
9 03-01-17
10 04-01-17
11 05-01-17
12 06-01-17
df2
Date
0 20-12-16
1 21-12-16
2 22-12-16
3 23-12-16
4 24-12-16
5 25-12-16
6 26-12-16
7 27-12-16
8 28-12-16
9 29-12-16
10 30-12-16
11 31-12-16
12 01-01-17
13 02-01-17
14 03-01-17
15 04-01-17
16 05-01-17
17 06-01-17
Answer 0 (score: 3)
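It seems the dtypes of the two Date columns differ; they need to be the same for the comparison. Check with:
print (df1.Date.dtype)
print (df2.Date.dtype)
Then convert if necessary (the dates are day-first, so pass an explicit format):
df1['Date'] = pd.to_datetime(df1['Date'], format='%d-%m-%y')
df2['Date'] = pd.to_datetime(df2['Date'], format='%d-%m-%y')
Then keep only the rows of df2 whose date also appears in df1: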
df = df2[np.in1d(df2.Date, df1.Date)]
print (df)
Date
0 2016-12-20
1 2016-12-21
2 2016-12-22
3 2016-12-23
7 2016-12-27
8 2016-12-28
9 2016-12-29
10 2016-12-30
13 2017-01-02
14 2017-01-03
15 2017-01-04
16 2017-01-05
17 2017-01-06
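As a self-contained sketch of the filter above: the shortened sample data is made up for illustration, and `np.isin` is used because, as far as I know, newer NumPy releases deprecate `np.in1d` in its favor. `Series.isin` gives the same rows.

```python
import numpy as np
import pandas as pd

# Shortened, made-up sample in the same day-month-year format as the question.
df1 = pd.DataFrame({'Date': ['23-12-16', '27-12-16', '02-01-17']})
df2 = pd.DataFrame({'Date': ['23-12-16', '24-12-16', '27-12-16', '01-01-17', '02-01-17']})

# Parse both columns with the same explicit format so the dtypes match.
df1['Date'] = pd.to_datetime(df1['Date'], format='%d-%m-%y')
df2['Date'] = pd.to_datetime(df2['Date'], format='%d-%m-%y')

# Boolean mask: True where a df2 date also occurs in df1.
mask = np.isin(df2['Date'].to_numpy(), df1['Date'].to_numpy())
kept = df2[mask]

# The pandas-only spelling of the same filter.
kept_isin = df2[df2['Date'].isin(df1['Date'])]
print(kept)
```

Boolean filtering keeps the original row labels of df2, which is why the index in the output has gaps.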
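I have added two more solutions: the first, shown above, uses numpy.in1d; the second uses merge, because its default inner join is exactly what is needed: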
df = df1.merge(df2, on='Date')
print (df)
Date
0 2016-12-20
1 2016-12-21
2 2016-12-22
3 2016-12-23
4 2016-12-27
5 2016-12-28
6 2016-12-29
7 2016-12-30
8 2017-01-02
9 2017-01-03
10 2017-01-04
11 2017-01-05
12 2017-01-06
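A minimal sketch of the merge route on shortened, made-up data. `how='inner'` is already the default and is spelled out only for clarity. Unlike boolean filtering, merge builds a fresh 0..n-1 index instead of keeping the row labels of df2.

```python
import pandas as pd

# Made-up sample, already parsed as datetimes with the day-first format.
df1 = pd.DataFrame({'Date': pd.to_datetime(
    ['23-12-16', '27-12-16', '02-01-17'], format='%d-%m-%y')})
df2 = pd.DataFrame({'Date': pd.to_datetime(
    ['23-12-16', '24-12-16', '27-12-16', '01-01-17', '02-01-17'], format='%d-%m-%y')})

# Inner join on Date: only dates present in both frames survive.
merged = df1.merge(df2, on='Date', how='inner')
print(merged)
```

If either frame contains duplicate dates, an inner join repeats the matching rows, so the isin filter may be the safer choice in that case.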
Sample:
import numpy as np
import pandas as pd

d1 = {'Date': ['20-12-16', '21-12-16', '22-12-16', '23-12-16', '27-12-16', '28-12-16', '29-12-16', '30-12-16', '02-01-17', '03-01-17', '04-01-17', '05-01-17', '06-01-17']}
d2 = {'Date': ['20-12-16', '21-12-16', '22-12-16', '23-12-16', '24-12-16', '25-12-16', '26-12-16', '27-12-16', '28-12-16', '29-12-16', '30-12-16', '31-12-16', '01-01-17', '02-01-17', '03-01-17', '04-01-17', '05-01-17', '06-01-17']}
df1 = pd.DataFrame(d1)
df2 = pd.DataFrame(d2)
print (df1.Date.dtype)
object
print (df2.Date.dtype)
object
df1['Date'] = pd.to_datetime(df1['Date'], format='%d-%m-%y')
df2['Date'] = pd.to_datetime(df2['Date'], format='%d-%m-%y')
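To illustrate why the dtypes have to match, here is a small sketch on made-up data where one column still holds strings and the other is already datetime. In the pandas versions I am aware of, nothing matches until both sides are converted.

```python
import pandas as pd

# One side still holds plain strings (object dtype) ...
strings = pd.Series(['20-12-16', '21-12-16'], dtype=object)
# ... the other side has already been parsed into datetimes.
stamps = pd.to_datetime(pd.Series(['20-12-16', '21-12-16']), format='%d-%m-%y')

# Strings are compared against Timestamps, so no value is considered equal.
mismatched = strings.isin(stamps)

# After converting the strings with the same format, every value matches.
converted = pd.to_datetime(strings, format='%d-%m-%y')
matched = converted.isin(stamps)
print(mismatched.tolist(), matched.tolist())
```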
Answer 1 (score: 0)
Your error is in the logic. You want to select the rows of df2 whose dates are in df1, so you should write
df2[df2.Date.isin(df1.Date)]
without the ~, which negates the boolean mask and therefore selects the dates that are not contained in df1.
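A short sketch of the difference on made-up data, with both columns left as strings so that the dtypes already match:

```python
import pandas as pd

df1 = pd.DataFrame({'Date': ['20-12-16', '27-12-16']})
df2 = pd.DataFrame({'Date': ['20-12-16', '24-12-16', '27-12-16']})

in_df1 = df2.Date.isin(df1.Date)  # True where the df2 date is present in df1
common = df2[in_df1]              # rows to keep
extra = df2[~in_df1]              # rows only in df2, which is what the ~ version returns
print(common)
print(extra)
```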
You can also get the same set of dates (here a and b presumably stand for df1 and df2) with
set(b.Date)-(set(b.Date)-set(a.Date))
which should then be used as follows:
pd.DataFrame(sorted((set(b.Date)-(set(b.Date)-set(a.Date)))), columns=["Date"] )
Sorting this way is not optimal, though; you could replace it with better logic in pandas.
df = pd.DataFrame(list((set(b.Date)-(set(b.Date)-set(a.Date)))), columns=["Date"] )
df.Date = [date.date() for date in df.Date]
or df.Date = df.Date.dt.date
(see "How do I convert dates in a Pandas data frame to a 'date' data type?")
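The set expression B - (B - A) is just the intersection of the two sets, so it can be written more directly with `&`. A sketch on made-up data, with a and b as stand-ins for df1 and df2 (an assumption on my part). Sets drop both ordering and duplicates, hence the `sorted` call.

```python
import pandas as pd

# Made-up stand-ins for df1 (a) and df2 (b), already parsed as datetimes.
a = pd.DataFrame({'Date': pd.to_datetime(
    ['23-12-16', '27-12-16'], format='%d-%m-%y')})
b = pd.DataFrame({'Date': pd.to_datetime(
    ['23-12-16', '24-12-16', '27-12-16'], format='%d-%m-%y')})

# The double difference and the intersection contain the same dates.
double_difference = set(b.Date) - (set(b.Date) - set(a.Date))
intersection = set(b.Date) & set(a.Date)

# Rebuild a frame in chronological order and convert to plain date objects.
result = pd.DataFrame(sorted(intersection), columns=['Date'])
result['Date'] = result['Date'].dt.date
print(result)
```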