duplicated() returns False for every row, even though the rows are identical apart from their index.
Input
old_data = old_data.loc[:, ~old_data.columns.str.contains('^Unnamed')]
print("bottom_slice")
bottom_slice_length = len(old_data.index)
adjusted_bottom_slice_length = bottom_slice_length * 0.1
adjusted_bottom_slice_length = int(adjusted_bottom_slice_length)
bottom_slice = old_data[adjusted_bottom_slice_length:]
print(bottom_slice)
new_data = pd.DataFrame.from_records(journal.data)
top_slice_length = len(new_data.index)
print("top slice")
adjusted_top_slice_length = top_slice_length * 0.9
adjusted_top_slice_length = int(adjusted_top_slice_length)
top_slice = new_data[:adjusted_top_slice_length]
print(top_slice)
kimera = pd.concat([top_slice, bottom_slice])
#print("kimera")
#print(kimera)
print(kimera.duplicated())
#kimera = kimera.drop_duplicates()
print("kimera1")
print(kimera)
Output
bottom_slice
client_id date ... type_id unit_price
4 94904480 2019-06-30T01:31:01+00:00 ... 11186 37177999.84
5 2113704258 2019-06-29T10:46:53+00:00 ... 12044 33996998.00
6 2115385566 2019-06-27T12:07:58+00:00 ... 11393 44899999.98
7 1732767131 2019-06-27T09:22:24+00:00 ... 38 325.24
8 93204128 2019-06-26T20:47:01+00:00 ... 11198 35999999.98
9 90216786 2019-06-25T23:51:48+00:00 ... 11172 35999999.99
10 91205905 2019-06-25T19:59:21+00:00 ... 16275 600.00
11 2113996003 2019-06-25T16:52:14+00:00 ... 11190 39999999.96
12 96345205 2019-06-25T16:39:49+00:00 ... 16275 600.00
13 95103814 2019-06-25T01:16:28+00:00 ... 11202 29999998.93
14 543983309 2019-06-24T14:05:49+00:00 ... 11172 27415377.17
15 2114159703 2019-06-23T21:20:04+00:00 ... 34 6.30
16 2114159703 2019-06-23T15:28:37+00:00 ... 16274 850.00
17 1872130440 2019-06-23T10:02:21+00:00 ... 11400 38498999.98
18 2112790910 2019-06-23T00:00:46+00:00 ... 11202 28394499.36
19 2115326382 2019-06-22T22:42:00+00:00 ... 11371 37150194.88
20 96768321 2019-06-22T17:02:14+00:00 ... 37481 88999999.99
21 1009077082 2019-06-21T23:35:03+00:00 ... 11379 42000000.00
22 755876330 2019-06-21T12:27:59+00:00 ... 11186 37177999.86
23 1556713165 2019-06-20T23:27:23+00:00 ... 11393 36997999.87
24 513171897 2019-06-19T15:58:51+00:00 ... 11381 43817993.86
25 96711003 2019-06-18T17:50:15+00:00 ... 11198 36999999.99
26 408059764 2019-06-18T15:36:49+00:00 ... 11172 35000000.00
27 1276544138 2019-06-17T21:32:47+00:00 ... 11379 41000000.00
28 94184713 2019-06-17T03:30:26+00:00 ... 37481 86999999.99
29 2113441660 2019-06-16T04:12:59+00:00 ... 37458 34948998.99
30 755284989 2019-06-15T19:54:44+00:00 ... 37458 34999999.97
31 1731319339 2019-06-13T12:00:14+00:00 ... 11379 42000000.00
32 96053157 2019-06-12T04:07:15+00:00 ... 37483 85500002.17
33 1690931127 2019-06-12T00:44:40+00:00 ... 37482 61699999.97
34 92812153 2019-06-11T05:23:09+00:00 ... 37460 36499999.99
35 2114791711 2019-06-10T16:14:59+00:00 ... 11371 41499999.99
36 1547875730 2019-06-10T15:22:53+00:00 ... 17887 999.99
37 227535700 2019-06-10T15:12:06+00:00 ... 16272 544.50
38 95165645 2019-06-10T06:32:52+00:00 ... 11393 53989999.99
39 1859791498 2019-06-10T05:35:57+00:00 ... 22460 62000000.00
40 2112629749 2019-06-09T15:46:46+00:00 ... 2549 1800000.00
41 94391975 2019-06-08T00:06:12+00:00 ... 37460 36499999.99
42 91521700 2019-06-07T14:11:45+00:00 ... 11393 49997999.98
43 1171184159 2019-06-06T18:10:19+00:00 ... 12044 33997997.81
44 96410073 2019-06-05T17:32:01+00:00 ... 11371 46999999.96
[41 rows x 10 columns]
top slice
client_id date ... type_id unit_price
0 96644839 2019-07-07T02:02:45+00:00 ... 37457 2.900000e+07
1 2113806433 2019-07-06T18:13:12+00:00 ... 37482 7.300000e+07
2 1240358507 2019-07-05T19:38:20+00:00 ... 11381 4.399900e+07
3 97005654 2019-07-05T04:12:23+00:00 ... 38 3.999900e+02
4 97005654 2019-07-05T02:49:26+00:00 ... 38 3.999900e+02
5 1857838543 2019-07-03T20:08:15+00:00 ... 37482 6.900000e+07
6 92337897 2019-07-03T14:44:32+00:00 ... 11365 4.480000e+07
7 2114793091 2019-07-01T23:04:26+00:00 ... 12044 3.000000e+07
8 95826459 2019-06-30T07:22:45+00:00 ... 37482 1.190000e+08
9 94904480 2019-06-30T01:31:01+00:00 ... 11186 3.717800e+07
10 2113704258 2019-06-29T10:46:53+00:00 ... 12044 3.399700e+07
11 2115385566 2019-06-27T12:07:58+00:00 ... 11393 4.490000e+07
12 1732767131 2019-06-27T09:22:24+00:00 ... 38 3.252400e+02
13 93204128 2019-06-26T20:47:01+00:00 ... 11198 3.600000e+07
14 90216786 2019-06-25T23:51:48+00:00 ... 11172 3.600000e+07
15 91205905 2019-06-25T19:59:21+00:00 ... 16275 6.000000e+02
16 2113996003 2019-06-25T16:52:14+00:00 ... 11190 4.000000e+07
17 96345205 2019-06-25T16:39:49+00:00 ... 16275 6.000000e+02
18 95103814 2019-06-25T01:16:28+00:00 ... 11202 3.000000e+07
19 543983309 2019-06-24T14:05:49+00:00 ... 11172 2.741538e+07
20 2114159703 2019-06-23T21:20:04+00:00 ... 34 6.300000e+00
21 2114159703 2019-06-23T15:28:37+00:00 ... 16274 8.500000e+02
22 1872130440 2019-06-23T10:02:21+00:00 ... 11400 3.849900e+07
23 2112790910 2019-06-23T00:00:46+00:00 ... 11202 2.839450e+07
24 2115326382 2019-06-22T22:42:00+00:00 ... 11371 3.715019e+07
25 96768321 2019-06-22T17:02:14+00:00 ... 37481 8.900000e+07
26 1009077082 2019-06-21T23:35:03+00:00 ... 11379 4.200000e+07
27 755876330 2019-06-21T12:27:59+00:00 ... 11186 3.717800e+07
28 1556713165 2019-06-20T23:27:23+00:00 ... 11393 3.699800e+07
29 513171897 2019-06-19T15:58:51+00:00 ... 11381 4.381799e+07
30 96711003 2019-06-18T17:50:15+00:00 ... 11198 3.700000e+07
31 408059764 2019-06-18T15:36:49+00:00 ... 11172 3.500000e+07
32 1276544138 2019-06-17T21:32:47+00:00 ... 11379 4.100000e+07
33 94184713 2019-06-17T03:30:26+00:00 ... 37481 8.700000e+07
34 2113441660 2019-06-16T04:12:59+00:00 ... 37458 3.494900e+07
35 755284989 2019-06-15T19:54:44+00:00 ... 37458 3.500000e+07
36 1731319339 2019-06-13T12:00:14+00:00 ... 11379 4.200000e+07
37 96053157 2019-06-12T04:07:15+00:00 ... 37483 8.550000e+07
38 1690931127 2019-06-12T00:44:40+00:00 ... 37482 6.170000e+07
39 92812153 2019-06-11T05:23:09+00:00 ... 37460 3.650000e+07
40 2114791711 2019-06-10T16:14:59+00:00 ... 11371 4.150000e+07
41 1547875730 2019-06-10T15:22:53+00:00 ... 17887 9.999900e+02
[42 rows x 10 columns]
0 False
1 False
2 False
3 False
4 False
5 False
6 False
7 False
8 False
9 False
10 False
11 False
12 False
13 False
14 False
15 False
16 False
17 False
18 False
19 False
20 False
21 False
22 False
23 False
24 False
25 False
26 False
27 False
28 False
29 False
...
15 False
16 False
17 False
18 False
19 False
20 False
21 False
22 False
23 False
24 False
25 False
26 False
27 False
28 False
29 False
30 False
31 False
32 False
33 False
34 False
35 False
36 False
37 False
38 False
39 False
40 False
41 False
42 False
43 False
44 False
Length: 83, dtype: bool
kimera1
client_id date ... type_id unit_price
0 96644839 2019-07-07T02:02:45+00:00 ... 37457 2.900000e+07
1 2113806433 2019-07-06T18:13:12+00:00 ... 37482 7.300000e+07
2 1240358507 2019-07-05T19:38:20+00:00 ... 11381 4.399900e+07
3 97005654 2019-07-05T04:12:23+00:00 ... 38 3.999900e+02
4 97005654 2019-07-05T02:49:26+00:00 ... 38 3.999900e+02
5 1857838543 2019-07-03T20:08:15+00:00 ... 37482 6.900000e+07
6 92337897 2019-07-03T14:44:32+00:00 ... 11365 4.480000e+07
7 2114793091 2019-07-01T23:04:26+00:00 ... 12044 3.000000e+07
8 95826459 2019-06-30T07:22:45+00:00 ... 37482 1.190000e+08
9 94904480 2019-06-30T01:31:01+00:00 ... 11186 3.717800e+07
10 2113704258 2019-06-29T10:46:53+00:00 ... 12044 3.399700e+07
11 2115385566 2019-06-27T12:07:58+00:00 ... 11393 4.490000e+07
12 1732767131 2019-06-27T09:22:24+00:00 ... 38 3.252400e+02
13 93204128 2019-06-26T20:47:01+00:00 ... 11198 3.600000e+07
14 90216786 2019-06-25T23:51:48+00:00 ... 11172 3.600000e+07
15 91205905 2019-06-25T19:59:21+00:00 ... 16275 6.000000e+02
16 2113996003 2019-06-25T16:52:14+00:00 ... 11190 4.000000e+07
17 96345205 2019-06-25T16:39:49+00:00 ... 16275 6.000000e+02
18 95103814 2019-06-25T01:16:28+00:00 ... 11202 3.000000e+07
19 543983309 2019-06-24T14:05:49+00:00 ... 11172 2.741538e+07
20 2114159703 2019-06-23T21:20:04+00:00 ... 34 6.300000e+00
21 2114159703 2019-06-23T15:28:37+00:00 ... 16274 8.500000e+02
22 1872130440 2019-06-23T10:02:21+00:00 ... 11400 3.849900e+07
23 2112790910 2019-06-23T00:00:46+00:00 ... 11202 2.839450e+07
24 2115326382 2019-06-22T22:42:00+00:00 ... 11371 3.715019e+07
25 96768321 2019-06-22T17:02:14+00:00 ... 37481 8.900000e+07
26 1009077082 2019-06-21T23:35:03+00:00 ... 11379 4.200000e+07
27 755876330 2019-06-21T12:27:59+00:00 ... 11186 3.717800e+07
28 1556713165 2019-06-20T23:27:23+00:00 ... 11393 3.699800e+07
29 513171897 2019-06-19T15:58:51+00:00 ... 11381 4.381799e+07
.. ... ... ... ... ...
15 2114159703 2019-06-23T21:20:04+00:00 ... 34 6.300000e+00
16 2114159703 2019-06-23T15:28:37+00:00 ... 16274 8.500000e+02
17 1872130440 2019-06-23T10:02:21+00:00 ... 11400 3.849900e+07
18 2112790910 2019-06-23T00:00:46+00:00 ... 11202 2.839450e+07
19 2115326382 2019-06-22T22:42:00+00:00 ... 11371 3.715019e+07
20 96768321 2019-06-22T17:02:14+00:00 ... 37481 8.900000e+07
21 1009077082 2019-06-21T23:35:03+00:00 ... 11379 4.200000e+07
22 755876330 2019-06-21T12:27:59+00:00 ... 11186 3.717800e+07
23 1556713165 2019-06-20T23:27:23+00:00 ... 11393 3.699800e+07
24 513171897 2019-06-19T15:58:51+00:00 ... 11381 4.381799e+07
25 96711003 2019-06-18T17:50:15+00:00 ... 11198 3.700000e+07
26 408059764 2019-06-18T15:36:49+00:00 ... 11172 3.500000e+07
27 1276544138 2019-06-17T21:32:47+00:00 ... 11379 4.100000e+07
28 94184713 2019-06-17T03:30:26+00:00 ... 37481 8.700000e+07
29 2113441660 2019-06-16T04:12:59+00:00 ... 37458 3.494900e+07
30 755284989 2019-06-15T19:54:44+00:00 ... 37458 3.500000e+07
31 1731319339 2019-06-13T12:00:14+00:00 ... 11379 4.200000e+07
32 96053157 2019-06-12T04:07:15+00:00 ... 37483 8.550000e+07
33 1690931127 2019-06-12T00:44:40+00:00 ... 37482 6.170000e+07
34 92812153 2019-06-11T05:23:09+00:00 ... 37460 3.650000e+07
35 2114791711 2019-06-10T16:14:59+00:00 ... 11371 4.150000e+07
36 1547875730 2019-06-10T15:22:53+00:00 ... 17887 9.999900e+02
37 227535700 2019-06-10T15:12:06+00:00 ... 16272 5.445000e+02
38 95165645 2019-06-10T06:32:52+00:00 ... 11393 5.399000e+07
39 1859791498 2019-06-10T05:35:57+00:00 ... 22460 6.200000e+07
40 2112629749 2019-06-09T15:46:46+00:00 ... 2549 1.800000e+06
41 94391975 2019-06-08T00:06:12+00:00 ... 37460 3.650000e+07
42 91521700 2019-06-07T14:11:45+00:00 ... 11393 4.999800e+07
43 1171184159 2019-06-06T18:10:19+00:00 ... 12044 3.399800e+07
44 96410073 2019-06-05T17:32:01+00:00 ... 11371 4.700000e+07
[83 rows x 10 columns]
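I want to merge two different data frames so that duplicate rows are eliminated, and I would like the result to stay usable even if the rows end up out of date order between the saved data and the current data. At the moment, however, I cannot eliminate any duplicates.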
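One way to narrow this down is sketched below. `duplicated()` compares column values only and ignores the index, so the differing index labels are not what keeps the rows apart. A common cause, though only an assumption here since the hidden columns are not shown, is that a frame loaded from a file and a frame built with `from_records` end up with different dtypes for the same column. The two small frames and their values are made up for illustration.

```python
import pandas as pd

# Hypothetical stand-ins: the same row, but client_id is a string in one
# frame and an integer in the other (an assumed cause, not a confirmed one).
old_data = pd.DataFrame({"client_id": ["94904480"], "unit_price": [37177999.84]})
new_data = pd.DataFrame({"client_id": [94904480], "unit_price": [37177999.84]})

# The rows print identically, yet they do not compare equal.
combined = pd.concat([new_data, old_data])
print(combined.duplicated().any())

# List the columns whose dtypes differ between the two frames.
mismatched = [c for c in new_data.columns if new_data[c].dtype != old_data[c].dtype]
print(mismatched)

# Cast the old frame to the new frame's dtypes, then concatenate again.
aligned = old_data.astype(new_data.dtypes.to_dict())
fixed = pd.concat([new_data, aligned], ignore_index=True)
print(fixed.duplicated().tolist())
```

If the list of mismatched columns comes back empty on the real data, the difference is more likely in the values themselves, for example in one of the columns hidden behind the `...` in the printed output.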
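Answer 0 (score: 0)
Choose which columns to compare. For example, if you do not care whether client_id differs, leave it out of the comparison. I would do it like this: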
# Choose all columns except "client_id"
cols_to_compare = list(kimera.columns.difference(["client_id"]))
# Drop duplicate rows based on that subset of columns, keeping the first occurrence
kimera.drop_duplicates(subset=cols_to_compare, keep='first', inplace=True)
Does this work for you?
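For reference, here is a minimal, self-contained sketch of the subset approach. The rows are made up and only loosely modeled on the output in the question, and the choice of `date` and `type_id` as key columns is an assumption about which columns identify a record. It also sorts by date afterwards, since the question mentions rows ending up out of order.

```python
import pandas as pd

# Made-up rows; the second row of top_slice repeats the first row of bottom_slice.
top_slice = pd.DataFrame({
    "client_id": [95826459, 94904480],
    "date": ["2019-06-30T07:22:45+00:00", "2019-06-30T01:31:01+00:00"],
    "type_id": [37482, 11186],
})
bottom_slice = pd.DataFrame({
    "client_id": [94904480, 2113704258],
    "date": ["2019-06-30T01:31:01+00:00", "2019-06-29T10:46:53+00:00"],
    "type_id": [11186, 12044],
})

# ignore_index=True gives the combined frame a fresh 0..n-1 index.
kimera = pd.concat([top_slice, bottom_slice], ignore_index=True)

# Assumed key columns: rows that agree on all of these count as duplicates.
key_cols = ["date", "type_id"]
kimera = kimera.drop_duplicates(subset=key_cols, keep="first")

# Restore newest-first order; these ISO 8601 strings share one UTC offset,
# so sorting them as text matches chronological order.
kimera = kimera.sort_values("date", ascending=False).reset_index(drop=True)
print(kimera)
```

Assigning the result back instead of using `inplace=True` is a style choice; both forms remove the same rows.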