pandas join / merge'重新索引仅对具有唯一值的索引有效'

时间:2013-03-12 08:21:37

标签: join pandas

海,

我在连接操作上遇到非常强大的错误。我尝试了合并(left_index,right_index)以及相同的结果。

索引是相同的(按设计),由index.is_unique(TRUE)和index.get_duplicates()(EMPTY)检查两个索引。

基本版:

df1.join(series)
merge(df1, series_as_df, 

print tempres.index

[2013-01-14 17:04:45,...,2013-01-14 16:53:05] 长度:89,频率:无,时区:无


奇怪的是打印值:     print tempres.index.values [1970-01-16 121:04:45 1970-01-16 121:04:35 1970-01-16 121:04:25  1970-01-16 121:04:15 1970-01-16 121:04:05 1970-01-16 121:03:55  1970-01-16 121:03:45 1970-01-16 121:03:35 1970-01-16 121:03:25  1970-01-16 121:03:15 1970-01-16 121:03:05 1970-01-16 121:02:55  1970-01-16 121:02:45 1970-01-16 121:02:35 1970-01-16 121:02:25  1970-01-16 121:02:15 1970-01-16 121:02:05 1970-01-16 121:01:55  1970-01-16 121:01:45 1970-01-16 121:01:35 1970-01-16 121:01:25  ...]

如果需要,我可以添加腌制系列和df ......

使用最新的pandas版本0.10.x

谢谢,

吕克

我的代码(从较大的代码中删除)

XYTparams (existing dataframe)
prep_functions[funcname] = [list of values, same length as XYTparams]

iSeries = Series(prep_functions[funcname], index = XYTparams.index, name = funcname)
XYTparams = XYTparams.join(iSeries)

回顾我的问题:

我在基本DataFrame上使用merge和join连续。有一次,我在尝试下一次合并/加入时开始出错。我无法在一个简单的测试中重现这一点,但我在问题开始之前保存了数据帧。

我找不到问题所在。

base_df = load('SPOparams.pic')
lookup_df = load('lookup.pic')

print base_df
print lookup_df

print base_df.count()

print base_df['VKCSKEY1']
print lookup_df['traf_key']

# reset index does not change a thing
base_df = base_df.reset_index(drop=True)

print base_df.index
print base_df.index.get_duplicates()
print lookup_df.index
print lookup_df.index.get_duplicates()


# checking value matches
for k in lookup_df['traf_key']:
    print k, k in  base_df['VKCSKEY1'].values

# why does this merge is unsuccesfull ???
# in any combination of the parameters
df_result =merge(base_df, lookup_df, 
             how='left', 
             #how = 'outer',
             left_on ='VKCSKEY1', 
             right_on ='traf_key',
             #left_index=True, 
             #right_index = True,
             #sort=True, 
             #suffixes=('', '.m'), copy=True
             )
print df_result

输出:

1.6.1
0.10.1
<class 'pandas.core.frame.DataFrame'>
Int64Index: 89 entries, 0 to 88
Data columns:
T                        89  non-null values
X                        89  non-null values
Y                        89  non-null values
precip_quantity_1hour    89  non-null values
pressure                 89  non-null values
rel_humidity             89  non-null values
temp                     89  non-null values
temp_max                 0  non-null values
temp_min                 0  non-null values
wind_direction           89  non-null values
wind_speed               89  non-null values
BC_TRAF                  89  non-null values
closest                  89  non-null values
closest.m                89  non-null values
AGGP.P50_ID              89  non-null values
AGGP.FUNC_CLASS          89  non-null values
AGGP.SPEED_CAT           89  non-null values
LINK_ID                  89  non-null values
FUNC_CLASS               89  non-null values
SPEED_CAT                89  non-null values
AR_AUTO                  89  non-null values
AR_BUS                   89  non-null values
AR_TAXIS                 89  non-null values
AR_CARPOOL               89  non-null values
AR_PEDEST                89  non-null values
AR_TRUCKS                89  non-null values
STCA20_PCT               89  non-null values
VKC_LINKNR               89  non-null values
TRVIC150R1               89  non-null values
closest.m                89  non-null values
closest.m.m              89  non-null values
VKCP.LINK_ID             89  non-null values
VKCP.FUNC_CLASS          89  non-null values
VKCP.SPEED               89  non-null values
VKCP.LINKNR              89  non-null values
VKCP.TWIN_ID             89  non-null values
VKCSKEY1                 89  non-null values
dtypes: datetime64[ns](1), float64(13), int64(9), object(14)
<class 'pandas.core.frame.DataFrame'>
Index: 30 entries, (60744, 0) to (58314, 0)
Data columns:
traf_key      30  non-null values
weekday_nr    30  non-null values
linknr        30  non-null values
 weekday      30  non-null values
vr0           30  non-null values
vr1           30  non-null values
vr2           30  non-null values
vr3           30  non-null values
vr4           30  non-null values
vr5           30  non-null values
vr6           30  non-null values
vr7           30  non-null values
vr8           30  non-null values
vr9           30  non-null values
vr10          30  non-null values
vr11          30  non-null values
vr12          30  non-null values
vr13          30  non-null values
vr14          30  non-null values
vr15          30  non-null values
vr16          30  non-null values
vr17          30  non-null values
vr18          30  non-null values
vr19          30  non-null values
vr20          30  non-null values
vr21          30  non-null values
vr22          30  non-null values
vr23          30  non-null values
au0           30  non-null values
au1           30  non-null values
au2           30  non-null values
au3           30  non-null values
au4           30  non-null values
au5           30  non-null values
au6           30  non-null values
au7           30  non-null values
au8           30  non-null values
au9           30  non-null values
au10          30  non-null values
au11          30  non-null values
au12          30  non-null values
au13          30  non-null values
au14          30  non-null values
au15          30  non-null values
au16          30  non-null values
au17          30  non-null values
au18          30  non-null values
au19          30  non-null values
au20          30  non-null values
au21          30  non-null values
au22          30  non-null values
au23          30  non-null values
sn0           30  non-null values
sn1           30  non-null values
sn2           30  non-null values
sn3           30  non-null values
sn4           30  non-null values
sn5           30  non-null values
sn6           30  non-null values
sn7           30  non-null values
sn8           30  non-null values
sn9           30  non-null values
sn10          30  non-null values
sn11          30  non-null values
sn12          30  non-null values
sn13          30  non-null values
sn14          30  non-null values
sn15          30  non-null values
sn16          30  non-null values
sn17          30  non-null values
sn18          30  non-null values
sn19          30  non-null values
sn20          30  non-null values
sn21          30  non-null values
sn22          30  non-null values
sn23          30  non-null values
dtypes: float64(24), int64(50), object(2)
T                        89
X                        89
Y                        89
precip_quantity_1hour    89
pressure                 89
rel_humidity             89
temp                     89
temp_max                  0
temp_min                  0
wind_direction           89
wind_speed               89
BC_TRAF                  89
closest                  89
closest.m                89
AGGP.P50_ID              89
AGGP.FUNC_CLASS          89
AGGP.SPEED_CAT           89
LINK_ID                  89
FUNC_CLASS               89
SPEED_CAT                89
AR_AUTO                  89
AR_BUS                   89
AR_TAXIS                 89
AR_CARPOOL               89
AR_PEDEST                89
AR_TRUCKS                89
STCA20_PCT               89
VKC_LINKNR               89
TRVIC150R1               89
closest.m                89
closest.m.m              89
VKCP.LINK_ID             89
VKCP.FUNC_CLASS          89
VKCP.SPEED               89
VKCP.LINKNR              89
VKCP.TWIN_ID             89
VKCSKEY1                 89
0     (60744, 0)
1     (60744, 0)
2     (60744, 0)
3     (60750, 0)
4     (60768, 0)
5     (60768, 0)
6     (60758, 0)
7     (60758, 0)
8     (69223, 0)
9     (69223, 0)
10    (69223, 0)
11    (64265, 0)
12    (64265, 0)
13    (64265, 0)
14    (64265, 0)
15    (64265, 0)
16    (64265, 0)
17    (64265, 0)
18    (64265, 0)
19    (64265, 0)
20    (64216, 0)
21    (64216, 0)
22    (64216, 0)
23    (64216, 0)
24    (64216, 0)
25    (64216, 0)
26    (64216, 0)
27    (64216, 0)
28    (64216, 0)
29    (57085, 0)
30    (57085, 0)
31    (57085, 0)
32    (57085, 0)
33    (57085, 0)
34    (57085, 0)
35    (57014, 0)
36    (57033, 0)
37    (57033, 0)
38    (64065, 0)
39    (64065, 0)
40    (64065, 0)
41    (64065, 0)
42    (64065, 0)
43    (57070, 0)
44    (64062, 0)
45    (64062, 0)
46    (64062, 0)
47    (64062, 0)
48    (57070, 0)
49    (64061, 0)
50    (64061, 0)
51    (64061, 0)
52    (64061, 0)
53    (59849, 0)
54    (59415, 0)
55    (58487, 0)
56    (58054, 0)
57    (58054, 0)
58    (58054, 0)
59    (52551, 0)
60    (58054, 0)
61    (58054, 0)
62    (58054, 0)
63    (58054, 0)
64    (52551, 0)
65    (58054, 0)
66    (58488, 0)
67    (58488, 0)
68    (58028, 0)
69    (58464, 0)
70    (58028, 0)
71    (57989, 0)
72    (58595, 0)
73    (58027, 0)
74    (57989, 0)
75    (58595, 0)
76    (58595, 0)
77    (58019, 0)
78    (58595, 0)
79    (58595, 0)
80    (58019, 0)
81    (58595, 0)
82    (58595, 0)
83    (66715, 0)
84    (58595, 0)
85    (59295, 0)
86    (67614, 0)
87    (58314, 0)
88    (58314, 0)
Name: VKCSKEY1, Length: 89
VKCSKEY1
(60744, 0)    (60744, 0)
(60750, 0)    (60750, 0)
(60768, 0)    (60768, 0)
(60758, 0)    (60758, 0)
(69223, 0)    (69223, 0)
(64265, 0)    (64265, 0)
(64216, 0)    (64216, 0)
(57085, 0)    (57085, 0)
(57014, 0)    (57014, 0)
(57033, 0)    (57033, 0)
(64065, 0)    (64065, 0)
(57070, 0)    (57070, 0)
(64062, 0)    (64062, 0)
(64061, 0)    (64061, 0)
(59849, 0)    (59849, 0)
(59415, 0)    (59415, 0)
(58487, 0)    (58487, 0)
(58054, 0)    (58054, 0)
(52551, 0)    (52551, 0)
(58488, 0)    (58488, 0)
(58028, 0)    (58028, 0)
(58464, 0)    (58464, 0)
(57989, 0)    (57989, 0)
(58595, 0)    (58595, 0)
(58027, 0)    (58027, 0)
(58019, 0)    (58019, 0)
(66715, 0)    (66715, 0)
(59295, 0)    (59295, 0)
(67614, 0)    (67614, 0)
(58314, 0)    (58314, 0)
Name: traf_key
Int64Index([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88], dtype=int64)
[]
Index([(60744, 0), (60750, 0), (60768, 0), (60758, 0), (69223, 0), (64265, 0), (64216, 0), (57085, 0), (57014, 0), (57033, 0), (64065, 0), (57070, 0), (64062, 0), (64061, 0), (59849, 0), (59415, 0), (58487, 0), (58054, 0), (52551, 0), (58488, 0), (58028, 0), (58464, 0), (57989, 0), (58595, 0), (58027, 0), (58019, 0), (66715, 0), (59295, 0), (67614, 0), (58314, 0)], dtype=object)
[]
(60744, 0) True
(60750, 0) True
(60768, 0) True
(60758, 0) True
(69223, 0) True
(64265, 0) True
(64216, 0) True
(57085, 0) True
(57014, 0) True
(57033, 0) True
(64065, 0) True
(57070, 0) True
(64062, 0) True
(64061, 0) True
(59849, 0) True
(59415, 0) True
(58487, 0) True
(58054, 0) True
(52551, 0) True
(58488, 0) True
(58028, 0) True
(58464, 0) True
(57989, 0) True
(58595, 0) True
(58027, 0) True
(58019, 0) True
(66715, 0) True
(59295, 0) True
(67614, 0) True
(58314, 0) True
Traceback (most recent call last):
  File "L:\temp\pandas_join_bug.py", line 43, in <module>
    right_on ='traf_key',
  File "C:\Python27\lib\site-packages\pandas\tools\merge.py", line 36, in merge
    return op.get_result()
  File "C:\Python27\lib\site-packages\pandas\tools\merge.py", line 185, in get_result
    ldata, rdata = self._get_merge_data()
  File "C:\Python27\lib\site-packages\pandas\tools\merge.py", line 277, in _get_merge_data
    copydata=False)
  File "C:\Python27\lib\site-packages\pandas\core\internals.py", line 1194, in _maybe_rename_join
    to_rename = self.items.intersection(other.items)
  File "C:\Python27\lib\site-packages\pandas\core\index.py", line 666, in intersection
    indexer = self.get_indexer(other.values)
  File "C:\Python27\lib\site-packages\pandas\core\index.py", line 812, in get_indexer
    raise Exception('Reindexing only valid with uniquely valued Index '
Exception: Reindexing only valid with uniquely valued Index objects

一旦发生错误,我无法获得任何merge或join语句成功。起初我没有看到错误与重复的合并/加入操作相关联。现在,最新一套的任何单一合并/加入都可以使用。一旦我需要另一个合并/加入,我得到相同的错误。苦苦挣扎了几天......

帮助!!!

吕克

1 个答案:

答案 0 :(得分:13)

重复的列名称将导致此错误,请尝试删除重复的列名称