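Here I have only included the part of the code that raises the error. It handles two different sets of data frames, which are appended to two separate lists.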
import os
import numpy as np
import pandas as pd

path1 = '/home/Desktop/computed_2d_blaze/'
path2 = '/home/Desktop/computed_1d/'
path3 = '/home/Desktop/sn_airmass_seeing/'
dir1 = [x for x in os.listdir(path1) if '.ares' in x]
dir2 = [x for x in os.listdir(path2) if '.ares' in x]
dir3 = [x for x in os.listdir(path3) if '.ares' in x]
lst = []
lst1 = []
for file1, file2, file3 in zip(dir1, dir2, dir3):
    df1 = pd.read_table(path1+file1, skiprows=0, usecols=(0,1,2,3,4,8), names=['wave','num','stlines','fwhm','EWs','MeasredWave'], delimiter=r'\s+')
    df2 = pd.read_table(path2+file2, skiprows=0, usecols=(0,1,2,3,4,8), names=['wave','num','stlines','fwhm','EWs','MeasredWave'], delimiter=r'\s+')
    df1 = df1.groupby('wave').mean().reset_index()
    df1 = df1.sort_values('wave').reset_index(drop=True)
    df2 = df2.sort_values('wave').reset_index(drop=True)
    dfs = pd.merge(df1, df2, on='wave', how='inner')
    dfs['delta_ew'] = (dfs.EWs_x - dfs.EWs_y)
    dfs = dfs.filter(items=['wave','delta_ew'])
    lst.append(dfs)
    df3 = pd.read_table(path3+file3, skiprows=0, usecols=(0,1,2), names=['seeing','airmass','snr'], delimiter=r'\s+')
    lst1.append(df3)
[df.set_index('wave', inplace=True) for df in lst]
df = pd.concat(lst, axis=1, join='inner')
x = pd.concat(lst1, axis=1, join='inner')
for z in df.index:
    t = x.loc[0, 'airmass']
    s = df.loc[z, 'delta_ew']
    # this is the line that raises the ValueError
    dfs = pd.concat([s,t],axis=1,names=['delta_ew','airmass'])
    dfs = dfs[np.abs(dfs.delta_ew - dfs.delta_ew.mean()) <= (dfs.delta_ew.mad())]
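I am building this new data frame because delta_ew contains some outliers, and I want to remove them. However, when I try to do this I get the error ValueError: cannot reindex from a duplicate axis.
I don't know how to resolve this error. Can anyone tell me where I am going wrong?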
Full traceback
Traceback (most recent call last):
  File "/home/gyanender/Desktop/r_values/airmass_vs_ew/delta_ew/for_rvalues.py", line 72, in <module>
    dfs = pd.concat([s,t],axis=1,names=['delta_ew','airmass'])
  File "/home/gyanender/.local/lib/python2.7/site-packages/pandas/core/reshape/concat.py", line 213, in concat
    return op.get_result()
  File "/home/gyanender/.local/lib/python2.7/site-packages/pandas/core/reshape/concat.py", line 385, in get_result
    df = cons(data, index=index)
  File "/home/gyanender/.local/lib/python2.7/site-packages/pandas/core/frame.py", line 330, in __init__
    mgr = self._init_dict(data, index, columns, dtype=dtype)
  File "/home/gyanender/.local/lib/python2.7/site-packages/pandas/core/frame.py", line 461, in _init_dict
    return _arrays_to_mgr(arrays, data_names, index, columns, dtype=dtype)
  File "/home/gyanender/.local/lib/python2.7/site-packages/pandas/core/frame.py", line 6168, in _arrays_to_mgr
    arrays = _homogenize(arrays, index, dtype)
  File "/home/gyanender/.local/lib/python2.7/site-packages/pandas/core/frame.py", line 6465, in _homogenize
    v = v.reindex(index, copy=False)
  File "/home/gyanender/.local/lib/python2.7/site-packages/pandas/core/series.py", line 2681, in reindex
    return super(Series, self).reindex(index=index, **kwargs)
  File "/home/gyanender/.local/lib/python2.7/site-packages/pandas/core/generic.py", line 3023, in reindex
    fill_value, copy).__finalize__(self)
  File "/home/gyanender/.local/lib/python2.7/site-packages/pandas/core/generic.py", line 3041, in _reindex_axes
    copy=copy, allow_dups=False)
  File "/home/gyanender/.local/lib/python2.7/site-packages/pandas/core/generic.py", line 3145, in _reindex_with_indexers
    copy=copy)
  File "/home/gyanender/.local/lib/python2.7/site-packages/pandas/core/internals.py", line 4139, in reindex_indexer
    self.axes[axis]._can_reindex(indexer)
  File "/home/gyanender/.local/lib/python2.7/site-packages/pandas/core/indexes/base.py", line 2944, in _can_reindex
    raise ValueError("cannot reindex from a duplicate axis")
ValueError: cannot reindex from a duplicate axis
Answer 0 (score: 1)
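I finally managed to solve this problem. Instead of concat, I used a dictionary. The problem I was facing was building a new data frame from two pandas Series, so I first converted the values of the Series t and s into a dictionary and then converted that dictionary into a data frame. This worked fine for me.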
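for z in df.index:
    t = x.loc[0, 'airmass']
    t = t.values
    s = df.loc[z, 'delta_ew']
    s = s.values
    # Pair the raw values so that the duplicate index labels are not aligned
    dic = dict(zip(s, t))
    # list(...) keeps this working on Python 3, where items() returns a view
    q = pd.DataFrame(list(dic.items()), columns=['ew', 'airmass'])
    q = q[np.abs(q.ew - q.ew.mean()) <= (q.ew.mad())]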
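One thing to watch with the dictionary route: dict keys are unique, so two equal ew values collapse into a single entry. A possible variant, sketched here with made-up numbers standing in for s.values and t.values, pairs the two arrays by position instead. It also computes the mean absolute deviation by hand, since Series.mad() was removed in pandas 2.0. The names ew, airmass, deviation and kept are only illustrative.

```python
import numpy as np
import pandas as pd

# Hypothetical values standing in for s.values and t.values from the loop.
ew = np.array([0.10, 0.12, 0.10, 0.95, 0.11])
airmass = np.array([1.1, 1.2, 1.3, 1.4, 1.5])

# Pair the arrays by position; repeated ew values are all kept,
# whereas dict(zip(ew, airmass)) would keep only one of them.
q = pd.DataFrame({'ew': ew, 'airmass': airmass})

# Mean absolute deviation, i.e. what Series.mad() used to return.
deviation = (q['ew'] - q['ew'].mean()).abs()
mad = deviation.mean()

# Keep the rows whose deviation from the mean is within one MAD.
kept = q[deviation <= mad]
print(kept)
```

With these sample numbers the 0.95 row is the only one filtered out, and both 0.10 rows survive.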
Answer 1 (score: -1)
This error usually occurs when you join or assign to a column while the index contains duplicate values.
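To see where duplicate labels can come from in a script like the one in the question, a small stand-in may help. The frames below use made-up values, not the real .ares data; only the column names are taken from the question.

```python
import pandas as pd

# Hypothetical stand-ins for the per-file frames collected in lst1.
frames = [
    pd.DataFrame({'seeing': [1.0 + i], 'airmass': [1.1 + i], 'snr': [100 + i]})
    for i in range(3)
]

# Concatenating along axis=1 repeats every column name once per file.
x = pd.concat(frames, axis=1, join='inner')
print(x.columns.is_unique)

# Selecting one row and a repeated column label returns a Series whose
# index is that same label repeated, so the index is not unique either.
t = x.loc[0, 'airmass']
print(list(t.index))
print(t.index.is_unique)
```

Checking is_unique on the index of each Series before combining them is a quick way to confirm whether this is the cause.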
The error is raised by the line dfs = pd.concat([s,t],axis=1,names=['delta_ew','airmass']). I believe I have found a solution to your problem: just add ignore_index=True to the concat call.
Like this:
dfs = pd.concat([s,t],axis=1,names=['delta_ew','airmass'], ignore_index=True )
This will recreate the index.
Note: index here means the labels of the rows and columns.
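It may be worth checking what these arguments do on a small case before relying on them. As far as the pandas documentation describes it, names labels the levels of a hierarchical index, keys is what ends up as the column labels when axis=1, and ignore_index=True replaces the labels along the concatenation axis, which for axis=1 means the columns. The rows are still aligned on the index. The two Series below are made-up and have an ordinary unique index.

```python
import pandas as pd

# Hypothetical Series with a default, unique index.
s = pd.Series([0.10, 0.12, 0.11])
t = pd.Series([1.1, 1.2, 1.3])

# keys= supplies the column labels of the result.
named = pd.concat([s, t], axis=1, keys=['delta_ew', 'airmass'])
print(list(named.columns))       # ['delta_ew', 'airmass']

# ignore_index=True renumbers the columns; the row index is unchanged.
renumbered = pd.concat([s, t], axis=1, ignore_index=True)
print(list(renumbered.columns))  # [0, 1]
print(list(renumbered.index))    # [0, 1, 2]
```

Since the row index is left alone here, duplicate row labels on s or t would presumably still have to be dealt with separately, for example by working with .values as in the other answer.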