Python Pandas InvalidIndexError

Time: 2014-02-12 04:05:01

Tags: python parsing python-2.7 csv pandas

I'm trying to merge several CSV files that I generated from some large text files. Each file looks like this:

Key, Value
002 C000000,NY
002 D010000,11788
003  000000,N
004  000000,Y
005  000000,N
006  000000,N
007C0100,5
007C0200,XTF ADVISORS TRUST - ETF 2030 PORTFOLIO
007C0300,Y
0240000,N
025D0001,0
025D0002,0
025D0003,0
025D0004,0
025D0005,0
025D0006,0
025D0007,0
025D0008,0
028A0100,10
028A0200,0
028A0300,0
028A0400,5
028B0100,28
028B0200,0
028B0300,0
028B0400,54
028C0100,1
028C0200,0
028C0300,0
028C0400,6
028D0100,20
028D0200,0
028D0300,0

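I have a hacky approach, and I'm not sure whether the problem lies in the approach itself (memory?) or somewhere else. Here is the code that merges all of these files on the key, so that each key ends up on one row with all of its values: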

import pandas as pd

def mergeCsv(filename1,seriesnum):
    # Start from file 1, then merge in files 2 .. seriesnum-1 one at a time.
    firstfile = pd.read_csv(filename1 + "_1.csv")
    for i in xrange(2,int(seriesnum)):
        eachfile = pd.read_csv(filename1 + "_" + str(i) + ".csv")
        merged = firstfile.merge(eachfile, on='Key', left_index=False, how='inner') #also tried how='left' and how='outer'
        # Write the running result to disk and read it back for the next pass.
        merged.to_csv("result.csv", index=False, na_rep='NA')
        firstfile = pd.read_csv("result.csv")

After 6 CSVs have been joined, I get the following error (there are 16 files in this case):

  File "./parser_nsar.py", line 165, in <module>
    mergeCsv(hardcodename, 16)
  File "./parser_nsar.py", line 112, in mergeCsv
    merged = firstfile.merge(eachfile, on='Key', left_index=False, how='inner')
  File "/Library/Python/2.7/site-packages/pandas/core/frame.py", line 3632, in merge
    suffixes=suffixes, copy=copy)
  File "/Library/Python/2.7/site-packages/pandas/tools/merge.py", line 40, in merge
    return op.get_result()
  File "/Library/Python/2.7/site-packages/pandas/tools/merge.py", line 189, in get_result
    ldata, rdata = self._get_merge_data()
  File "/Library/Python/2.7/site-packages/pandas/tools/merge.py", line 284, in _get_merge_data
    copydata=False)
  File "/Library/Python/2.7/site-packages/pandas/core/internals.py", line 3439, in _maybe_rename_join
    to_rename = self.items.intersection(other.items)
  File "/Library/Python/2.7/site-packages/pandas/core/index.py", line 962, in intersection
    indexer = self.get_indexer(other.values)
  File "/Library/Python/2.7/site-packages/pandas/core/index.py", line 1120, in get_indexer
    raise InvalidIndexError('Reindexing only valid with uniquely'
pandas.core.index.InvalidIndexError: Reindexing only valid with uniquely valued Index objects
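
One plausible reading of this traceback, offered as an assumption rather than a confirmed diagnosis: every input file has the same `Value` column, so repeated merges keep generating the default `_x`/`_y` suffixes, and the column labels may eventually stop being unique. The minimal sketch below uses made-up two-row frames (not the real files) to show how the column names evolve over the first two merges.

```python
import pandas as pd

# Hypothetical stand-ins for three of the per-series CSV files.
f1 = pd.DataFrame({"Key": ["007C0100", "028A0100"], "Value": ["1", "0"]})
f2 = pd.DataFrame({"Key": ["007C0100", "028A0100"], "Value": ["2", "472"]})
f3 = pd.DataFrame({"Key": ["007C0100", "028A0100"], "Value": ["3", "4"]})

# First merge: both sides have "Value", so pandas appends "_x" and "_y".
step1 = f1.merge(f2, on="Key", how="inner")
cols_step1 = list(step1.columns)

# Second merge: nothing overlaps except the key, so "Value" keeps its name.
step2 = step1.merge(f3, on="Key", how="inner")
cols_step2 = list(step2.columns)

print(cols_step1)
print(cols_step2)
```

A further merge against another frame with a `Value` column would ask for `Value_x` and `Value_y` again, which are already taken. Depending on the pandas version, that either produces duplicate column labels or raises an error.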

So far I have this result (which is exactly what I want, but I need it for all of the CSVs):

006  000000,N,N,N,N,N,N
007C0100,1,2,3,4,5,6
007C0200,XTF ADVISORS TRUST - ETF 2010 PORTFOLIO,XTF ADVISORS TRUST - ETF 2015 PORTFOLIO,XTF ADVISORS TRUST - ETF 2020 PORTFOLIO,XTF ADVISORS TRUST - ETF 2025 PORTFOLIO,XTF ADVISORS TRUST - ETF 2030 PORTFOLIO,XTF ADVISORS TRUST - ETF 2040 PLUS PORTFOLIO
007C0300,Y,Y,Y,Y,Y,Y
0240000,N,N,N,N,N,N
025D0001,0,0,0,0,0,0
025D0002,0,0,0,0,0,0
025D0003,0,0,0,0,0,0
025D0004,0,0,0,0,0,0
025D0005,0,0,0,0,0,0
025D0006,0,0,0,0,0,0
025D0007,0,0,0,0,0,0
025D0008,0,0,0,0,0,0
028A0100,0,472,4,49,10,30
028A0200,0,0,0,0,0,0
028A0300,0,0,0,0,0,0
028A0400,1,3,2,10,5,5
028B0100,1,302,196,107,28,93
028B0200,0,0,0,0,0,0
028B0300,0,0,0,0,0,0
028B0400,10,16,22,80,54,10
028C0100,27,0,23,10,1,23
028C0200,0,0,0,0,0,0
028C0300,0,0,0,0,0,0
028C0400,1,27,180,171,6,60
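
A minimal sketch of one way to build this wide layout for every file without the intermediate `result.csv`, assuming each file has unique keys and follows the `<prefix>_<i>.csv` naming used above. The helper name `merge_series`, the `Value_<i>` column labels, and the choice to read everything as text (so keys such as `0240000` keep their leading zeros) are illustrative assumptions, not part of the original code. It also loops through `series_count` inclusive, whereas `xrange(2, int(seriesnum))` stops one short of `seriesnum`.

```python
import pandas as pd


def merge_series(prefix, series_count):
    """Join <prefix>_1.csv .. <prefix>_<series_count>.csv on Key (hypothetical helper)."""
    columns = []
    for i in range(1, int(series_count) + 1):
        frame = pd.read_csv(
            "%s_%d.csv" % (prefix, i),
            dtype=str,              # keep keys such as 0240000 as text
            skipinitialspace=True,  # the header is written as "Key, Value"
        )
        # Give every file its own column label so names never collide.
        columns.append(frame.set_index("Key")["Value"].rename("Value_%d" % i))
    # Inner join on the Key index keeps only keys present in every file.
    return pd.concat(columns, axis=1, join="inner").reset_index()
```

With the names from the traceback, this could be called as `merge_series(hardcodename, 16).to_csv("result.csv", index=False, na_rep="NA")`. If any file contains a repeated key, the concatenation step is expected to fail, so duplicates would need to be dropped or aggregated first.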

0 answers:

No answers yet.