Question

Using Python, I'm having trouble merging 208 CSV files into a single DataFrame. (My files are named Customer_1.csv, Customer_2.csv, ..., and Customer_208.csv.)

Here is my code:

%matplotlib inline
import pandas as pd
df_merged = pd.concat([pd.read_csv('data_TVV1/Customer_{0}.csv'.format(i), names = ['Time', 'Energy_{0}'.format(i)], parse_dates=['Time'], index_col=['Time'], skiprows=1) for i in range(1, 209)], axis=1)

I get the following error:

    InvalidIndexError                         Traceback (most recent call last)
<ipython-input-4-a4d19b3c2a3e> in <module>()
----> 1 df_merged = pd.concat([pd.read_csv('data_TVV1/Customer_{0}.csv'.format(i), names = ['Time', 'Energy_{0}'.format(i)], parse_dates=['Time'], index_col=['Time'], skiprows=1) for i in range(1, 209)], axis=1)

/Users/Suzuki/Envs/DataVizProj/lib/python2.7/site-packages/pandas/tools/merge.pyc in concat(objs, axis, join, join_axes, ignore_index, keys, levels, names, verify_integrity, copy)
    752                        keys=keys, levels=levels, names=names,
    753                        verify_integrity=verify_integrity,
--> 754                        copy=copy)
    755     return op.get_result()
    756 

/Users/Suzuki/Envs/DataVizProj/lib/python2.7/site-packages/pandas/tools/merge.pyc in __init__(self, objs, axis, join, join_axes, keys, levels, names, ignore_index, verify_integrity, copy)
    884         self.copy = copy
    885 
--> 886         self.new_axes = self._get_new_axes()
    887 
    888     def get_result(self):

/Users/Suzuki/Envs/DataVizProj/lib/python2.7/site-packages/pandas/tools/merge.pyc in _get_new_axes(self)
    944                 if i == self.axis:
    945                     continue
--> 946                 new_axes[i] = self._get_comb_axis(i)
    947         else:
    948             if len(self.join_axes) != ndim - 1:

/Users/Suzuki/Envs/DataVizProj/lib/python2.7/site-packages/pandas/tools/merge.pyc in _get_comb_axis(self, i)
    970                 raise TypeError("Cannot concatenate list of %s" % types)
    971 
--> 972         return _get_combined_index(all_indexes, intersect=self.intersect)
    973 
    974     def _get_concat_axis(self):

/Users/Suzuki/Envs/DataVizProj/lib/python2.7/site-packages/pandas/core/index.pyc in _get_combined_index(indexes, intersect)
   5730             index = index.intersection(other)
   5731         return index
-> 5732     union = _union_indexes(indexes)
   5733     return _ensure_index(union)
   5734 

/Users/Suzuki/Envs/DataVizProj/lib/python2.7/site-packages/pandas/core/index.pyc in _union_indexes(indexes)
   5759 
   5760         if hasattr(result, 'union_many'):
-> 5761             return result.union_many(indexes[1:])
   5762         else:
   5763             for other in indexes[1:]:

/Users/Suzuki/Envs/DataVizProj/lib/python2.7/site-packages/pandas/tseries/index.pyc in union_many(self, others)
    847             else:
    848                 tz = this.tz
--> 849                 this = Index.union(this, other)
    850                 if isinstance(this, DatetimeIndex):
    851                     this.tz = tz

/Users/Suzuki/Envs/DataVizProj/lib/python2.7/site-packages/pandas/core/index.pyc in union(self, other)
   1400                 result.extend([x for x in other.values if x not in value_set])
   1401         else:
-> 1402             indexer = self.get_indexer(other)
   1403             indexer, = (indexer == -1).nonzero()
   1404 

/Users/Suzuki/Envs/DataVizProj/lib/python2.7/site-packages/pandas/core/index.pyc in get_indexer(self, target, method, limit)
   1685 
   1686         if not self.is_unique:
-> 1687             raise InvalidIndexError('Reindexing only valid with uniquely'
   1688                                     ' valued Index objects')
   1689 

InvalidIndexError: Reindexing only valid with uniquely valued Index objects

Do you have any ideas for solving this problem?

Answer 1

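Your code works for me on a small sample of five test files (each containing two columns and three rows). Just for debugging, try writing it as a for loop. First, before the loop, read all the files into a list. Then loop over that list and concatenate each file inside a try/except block to catch the errors. Finally, print the problem files and investigate them.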

# First, read all the files into a list.
files_in = [pd.read_csv('data_TVV1/Customer_{0}.csv'.format(i), 
                        names = ['Time', 'Energy_{0}'.format(i)], 
                        parse_dates=['Time'], 
                        index_col=['Time'], 
                        skiprows=1) 
            for i in range(1, 209)]

df = pd.DataFrame()
errors = []

# Try to append each file to the dataframe.
for i in range(1, 209):
    try:
        df = pd.concat([df, files_in[i - 1]], axis=1)
    except Exception:
        errors.append(i)

# Print files containing errors (errors holds 1-based file numbers).
for error in errors:
    print(files_in[error - 1])
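
Once you know which files fail, the traceback gives a hint about what to look for: the exception is raised where pandas checks `self.is_unique`, so a likely cause is a repeated timestamp in the Time column of one or more files. Here is a minimal sketch of how you could check for that. The two in-memory CSV strings are hypothetical stand-ins for your Customer_*.csv files, and keeping the first row for each duplicated timestamp is only one possible fix, an assumption on my part; summing or averaging the duplicates may suit your data better.

```python
import io
import pandas as pd

# Hypothetical sample data standing in for two Customer_*.csv files.
# The second one repeats the timestamp 2015-01-01 00:00.
csv_ok = "Time,Energy\n2015-01-01 00:00,1.0\n2015-01-01 00:30,2.0\n"
csv_dup = ("Time,Energy\n2015-01-01 00:00,3.0\n"
           "2015-01-01 00:00,4.0\n2015-01-01 00:30,5.0\n")


def load(text, i):
    # Same read_csv arguments as in the question, reading from a string.
    return pd.read_csv(io.StringIO(text),
                       names=['Time', 'Energy_{0}'.format(i)],
                       parse_dates=['Time'],
                       index_col=['Time'],
                       skiprows=1)


frames = {1: load(csv_ok, 1), 2: load(csv_dup, 2)}

# File numbers whose Time index contains repeated values.
bad = [i for i, f in frames.items() if not f.index.is_unique]
print(bad)

# Assumed fix: keep only the first row for each duplicated timestamp.
cleaned = [f[~f.index.duplicated(keep='first')] for f in frames.values()]
df_merged = pd.concat(cleaned, axis=1)
print(df_merged)
```

If `bad` comes back empty for your real files, duplicated timestamps are not the cause and the files caught by the try/except loop need a closer look.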
