Index with all null values

Time: 2016-03-15 18:55:38

Tags: python pandas indexing multi-index

I am working with a CSV format that I cannot change. It contains a multi-index. The raw file looks like this:

[Image: screenshot of the raw CSV file]

I use the following code to build the multi-index, then stack, then reset the index. It works.

import pandas as pd
myfile = 'c:/temp/myfile.csv'
df = pd.read_csv(myfile, header=[0, 1], tupleize_cols=True)
df.columns = [c for _, c in df.columns[:3]] + [c for c in df.columns[3:]]
df = df.set_index(list(df.columns[:3]), append = True)
df.columns = pd.MultiIndex.from_tuples(df.columns, names = ['hour', 'field'])
df.stack(level=['hour'])
df2 = df.reset_index().copy()
df2

[Image: screenshot of the resulting DataFrame]

Sometimes, however, the "Zone" field is left blank.

[Image: screenshot of the CSV file with the "Zone" field blank]

Running that file through the same code gives me this error:

---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-15-8e51ff24c0c4> in <module>()
      6 df.columns = pd.MultiIndex.from_tuples(df.columns, names = ['hour', 'field'])
      7 df.stack(level=['hour'])
----> 8 df2 = df.reset_index().copy()
      9 df2

C:\Anaconda3\lib\site-packages\pandas\core\frame.py in reset_index(self, level, drop, inplace, col_level, col_fill)
   2832 
   2833                     # to ndarray and maybe infer different dtype
-> 2834                     level_values = _maybe_casted_values(lev, lab)
   2835                     if level is None or i in level:
   2836                         new_obj.insert(0, col_name, level_values)

C:\Anaconda3\lib\site-packages\pandas\core\frame.py in _maybe_casted_values(index, labels)
   2796             if labels is not None:
   2797                 mask = labels == -1
-> 2798                 values = values.take(labels)
   2799                 if mask.any():
   2800                     values, changed = com._maybe_upcast_putmask(values,

IndexError: cannot do a non-empty take from an empty axes.

Ideally, I would like to keep the NaNs in the df after the reset.

1 Answer:

Answer 0: (score: 0)

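I ran into the same problem. Here is my hack: copy each index level into an ordinary column, then reset the index with drop=True so pandas never has to rebuild the columns from the empty level.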

# Loop through the index columns
for clmNm in df_w_idx.index.names:
    print(clmNm)

    # Make a new column in the dataframe
    df_w_idx[clmNm] = df_w_idx.index.get_level_values(clmNm)

# Now you can reset the index     
df_w_idx = df_w_idx.reset_index(drop=True).copy()
df_w_idx
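The same hack can be wrapped in a small helper so it does not modify the original frame. This is only a sketch: the function name `index_levels_to_columns` and the tiny example frame are made up for illustration, and it assumes every index level has a name.

```python
import numpy as np
import pandas as pd


def index_levels_to_columns(frame: pd.DataFrame) -> pd.DataFrame:
    """Copy every named index level into a regular column, then drop the index."""
    out = frame.copy()
    for name in out.index.names:
        # get_level_values yields NaN wherever the level has no label
        out[name] = out.index.get_level_values(name)
    # drop=True discards the index instead of converting it back to columns
    return out.reset_index(drop=True)


# Hypothetical example: level "B" is entirely NaN
df = pd.DataFrame({
    "A": list("abc"),
    "B": np.nan,
    "D": [1.0, 2.0, 3.0],
})
result = index_levels_to_columns(df.set_index(["A", "B"]))
print(result)
```

Note that the former index levels end up after the data columns, so reorder the columns afterwards if the original layout matters.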

Below is fully reproducible code. I'm sure there is a better way.

import pandas as pd
import numpy as np
import random
import string



# Create 12 random strings 3 char long 
rndm_strgs = [''.join(random.SystemRandom().choice(string.ascii_uppercase + string.digits) for _ in range(3)) for i in range(12)]            
rndm_strgs[0] = None
rndm_strgs[5] = None

# Make Dataframe
df = pd.DataFrame({'A' : list('pandasisgood'),
                   'B' : np.nan,
                   'C' : rndm_strgs,
                   'D' : np.random.rand(12)})

# Set an Index -> Columns have Nans
df_w_idx = df.set_index(['A','B','C'])


for clmNm in df_w_idx.index.names:
    print(clmNm)

    df_w_idx[clmNm] = df_w_idx.index.get_level_values(clmNm)


df_w_idx = df_w_idx.reset_index(drop=True).copy()
df_w_idx

See also issue 6322 in git. It appears to be closed.
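If that issue has in fact been fixed in the pandas version you are running, a plain reset_index() may already work. One cautious option, sketched here with a hypothetical helper name `safe_reset_index`, is to try the normal call first and fall back to the hack only when the IndexError from the question appears:

```python
import numpy as np
import pandas as pd


def safe_reset_index(frame: pd.DataFrame) -> pd.DataFrame:
    """Try reset_index(); fall back to copying the levels by hand on IndexError."""
    try:
        return frame.reset_index()
    except IndexError:
        # Fallback: the workaround from this answer
        out = frame.copy()
        for name in out.index.names:
            out[name] = out.index.get_level_values(name)
        return out.reset_index(drop=True)


# Hypothetical example: level "B" is entirely NaN
df = pd.DataFrame({
    "A": list("abc"),
    "B": np.nan,
    "D": [1.0, 2.0, 3.0],
})
df2 = safe_reset_index(df.set_index(["A", "B"]))
print(df2)
```

The column order differs between the two paths (index levels first with reset_index(), last with the fallback), so select columns by name rather than by position.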