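I'm working with a CSV format that I can't change. It contains a multi-index. The raw file looks like this:
I use the following code to build the multi-index, then stack, then reset the index. It works.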
import pandas as pd
myfile = 'c:/temp/myfile.csv'
df = pd.read_csv(myfile, header=[0, 1], tupleize_cols=True)
df.columns = [c for _, c in df.columns[:3]] + [c for c in df.columns[3:]]
df = df.set_index(list(df.columns[:3]), append = True)
df.columns = pd.MultiIndex.from_tuples(df.columns, names = ['hour', 'field'])
df.stack(level=['hour'])  # note: the result is not assigned, so df itself is unchanged
df2 = df.reset_index().copy()
df2
Sometimes, however, the "Zone" field is left blank.
Running such a file through the same code gives me this error:
---------------------------------------------------------------------------
IndexError Traceback (most recent call last)
<ipython-input-15-8e51ff24c0c4> in <module>()
6 df.columns = pd.MultiIndex.from_tuples(df.columns, names = ['hour', 'field'])
7 df.stack(level=['hour'])
----> 8 df2 = df.reset_index().copy()
9 df2
C:\Anaconda3\lib\site-packages\pandas\core\frame.py in reset_index(self, level, drop, inplace, col_level, col_fill)
2832
2833 # to ndarray and maybe infer different dtype
-> 2834 level_values = _maybe_casted_values(lev, lab)
2835 if level is None or i in level:
2836 new_obj.insert(0, col_name, level_values)
C:\Anaconda3\lib\site-packages\pandas\core\frame.py in _maybe_casted_values(index, labels)
2796 if labels is not None:
2797 mask = labels == -1
-> 2798 values = values.take(labels)
2799 if mask.any():
2800 values, changed = com._maybe_upcast_putmask(values,
IndexError: cannot do a non-empty take from an empty axes.
Ideally, I'd like to keep the NaNs in the df after the reset.
Answer 0 (score: 0)
I ran into the same problem. Here is my hack:
# Loop through the index levels
for clmNm in df_w_idx.index.names:
    print(clmNm)
    # Copy the level's values into a regular column of the dataframe
    df_w_idx[clmNm] = df_w_idx.index.get_level_values(clmNm)
# Now you can reset the index (dropping it, since the values live in columns)
df_w_idx = df_w_idx.reset_index(drop=True).copy()
df_w_idx
Below is fully reproducible code. I'm sure there is a better way.
import pandas as pd
import numpy as np
import random
import string
# Create 12 random strings, each 3 characters long
rndm_strgs = [''.join(random.SystemRandom().choice(string.ascii_uppercase + string.digits) for _ in range(3)) for i in range(12)]
# Blank out two entries so that level 'C' has some missing values
rndm_strgs[0] = None
rndm_strgs[5] = None
# Make the DataFrame; column 'B' is entirely NaN
df = pd.DataFrame({'A': list('pandasisgood'),
                   'B': np.nan,
                   'C': rndm_strgs,
                   'D': np.random.rand(12)})
# Set an index -> the index levels contain NaNs
df_w_idx = df.set_index(['A', 'B', 'C'])
# Copy each index level into a regular column, then drop the index
for clmNm in df_w_idx.index.names:
    print(clmNm)
    df_w_idx[clmNm] = df_w_idx.index.get_level_values(clmNm)
df_w_idx = df_w_idx.reset_index(drop=True).copy()
df_w_idx
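If you need this in more than one place, the same hack can be wrapped in a small function. This is only a sketch: `index_to_columns` is a name I made up (it is not a pandas API), and it assumes every index level has a name and that no data column already uses one of those names.

```python
import numpy as np
import pandas as pd


def index_to_columns(frame):
    """Hypothetical helper: copy each index level into a column, then drop the index."""
    out = frame.copy()
    level_names = list(out.index.names)  # assumed: every level is named
    for name in level_names:
        # Values are assigned by position, so NaNs in the level are kept
        out[name] = out.index.get_level_values(name)
    # Put the former index levels first, followed by the data columns
    data_cols = [c for c in out.columns if c not in level_names]
    return out.reset_index(drop=True)[level_names + data_cols]


df = pd.DataFrame({
    'A': list('abcd'),
    'B': np.nan,                 # a level that is entirely NaN
    'C': ['x', None, 'y', 'z'],  # a level with one missing value
    'D': [1.0, 2.0, 3.0, 4.0],
})
flat = index_to_columns(df.set_index(['A', 'B', 'C']))
print(flat)
```

Unlike the loop above, this leaves the original frame untouched and returns the level columns in front, which is closer to what a plain reset_index() would give.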
See also issue 6322 in git. It appears to be closed.
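Since that issue looks closed, a newer pandas may handle a plain reset_index() on such an index; I have not checked which release changed this, so treat that as an assumption. A sketch that tries the direct call first and only falls back to copying the levels by hand when the IndexError from the question shows up (the small frame and the `used_fallback` flag are just for illustration):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'A': list('abcd'),
    'B': np.nan,               # index level that is entirely NaN
    'D': [1.0, 2.0, 3.0, 4.0],
})
indexed = df.set_index(['A', 'B'])

try:
    # Works if the installed pandas can reset an index with an all-NaN level
    flat = indexed.reset_index()
    used_fallback = False
except IndexError:
    # Same error as in the question: copy the levels into columns by hand
    flat = indexed.copy()
    for name in indexed.index.names:
        flat[name] = flat.index.get_level_values(name)
    flat = flat.reset_index(drop=True)[['A', 'B', 'D']]
    used_fallback = True

print('fallback used:', used_fallback)
print(flat)
```

Either branch should end with the NaNs still present in column B.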