Pythonic way to transform a pandas DataFrame

Time: 2018-12-20 14:07:27

Tags: python pandas dataframe

I have a dataset:

x1      NAN         
x2      NAN         
x3      NAN     
NAN     y1  
NAN     y2  

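Is there a way to reshape the pandas DataFrame into the form below? I guess this is like a SQL outer join, so that I can then multiply the values.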

x1  y1      
x1  y2      
x2  y1      
x2  y2          
x3  y1      
x3  y2      

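Edit: the reason is that I have to convert an Excel file (which I have no control over) into this format in order to feed another program (which I also have no control over).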

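import pandas as pd

xl = pd.ExcelFile('/inputfile.xlsx')
# Column count via the underlying xlrd workbook (needs the xlrd engine)
ncols = xl.book.sheet_by_index(0).ncols
# Parse the first sheet, reading every column as a string
df = xl.parse(0, converters={i: str for i in range(ncols)})

# Maybe this kind of logic, but could it be more Pythonic?
# for index in range(len(df)):
#     if not pd.isnull(df.iloc[index, 3]):
#         print(df.iloc[index, 3])

# The context manager saves and closes the workbook on exit
with pd.ExcelWriter('output.xlsx') as writer:  # engine='xlsxwriter'
    df.to_excel(writer, sheet_name='Sheet1', index=False)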
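
The commented-out loop prints every non-null value in the fourth column. A minimal vectorized sketch of the same idea, assuming a small hypothetical frame in place of the Excel data (the column names `a` to `d` are made up for illustration and do not come from the real file):

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the parsed Excel sheet
sample = pd.DataFrame({
    "a": ["x1", "x2", np.nan],
    "b": [np.nan, np.nan, np.nan],
    "c": [np.nan, np.nan, np.nan],
    "d": [np.nan, "y1", "y2"],
})

# Select the fourth column by position and drop the missing cells
non_null = sample.iloc[:, 3].dropna().tolist()
print(non_null)
```

Selecting by position with `iloc[:, 3]` mirrors the index used in the loop; `dropna()` replaces the explicit `pd.isnull` check.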

1 answer:

Answer 0: (score: 0)

You can start with the following quick trick:

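import numpy as np
import pandas as pd

# Flatten every cell of df into a single column named 0
df1 = pd.DataFrame(data=df.values.reshape(-1))

# Strip the digits to get each value's prefix ('x1' -> 'x'), ignoring NaN cells
prefixes = df1[0].str.replace(r'\d+', '', regex=True).dropna().unique()

# One column per prefix, keeping only the values that start with that prefix
for prefix in prefixes:
    df1[prefix] = df1[0].where(df1[0].str.startswith(prefix, na=False))
df1 = df1[prefixes]

df1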



        x   c   y   title
     0  x1  NaN NaN NaN
     1  NaN c1  NaN NaN
     2  x2  NaN NaN NaN
     3  NaN c2  y1  NaN
     4  x3  NaN y3  title1
     5  NaN c3  NaN title2
     6  NaN NaN NaN NaN
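
The output the question asks for is the Cartesian product of the non-null values in the two columns, which in SQL terms is a cross join rather than an outer join. A minimal sketch of that step, assuming the two columns are named `x` and `y` (hypothetical names, since the question shows no headers) and pandas 1.2 or later for `how="cross"`:

```python
import numpy as np
import pandas as pd

# Hypothetical reconstruction of the question's data; column names are assumed
df = pd.DataFrame({
    "x": ["x1", "x2", "x3", np.nan, np.nan],
    "y": [np.nan, np.nan, np.nan, "y1", "y2"],
})

# Keep only the real values of each column
xs = df[["x"]].dropna()
ys = df[["y"]].dropna()

# Pair every x with every y (requires pandas >= 1.2)
pairs = xs.merge(ys, how="cross")
print(pairs)
```

On older pandas versions, passing `itertools.product(xs["x"], ys["y"])` to `pd.DataFrame` should give the same pairs.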