I have a dataset:
x1 NAN
x2 NAN
x3 NAN
NAN y1
NAN y2
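Is there a way to reshape the pandas DataFrame into the form below? I'm guessing this is like a SQL outer join, so that I can multiply the values out.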
x1 y1
x1 y2
x2 y1
x2 y2
x3 y1
x3 y2
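Edit: the reason is that I have to convert an Excel file (which I have no control over) into this format to feed another program (which I also have no control over).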
import pandas as pd

# Read the first sheet, treating every column as a string
df = pd.read_excel('/inputfile.xlsx', sheet_name=0, dtype=str)

## Maybe this kind of logic,
## but could it be more Pythonic?
# for index in range(len(df)):
#     if not pd.isnull(df.iloc[index, 3]):
#         print(df.iloc[index, 3])

# Write the result to a new workbook; the context manager saves the file on exit
with pd.ExcelWriter('output.xlsx') as writer:  # engine='xlsxwriter'
    df.to_excel(writer, sheet_name='Sheet1', index=False)
Answer 0 (score: 0)
You can start with the following quick trick:
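import numpy as np
import pandas as pd

# Flatten every cell of df into a single column
df1 = pd.DataFrame(data=df.values.reshape(-1))
# Strip the digits to get each value's prefix (e.g. 'x1' -> 'x')
prefixes = df1[0].str.replace(r'\d+', '', regex=True).dropna().unique()
# Add one column per prefix, each starting as a copy of the flattened values
for prefix in prefixes:
    df1[prefix] = df1[0]
df1 = df1[prefixes].copy()
# In each column, keep only the strings that start with that column's prefix
for col in df1.columns:
    df1[col] = df1[col].apply(lambda x: x if isinstance(x, str) and x.startswith(col) else np.nan)
df1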
     x    c    y   title
0   x1  NaN  NaN     NaN
1  NaN   c1  NaN     NaN
2   x2  NaN  NaN     NaN
3  NaN   c2   y1     NaN
4   x3  NaN   y3  title1
5  NaN   c3  NaN  title2
6  NaN  NaN  NaN     NaN
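To get from NaN-padded columns to the x/y pairs asked for in the question, one possible approach is a cross join (Cartesian product) rather than an outer join. The following is only a minimal sketch: the sample frame mirrors the question's data, the column names `x` and `y` are assumed for illustration, and `how='cross'` requires pandas 1.2 or later.

```python
import numpy as np
import pandas as pd

# Sample data shaped like the question; the column names 'x' and 'y' are assumed.
df = pd.DataFrame({
    'x': ['x1', 'x2', 'x3', np.nan, np.nan],
    'y': [np.nan, np.nan, np.nan, 'y1', 'y2'],
})

# Drop the NaN padding from each column separately.
xs = df[['x']].dropna()
ys = df[['y']].dropna()

# Pair every remaining x with every remaining y (needs pandas >= 1.2).
pairs = xs.merge(ys, how='cross')
print(pairs)
```

The resulting `pairs` frame could then be written out with `to_excel`, as in the question's own code.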