I have a pandas DataFrame with a few thousand rows; a subset of it is shown below:
fr var
1.1 10px
2.9 12pz
Expected output:
fr var vard varv
1.1 10px -5 xval
1.1 10px 5 zval
2.9 12pz -6 zval
2.9 12pz 6 xval
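For the rows: each row is split into two rows.
Conditions for the new columns:
I have read various answers to nearly identical questions and tried many options such as "iterrows", "shift", and "explode", but could not get the expected output.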
Answer 0 (score: 1)
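First use Series.str.extract to split var into its numeric and non-numeric parts, convert the numeric part to integers and divide by 2, then use concat to append a copy whose values are multiplied by -1, sort by index and reset to a default index, and finally use numpy.where to set the new values by condition: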
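import numpy as np
import pandas as pd
df = pd.DataFrame({'fr': [1.1, 2.9], 'var': ['10px', '12pz']})  # sample from the question
# split var into digits and non-digits, then halve the numeric part
df[['vard','varv']] = df['var'].str.extract(r'(\d+)(\D+)')
df['vard'] = df['vard'].astype(int).div(2)
# append a negated copy; a stable sort keeps each positive row ahead of its negated twin
df = pd.concat([df, df.assign(vard = df['vard']*-1)]).sort_index(kind='mergesort').reset_index(drop=True)
m = (df['varv'].eq('px') & df['vard'].lt(0)) | (df['varv'].eq('pz') & df['vard'].gt(0))
df['varv'] = np.where(m, 'zval','xval')
print (df)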
fr var vard varv
0 1.1 10px 5.0 xval
1 1.1 10px -5.0 zval
2 2.9 12pz 6.0 zval
3 2.9 12pz -6.0 xval
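To make the first step easier to follow on its own, here is a minimal sketch of what Series.str.extract returns for the sample values. The names var, parts, and digits are only illustrative and do not appear in the answer above.

```python
import pandas as pd

# Sample values taken from the question
var = pd.Series(["10px", "12pz"], name="var")

# Two capture groups give a DataFrame with two columns, labelled 0 and 1:
# column 0 holds the digits, column 1 holds the non-digit suffix
parts = var.str.extract(r"(\d+)(\D+)")

# The extracted digits are still strings, so convert them before doing arithmetic
digits = parts[0].astype(int)

print(parts)
print(digits)
```

Rows that do not match the pattern come back as NaN, so with real data it may be worth checking for missing values before calling astype(int).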
Answer 1 (score: 0)
This can be done quite easily with the melt function.
import pandas as pd

# recreate your dataframe
df = pd.DataFrame(columns=['fr','var'])
df['fr']=[1.1,2.9]
df['var']=['10px','12pz']
# split the var into its two components by creating two new columns
df['vard_p'] = df['var'].str[:-2]
df['vard_p'] = df['vard_p'].astype(float)/2
df['vard_n'] = -df['vard_p']
# get the varv from the var (I assumed it was simply the last character in the string)
df['varv'] = df['var'].str[-1]+'val'
# and here you melt on the two new vard columns to get the dataframe in the format you wanted
df = pd.melt(df, id_vars=['fr','var','varv'], value_vars=['vard_p','vard_n'])
# now rename or drop the new columns
df.rename(columns={'value':'vard'},inplace=True)
df.drop('variable',axis=1,inplace=True)
df
Output:
fr var varv vard
0 1.1 10px xval 5.0
1 2.9 12pz zval 6.0
2 1.1 10px xval -5.0
3 2.9 12pz zval -6.0
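Note that melt stacks all the positive rows first and all the negative rows after them. If the two halves of each original row should stay next to each other, one possible variation (a sketch, assuming pandas 1.1 or later for the ignore_index parameter) is to keep the original index during melt and then sort on it with a stable algorithm. The names wide and long are illustrative only.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({"fr": [1.1, 2.9], "var": ["10px", "12pz"]})

# Same assumptions as the answer: the last two characters are the suffix,
# and the last character alone decides varv
half = df["var"].str[:-2].astype(float) / 2
wide = df.assign(vard_p=half, vard_n=-half, varv=df["var"].str[-1] + "val")

# ignore_index=False keeps the original row labels, so both melted rows
# that come from the same source row share one index value
long = pd.melt(
    wide,
    id_vars=["fr", "var", "varv"],
    value_vars=["vard_p", "vard_n"],
    value_name="vard",
    ignore_index=False,
)

# A stable sort (mergesort) on that index puts each pair side by side,
# positive half first, then the default index is rebuilt
long = (
    long.drop(columns="variable")
    .sort_index(kind="mergesort")
    .reset_index(drop=True)
)
print(long)
```

Passing value_name="vard" also avoids the separate rename step.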
Hope this helps.