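Goal: reformat the contents of a pandas DataFrame based on the content supplied to me.
I want to change every column to the following style:
I use the following code to generate the style I need, but it is inefficient: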
lt = []
for i in patterns['Components'][0]:
    for x in i.split('__'):
        lt.append(x)
lt[1].replace('(','').replace(', ',' < '+str(lt[0])+' ≤ ').replace(']','')
I tried pandas `replace` to no avail: it throws no error, but it seems to ignore my target.
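A likely cause, assuming the attempt was a plain `df.replace(...)` call (the original call is not shown, so this is a guess): without `regex=True`, `DataFrame.replace` compares whole cell values, so a substring such as `(` never matches and nothing changes, with no error raised. A minimal sketch on a made-up one-cell frame:

```python
import pandas as pd

# Hypothetical one-cell frame in the same shape as the question's data.
df = pd.DataFrame({"Components": ["(Quantity__(0.0, 16199.0])"]})

# Without regex=True, replace() matches whole cell values only,
# so "(" on its own matches nothing and the frame comes back unchanged.
unchanged = df.replace("(", "")

# With regex=True, the pattern is searched for inside each string.
stripped = df.replace(r"\(", "", regex=True)

print(unchanged.loc[0, "Components"])  # (Quantity__(0.0, 16199.0])
print(stripped.loc[0, "Components"])   # Quantity__0.0, 16199.0])
```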
Answer 0 (score: 1)
Source DF:
In [37]: df
Out[37]:
                           Components                             Outcome
0          (Quantity__(0.0, 16199.0])  (UnitPrice__(-1055.648, 3947.558])
1  (UnitPrice__(-1055.648, 3947.558])          (Quantity__(0.0, 16199.0])
Solution:
In [38]: cols = ['Components','Outcome']
    ...: df[cols] = df[cols].replace(r'\(([^_]*)__\(([^,\s]+),\s*([^\]]+)\]\).*',
    ...:                             r'\2 < \1 <= \3',
    ...:                             regex=True)
Result:
In [39]: df
Out[39]:
                          Components                            Outcome
0          0.0 < Quantity <= 16199.0  -1055.648 < UnitPrice <= 3947.558
1  -1055.648 < UnitPrice <= 3947.558          0.0 < Quantity <= 16199.0
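For anyone who wants to run this outside an IPython session, here is a self-contained sketch of the same replacement. The frame is rebuilt by hand from the output shown above. The three capture groups are the name before `__`, the lower bound, and the upper bound, which the replacement string reorders as `\2 < \1 <= \3`.

```python
import pandas as pd

# Frame rebuilt by hand from the "Source DF" output above.
df = pd.DataFrame({
    "Components": ["(Quantity__(0.0, 16199.0])", "(UnitPrice__(-1055.648, 3947.558])"],
    "Outcome": ["(UnitPrice__(-1055.648, 3947.558])", "(Quantity__(0.0, 16199.0])"],
})

# \(([^_]*)__    -> group 1: the name between "(" and "__"
# \(([^,\s]+),   -> group 2: the lower bound, up to the comma
# \s*([^\]]+)\]  -> group 3: the upper bound, up to "]"
# \).*           -> the closing ")" and anything after it
pat = r'\(([^_]*)__\(([^,\s]+),\s*([^\]]+)\]\).*'

cols = ["Components", "Outcome"]
df[cols] = df[cols].replace(pat, r'\2 < \1 <= \3', regex=True)
print(df)
```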
Update:
In [113]: df
Out[113]:
                           Components                             Outcome
0          (Quantity__(0.0, 16199.0])  (UnitPrice__(-1055.648, 3947.558])
1  (UnitPrice__(-1055.648, 3947.558])          (Quantity__(0.0, 16199.0])
In [114]: cols = ['Components','Outcome']
In [115]: pat = r'\s*\(([^_]*)__\(([^,\s]+),\s*([^\]]+)\]\)\s*'
In [116]: df[cols] = df[cols].replace(pat, r'\2 < \1 <= \3', regex=True)
In [117]: df
Out[117]:
                          Components                            Outcome
0          0.0 < Quantity <= 16199.0  -1055.648 < UnitPrice <= 3947.558
1  -1055.648 < UnitPrice <= 3947.558          0.0 < Quantity <= 16199.0
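The answer does not say what the update changes. Reading the two patterns side by side, the updated one swallows whitespace on both sides of the component and no longer ends in `.*`, so it stops discarding whatever follows the match. A small sketch with plain `re.sub` on a made-up padded cell shows the whitespace difference:

```python
import re

old_pat = r'\(([^_]*)__\(([^,\s]+),\s*([^\]]+)\]\).*'
new_pat = r'\s*\(([^_]*)__\(([^,\s]+),\s*([^\]]+)\]\)\s*'
repl = r'\2 < \1 <= \3'

# Hypothetical cell with stray whitespace around the component.
cell = "  (Quantity__(0.0, 16199.0])  "

old_result = re.sub(old_pat, repl, cell)
new_result = re.sub(new_pat, repl, cell)

print(repr(old_result))  # '  0.0 < Quantity <= 16199.0'
print(repr(new_result))  # '0.0 < Quantity <= 16199.0'
```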
Or without the outer parentheses:
In [119]: df
Out[119]:
                         Components                           Outcome
0         Quantity__(0.0, 16199.0])  UnitPrice__(-1055.648, 3947.558]
1  UnitPrice__(-1055.648, 3947.558]          Quantity__(0.0, 16199.0]
In [120]: pat = r'([^_]*)__\(([^,\s]+),\s*([^\]]+)\]'
In [121]: df[cols] = df[cols].replace(pat, r'\2 < \1 <= \3', regex=True)
In [122]: df
Out[122]:
                          Components                            Outcome
0         0.0 < Quantity <= 16199.0)  -1055.648 < UnitPrice <= 3947.558
1  -1055.648 < UnitPrice <= 3947.558          0.0 < Quantity <= 16199.0
Answer 1 (score: 0)
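import re

import pandas as pd

data = pd.DataFrame({
    'components': ['(quantity__(0.0,16199.0])', '(unitprice__(-1055.648,8494.557])'],
    'outcome': ['(unitprice__(-1055.648,8494.557])', 'quantity__(0.0,16199.0])'],
})

def func(x):
    # Split "(name__(lower,upper])" into the name and the interval.
    name, interval = str(x).split('__')
    name = name.replace('(', '')
    # The optional "-" keeps the sign of negative bounds.
    lower, upper = re.findall(r'-?\d*\.\d*', interval)
    # Lower bound first, to match the target style "lower < name <= upper".
    return '{} < {} <= {}'.format(lower, name, upper)

# applymap calls func on every cell (pandas 2.1+ names this DataFrame.map).
df = data.applymap(func)
print(df)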
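As an alternative to calling a Python function on every cell, the same reshaping can be sketched column by column with `Series.str.replace`. The pattern below is an illustration written for this sample data, not part of the answer. It treats the outer parentheses as optional, since one of the sample cells lacks the leading `(`.

```python
import pandas as pd

data = pd.DataFrame({
    "components": ["(quantity__(0.0,16199.0])", "(unitprice__(-1055.648,8494.557])"],
    "outcome": ["(unitprice__(-1055.648,8494.557])", "quantity__(0.0,16199.0])"],
})

# Optional "(", group 1: name, "__(", group 2: lower bound, ",",
# group 3: upper bound, "]", optional ")".
pat = r"\(?([^_(]+)__\(([^,\s]+),\s*([^\]]+)\]\)?"

# Run the same regex replacement over each column's strings.
df = data.apply(lambda col: col.str.replace(pat, r"\2 < \1 <= \3", regex=True))
print(df)
```

Because the bounds are captured as "everything up to the comma" and "everything up to `]`", negative values such as `-1055.648` keep their sign.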