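I'm struggling with a function that uses pandas regex (via str.extract) to build a "description" column from each row of the "name" column. I'm using regex rather than split because the code must handle a variety of formats.
The function needs to be modified to handle several conditions.
DataFrame: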
import pandas as pd
import re
df = pd.DataFrame(["LONG AXP UN X3 VON", "SHORT BIDU UN 5x VON", "SHORT GOOG VON", "LONG GOOG VON"], columns=["name"])
Input:
name
"LONG AXP UN X3 VON"
"SHORT BIDU UN 5x VON"
"SHORT GOOG VON"
"LONG GOOG VON"
Current code:
description_map = {"AXP":"American Express", "BIDU":"Baidu"}
sign_map = {"LONG": "", "SHORT": "-"}
def f(strseries):
    stock = strseries.str.extract(r"\s(\S+)\s", expand=False).map(description_map)
    leverage = strseries.str.extract(r"(X\d+|\d+X)\s", flags=re.IGNORECASE, expand=False)
    sign = strseries.str.extract(r"(\S+)\s", expand=False).map(sign_map)
    return "Tracks " + stock + " with " + sign + leverage + " leverage"
df["description"] = f(df["name"])
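A note on the code above: rows where str.extract or map finds nothing hold a missing value, and concatenating a string with a missing value gives a missing value for the whole row. The following is a minimal sketch of that behavior, assuming a two-row sample; the names `names`, `naive` and `safe` are illustrative only and not part of the original code.

```python
import re
import pandas as pd

names = pd.Series(["LONG AXP UN X3 VON", "SHORT GOOG VON"])

# expand=False returns a Series when the pattern has one capture group.
leverage = names.str.extract(r"(X\d+|\d+X)\s", flags=re.IGNORECASE, expand=False)

# The second row has no leverage token, so its entry is missing (NaN).
# Concatenating strings with a missing value yields a missing value.
naive = "with " + leverage + " leverage"

# Filling the gaps first keeps the concatenation defined for every row.
safe = "with " + leverage.fillna("") + " leverage"
```

Filling with "" avoids the missing result, but the "with ... leverage" phrase still has to be dropped separately when no leverage is found.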
Current output:
name description
"LONG AXP UN X3 VON" "Tracks American Express with X3 leverage"
"SHORT BIDU UN 5x VON" "Tracks Baidu with -5x leverage"
"SHORT GOOG VON" ""
"LONG GOOG VON" ""
Desired output:
name description
"LONG AXP UN X3 VON" "Tracks American Express with 3x leverage"
"SHORT BIDU UN 5x VON" "Tracks Baidu inversely with -5x leverage"
"SHORT GOOG VON" "Tracks inversely"
"LONG GOOG VON" "Tracks"
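Meaning:
If sign is "-", how can I add direction = "inversely" to the string?
If no stock in name matches the dictionary description_map: set stock = "" and return the string.
If no leverage is found in name: omit the part "with" + sign + leverage + " leverage".
Normalize sign + leverage so that it is always displayed in the form "-5x", even when the input is "SHORT X5".
Answer 0 (score: 2)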
I spent some time writing this function:
description_map = {"AXP": "American Express", "BIDU": "Baidu"}
sign_map = {"LONG": "", "SHORT": "-"}
stock_match = re.compile(r"\s(\S+)\s")
leverage_match = re.compile(r"\b(?:x\d+|\d+x)\b", re.IGNORECASE)
def f(value):
    # The ticker is the second token; unknown tickers map to "".
    stock_found = stock_match.search(value)
    stock = description_map.get(stock_found.group(1), "") if stock_found else ""
    # First leverage token such as "X3" or "5x", or "" if there is none.
    leverage_found = leverage_match.search(value)
    leverage = leverage_found.group(0) if leverage_found else ""
    sign = "-" if "SHORT" in value else ""
    statement = "Tracks " + stock
    if stock == "":
        # Unknown stock: report the direction only.
        return "Tracks inversely" if sign == "-" else "Tracks"
    if leverage:
        # Normalize "X3", "x3" and "3X" to "3x".
        leverage = leverage.lower().strip("x") + "x"
    if leverage and sign == "-":
        statement += " inversely with {}{} leverage".format(sign, leverage)
    elif leverage:
        statement += " with {} leverage".format(leverage)
    elif sign == "-":
        statement += " inversely"
    return statement
df["description"] = df["name"].map(lambda x:f(x))
Output:
In [97]: %paste
import pandas as pd
import re
df = pd.DataFrame(["LONG AXP UN X3 VON", "SHORT BIDU UN 5x VON", "SHORT GOOG VON", "LONG GOOG VON"], columns=["name"])
## -- End pasted text --
In [98]: df
Out[98]:
name
0 LONG AXP UN X3 VON
1 SHORT BIDU UN 5x VON
2 SHORT GOOG VON
3 LONG GOOG VON
In [99]: df["description"] = df["name"].map(lambda x:f(x))
In [100]: df
Out[100]:
name description
0 LONG AXP UN X3 VON Tracks American Express with 3x leverage
1 SHORT BIDU UN 5x VON Tracks Baidu inversely with -5x leverage
2 SHORT GOOG VON Tracks inversely
3 LONG GOOG VON Tracks
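Since the question asked for a str.extract-based approach, here is a hedged sketch of a vectorized variant that avoids the row-by-row map. It is one possible formulation, not the answer's method: the function name `describe`, the ticker pattern, and the assumption that every name is a non-missing string starting with LONG or SHORT are all illustrative.

```python
import re
import pandas as pd

description_map = {"AXP": "American Express", "BIDU": "Baidu"}

def describe(names: pd.Series) -> pd.Series:
    # Assumption: the second whitespace-delimited token is the ticker.
    ticker = names.str.extract(r"^\S+\s+(\S+)", expand=False)
    stock = ticker.map(description_map)  # missing for unknown tickers

    # Capture only the digits of a token written as "X3" or "3x" (any case).
    lev = names.str.extract(r"\b(?:x(\d+)|(\d+)x)\b", flags=re.IGNORECASE)
    digits = lev[0].fillna(lev[1])  # missing if there is no leverage token

    short = names.str.startswith("SHORT")
    known = stock.notna()
    has_lev = known & digits.notna()  # leverage is reported only for known stocks

    out = pd.Series("Tracks", index=names.index)
    # Series.where keeps values where the condition holds and
    # substitutes the second argument elsewhere.
    out = out.where(~known, out + " " + stock.fillna(""))
    out = out.where(~short, out + " inversely")
    sign = short.map({True: "-", False: ""})
    out = out.where(~has_lev, out + " with " + sign + digits.fillna("") + "x leverage")
    return out
```

Under these assumptions, df["description"] = describe(df["name"]) should give the same four descriptions as the output above.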