Pandas: conditionally generating a description from column contents

Time: 2015-06-25 15:41:32

Tags: python pandas

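I'm having trouble with a function that uses pandas regex via str.extract to generate a "description" column from each row of the "name" column. I'm using regex rather than split because the code has to cope with a variety of formats.

The function needs to be changed so that it handles several conditions.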

DataFrame:

import pandas as pd
import re

df = pd.DataFrame(["LONG AXP UN X3 VON", "SHORT BIDU UN 5x VON", "SHORT GOOG VON", "LONG GOOG VON"], columns=["name"])

Input:

name
"LONG AXP UN X3 VON"
"SHORT BIDU UN 5x VON"
"SHORT GOOG VON"
"LONG GOOG VON"

Current code:

description_map = {"AXP": "American Express", "BIDU": "Baidu"}
sign_map = {"LONG": "", "SHORT": "-"}

def f(strseries):
    # expand=False makes str.extract return a Series for a single capture group,
    # so that .map() and string concatenation work on recent pandas versions.
    stock = strseries.str.extract(r"\s(\S+)\s", expand=False).map(description_map)
    leverage = strseries.str.extract(r"(X\d+|\d+X)\s", flags=re.IGNORECASE, expand=False)
    sign = strseries.str.extract(r"(\S+)\s", expand=False).map(sign_map)
    return "Tracks " + stock + " with " + sign + leverage + " leverage"

df["description"] = f(df["name"])

Current output:

name                        description
"LONG AXP UN X3 VON"        "Tracks American Express with X3 leverage"
"SHORT BIDU UN 5x VON"      "Tracks Baidu with -5x leverage"
"SHORT GOOG VON"            ""
"LONG GOOG VON"             ""

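The blank descriptions in the last two rows are most likely missing values rather than empty strings: map returns NaN for a ticker that is not in description_map, str.extract returns NaN when the pattern does not match, and concatenating a string with NaN yields NaN. A minimal sketch of that behaviour, using a made-up stock series purely for illustration:

```python
import numpy as np
import pandas as pd

# Hypothetical series standing in for the mapped stock names;
# the second entry mimics a ticker that is missing from description_map.
stock = pd.Series(["American Express", np.nan])

# String concatenation propagates the missing value to the whole result.
description = "Tracks " + stock + " with 3x leverage"

# Filling the gap with "" first keeps the rest of the sentence.
patched = "Tracks " + stock.fillna("") + " with 3x leverage"
```

So each optional piece has to be filled or built conditionally before the parts are joined.
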
Desired output:

name                        description
"LONG AXP UN X3 VON"        "Tracks American Express with 3x leverage"
"SHORT BIDU UN 5x VON"      "Tracks Baidu inversely with -5x leverage"
"SHORT GOOG VON"            "Tracks inversely"
"LONG GOOG VON"             "Tracks"

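In other words:

  • If sign is "-", how can I add direction = "inversely" to the string?
  • If no stock in name matches the description_map dictionary: set stock = "" and return the string.
  • If no leverage is found in name: omit the part " with " + sign + leverage + " leverage".
  • Split and reorder sign + leverage so that it is always shown in the order "-5x", even when the input is "SHORT X5".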

1 answer:

Answer 0: (score: 2)

I spent some time writing this function:

description_map = {"AXP": "American Express", "BIDU": "Baidu"}

# Tokens surrounded by whitespace; the first one is the ticker, e.g. "AXP".
stock_match = re.compile(r"\s(\S+)\s")
# Leverage token such as "X3", "x3", "5x" or "5X" (any number of digits).
leverage_match = re.compile(r"\b(?:x\d+|\d+x)\b", re.IGNORECASE)

def f(value):
    # Look up the ticker; unknown or missing tickers give an empty stock name.
    tickers = stock_match.findall(value)
    stock = description_map.get(tickers[0], "") if tickers else ""
    # First leverage token in lower case, or "" if the name has none.
    found = leverage_match.findall(value)
    leverage = found[0].lower() if found else ""
    sign = "-" if "SHORT" in value else ""

    # Unknown stock: report only the direction.
    if stock == "":
        return "Tracks inversely" if sign == "-" else "Tracks"

    # Normalise "x3" to "3x" so the multiplier always comes first.
    if leverage.startswith("x"):
        leverage = leverage[1:] + "x"

    statement = "Tracks " + stock
    if sign == "-":
        statement += " inversely"
    if leverage:
        statement += " with {}{} leverage".format(sign, leverage)
    return statement

df["description"] = df["name"].map(f)

Output:

In [97]: %paste
import pandas as pd
import re

df = pd.DataFrame(["LONG AXP UN X3 VON", "SHORT BIDU UN 5x VON", "SHORT GOOG VON", "LONG GOOG VON"], columns=["name"])

## -- End pasted text --

In [98]: df
Out[98]: 
                   name
0    LONG AXP UN X3 VON
1  SHORT BIDU UN 5x VON
2        SHORT GOOG VON
3         LONG GOOG VON

In [99]: df["description"] = df["name"].map(lambda x:f(x))

In [100]: df
Out[100]: 
                   name                               description
0    LONG AXP UN X3 VON  Tracks American Express with 3x leverage
1  SHORT BIDU UN 5x VON  Tracks Baidu inversely with -5x leverage
2        SHORT GOOG VON                          Tracks inversely
3         LONG GOOG VON                                    Tracks
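
As a possible alternative to mapping a Python function over every row, the same rules can be sketched with vectorised string methods. This is only a sketch under some assumptions, not something taken from the question: the function name describe and both patterns are made up for illustration, the ticker is assumed to be the second whitespace-separated token, and the direction is assumed to be the first word of the name.

```python
import re
import pandas as pd

description_map = {"AXP": "American Express", "BIDU": "Baidu"}

def describe(names):
    """Build descriptions for a Series of names (hypothetical helper)."""
    # Assumption: the second whitespace-delimited token is the ticker.
    ticker = names.str.extract(r"^\S+\s+(\S+)", expand=False)
    # " <stock name>" for known tickers, "" for unknown ones.
    stock = (" " + ticker.map(description_map)).fillna("")

    # Capture only the digits of a leverage token written as "X3" or "3x".
    parts = names.str.extract(r"\b(?:x(\d+)|(\d+)x)\b", flags=re.IGNORECASE)
    digits = parts[0].fillna(parts[1])

    # Assumption: a short position is marked by a leading "SHORT".
    short = names.str.startswith("SHORT")
    sign = short.map({True: "-", False: ""})
    direction = short.map({True: " inversely", False: ""})

    # NaN digits make the whole leverage phrase NaN, which becomes "".
    leverage = (" with " + sign + digits + "x leverage").fillna("")
    # As in the function above, leverage is dropped for unknown stocks.
    leverage = leverage.where(stock != "", "")

    return "Tracks" + stock + direction + leverage

df = pd.DataFrame(
    ["LONG AXP UN X3 VON", "SHORT BIDU UN 5x VON", "SHORT GOOG VON", "LONG GOOG VON"],
    columns=["name"],
)
df["description"] = describe(df["name"])
```

On the four sample names this is intended to give the same descriptions as the output above. Because only the digits are captured, "X3" and "3x" both end up as "3x" without a separate reordering step.
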