Question

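This function looks at the strings in a pandas DataFrame. If a string contains a regex match that corresponds to an entry in a dictionary, it passes the captured string on to the rest of the function and finally returns a statement.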

def f(value):
    f1 = lambda x: dictionary[regex.findall(x)[0]] if regex.findall(x)[0] in dictionary else ""
    match = f1(value)
    #Do stuff
    return statement

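Problem:

How can I make it accept partial matches and replace the matched word while leaving the rest of the string intact? Right now it only accepts exact matches.

Goal:

The string is "BULL GOOGLE X3 VON". I want the dictionary entry {"GOOG": "Google"} to be enough to convert the word to "Google". The converted string would be "BULL Google X3 VON", and "Google" would be passed on to the rest of the function.

Note: I want to keep using a dict for this, because other parts of the program depend on it.

Code: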

import re
import pandas as pd

#DataFrame
df = pd.DataFrame(["BULL GOOGLE X3 VON", "BEAR TWITTER 12X S"], columns=["Name"])

#Dict
google = {"GOOG":"Google"}
twitter = {"TWITT":"Twitter"}
dictionary = google.copy()
dictionary.update(twitter)

#Regex
regex = re.compile(r"\s(\S+)\s", flags=re.IGNORECASE)

#Function
def f(value):
    f1 = lambda x: dictionary[regex.findall(x)[0]] if regex.findall(x)[0] in dictionary else ""
    match = f1(value)
    #Do stuff
    return statement

#Map Function
df["Statement"] = df["Name"].map(lambda x:f(x))

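Observations:

It would be nice if the function could be modified directly to accept partial matches.

Otherwise, a solution might be to first replace the matched word in the string, keeping the rest of the string intact, and then match the regex substring against the dictionary. These steps could happen in a temporary column, so that the "Name" column stays in its original state for future use.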
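
The two-step idea can be sketched roughly as follows. This is only an illustration under some assumptions: the column names "Name_tmp" and "Match" and the helpers normalize and lookup are hypothetical, and all dictionary keys are assumed to be uppercase. Note that after the replacement the captured word is a dictionary value, not a key, so the lookup has to check the values.

```python
import re
import pandas as pd

df = pd.DataFrame(["BULL GOOGLE X3 VON", "BEAR TWITTER 12X S"], columns=["Name"])
dictionary = {"GOOG": "Google", "TWITT": "Twitter"}

# One alternation of all keys; re.escape guards against regex metacharacters in keys.
prefix_re = re.compile(
    r"\b(%s)\S*" % "|".join(map(re.escape, dictionary)), flags=re.IGNORECASE
)

def normalize(name):
    # Replace every word that starts with a key by that key's dictionary value.
    # Assumption: keys are uppercase, hence the .upper() before the lookup.
    return prefix_re.sub(lambda m: dictionary[m.group(1).upper()], name)

# Hypothetical temporary column; "Name" itself is left untouched.
df["Name_tmp"] = df["Name"].map(normalize)

# The original whole-word regex now captures a dictionary value,
# so membership is tested against the values rather than the keys.
word_re = re.compile(r"\s(\S+)\s")
known_values = set(dictionary.values())

def lookup(name):
    words = word_re.findall(name)
    return words[0] if words and words[0] in known_values else ""

df["Match"] = df["Name_tmp"].map(lookup)
```

The temporary column can be dropped afterwards with df.drop(columns="Name_tmp") once "Match" has been computed.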

Answer 1

I think this may be what you are looking for.

import re
import pandas as pd

df = pd.DataFrame(["BULL GOOGLE X3 VON", "BEAR TWITTER 12X S"], columns=["Name"])

#Dict
google = {"GOOG":"Google"}
twitter = {"TWITT":"Twitter"}
dictionary = google.copy()
dictionary.update(twitter)

#Regex
regex = re.compile(r"\b((%s)\S*)\b" %"|".join(dictionary.keys()), re.I)

def dictionary_lookup(match):
    return dictionary[match.group(2)]

#Function
def f(value):
    match = dictionary[regex.search(value).group(2)]
    #Do stuff
    statement = regex.sub(dictionary_lookup, value)
    return statement

#Map Function
df["Statement"] = df["Name"].map(lambda x:f(x))

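This matches any word that starts with one of the keys in the dictionary, assigns the corresponding dictionary value to the variable match, and then returns the original string with the matched word replaced.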
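
One caveat: as written, the code above raises an AttributeError when a string contains no key at all (regex.search returns None), and a KeyError when the matched text differs in case from the key, since the pattern is compiled with re.I. A slightly more defensive variant might look like the sketch below. The name replace_partial is hypothetical, and it assumes all dictionary keys are uppercase.

```python
import re

dictionary = {"GOOG": "Google", "TWITT": "Twitter"}

# Longest keys first, so a longer key wins over a shorter key that is its prefix.
keys = sorted(dictionary, key=len, reverse=True)
regex = re.compile(r"\b(%s)\S*" % "|".join(re.escape(k) for k in keys), re.I)

def replace_partial(value):
    """Return (match, statement); match is "" when no key is found."""
    found = regex.search(value)
    if found is None:
        # No word starts with a key: leave the string unchanged.
        return "", value
    # Assumption: keys are uppercase, so normalise the captured prefix.
    match = dictionary[found.group(1).upper()]
    statement = regex.sub(lambda m: dictionary[m.group(1).upper()], value)
    return match, statement

print(replace_partial("BULL GOOGLE X3 VON"))  # ('Google', 'BULL Google X3 VON')
print(replace_partial("bear twitter 12x s"))  # ('Twitter', 'bear Twitter 12x s')
print(replace_partial("BULL APPLE X3 VON"))   # ('', 'BULL APPLE X3 VON')
```

With a DataFrame, the statement part could be mapped as df["Name"].map(lambda x: replace_partial(x)[1]).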
