Question

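I don't think this is the most elegant solution, but I'm looking for a way to use values as keys within the same dictionary, so that I can keep several pieces of information in one place.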

My current solution only seems to work for one of the two dictionaries, and I can't figure out why.

Pandas DataFrame:

import pandas as pd
df = pd.DataFrame(["BEAR ESTOX X12 S", "BEAR ESTOX X15 S", "BEAR AXP UN 3X VON", "BULL AXP UN x5 VON"], columns=["name"])

Two dictionaries:

sg = {"S":"SEG", "SG":"SEG", "SEG":"www.societegenerale.com"}
vontobel = {"VON":"Vontobel","Vontobel":"www.vontobel.com"}

issuer = sg.copy()
issuer.update(vontobel) #Combine both to one dictionary

Then I did:

#Split last word in string to new column
df["issuer_spl"] = df.name.str.split().str.get(-1)

#Copy to column "issuer" and substitute abbreviations via dictionary
for i in issuer:
    df.loc[df.issuer_spl.str.contains(i), "issuer"] = issuer[i]

#Another pass in the dictionary, copying and substituting to column "website" 
for w in issuer:
    df.loc[df.issuer.str.contains(w).fillna(False), "website"] = issuer[w]

This produces the output:

name                    issuer_spl      issuer          website
"BEAR ESTOX X12 S"      "S"             "SEG"           "SEG"
"BEAR ESTOX X15 S"      "S"             "SEG"           "SEG"
"BULL AXP UN X3 VON"    "VON"           "Vontobel"      "www.vontobel.com"         
"BEAR AXP UN X3 VON"    "VON"           "Vontobel"      "www.vontobel.com"

What am I doing wrong, so that the lookup works for the website "www.vontobel.com" but not for "www.societegenerale.com"?

Is there another, more structured way to do this, for example a dictionary that can also be accessed like a "list" via [i]?

Desired output:

name                    issuer_spl      issuer       website
"BEAR ESTOX X12 S"      "S"             "SEG"        "www.societegenerale.com"
"BEAR ESTOX X15 S"      "S"             "SEG"        "www.societegenerale.com"
"BULL AXP UN X3 VON"    "VON"           "Vontobel"   "www.vontobel.com"         
"BEAR AXP UN X3 VON"    "VON"           "Vontobel"   "www.vontobel.com"

Answer 1

If you add a print statement:

for w in issuer:
    print(w)
    df.loc[df.issuer.str.contains(w).fillna(False), "website"] = issuer[w]

you might see:

SEG
S
SG
VON
Vontobel

This shows that w is bound to 'S' after it is bound to 'SEG'. Both patterns match the first two rows:

In [220]: df.issuer.str.contains('SEG')
Out[220]:
0     True
1     True
2    False
3    False
Name: issuer, dtype: bool

In [221]: df.issuer.str.contains('S')
Out[221]:
0     True
1     True
2    False
3    False
Name: issuer, dtype: bool

so the statement

df.loc[df.issuer.str.contains(w).fillna(False), "website"] = issuer[w]

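ends up setting the website of the first two rows to issuer['S'], which is 'SEG', because the 'S' iteration comes after the 'SEG' iteration and overwrites its result.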
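To make the last-write-wins effect concrete without pandas, here is a plain-Python sketch of the second loop. It is a model, not the original code: the hypothetical helper website_for applies the loop to a single issuer value, and the `in` operator stands in for str.contains (equivalent here because none of the keys contain regex metacharacters). It also suggests why only the SEG rows are affected: 'S' is a substring of 'SEG', but 'VON' is not a substring of 'Vontobel', because the match is case-sensitive.

```python
# Plain-Python model of:
#   for w in issuer:
#       df.loc[df.issuer.str.contains(w).fillna(False), "website"] = issuer[w]
# applied to one issuer value. website_for is a hypothetical helper.
issuer = {"S": "SEG", "SG": "SEG", "SEG": "www.societegenerale.com",
          "VON": "Vontobel", "Vontobel": "www.vontobel.com"}


def website_for(issuer_value, key_order):
    website = None
    for w in key_order:
        if w in issuer_value:      # substring test, like str.contains(w)
            website = issuer[w]    # a later match overwrites an earlier one
    return website


old_order = ["SEG", "S", "SG", "VON", "Vontobel"]  # the order printed above
new_order = ["S", "SG", "SEG", "VON", "Vontobel"]  # 'S' visited before 'SEG'

print(website_for("SEG", old_order))       # SEG
print(website_for("SEG", new_order))       # www.societegenerale.com
print(website_for("Vontobel", old_order))  # www.vontobel.com
```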

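Note that for a long time the Python language did not specify the order in which dict keys are iterated. Before Python 3.6, dict keys were unordered, and because string hashing is randomized by default since Python 3.3, the order could change with every run of the program. So your code might sometimes "work" and sometimes not. Since Python 3.7, dicts are guaranteed to preserve insertion order, so the result is repeatable, but it still silently depends on the order in which the keys happen to be written.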
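On Python 3.7 and later the iteration order is simply the insertion order, which can be checked directly. The sketch below also shows one way to remove the dependence on insertion order (a suggestion, not part of this answer): visit the keys from shortest to longest, so that for this data a longer, more specific key such as 'SEG' is always processed after, and therefore overwrites, a shorter one such as 'S'. sg_reordered is a hypothetical dict holding the same pairs as sg in a different order.

```python
sg = {"S": "SEG", "SG": "SEG", "SEG": "www.societegenerale.com"}
vontobel = {"VON": "Vontobel", "Vontobel": "www.vontobel.com"}

issuer = sg.copy()
issuer.update(vontobel)  # update() appends the new keys after the existing ones
print(list(issuer))      # ['S', 'SG', 'SEG', 'VON', 'Vontobel'] on Python 3.7+

# Hypothetical: the same pairs as sg, written in a different order.
sg_reordered = {"SEG": "www.societegenerale.com", "S": "SEG", "SG": "SEG"}
print(list(sg_reordered))             # ['SEG', 'S', 'SG']

# sorted() is stable, so keys of equal length keep their relative order.
print(sorted(sg_reordered, key=len))  # ['S', 'SG', 'SEG']
```

In the question's loops this would mean iterating with `for w in sorted(issuer, key=len):` instead of `for w in issuer:`, which is a heuristic for this data rather than a general fix.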

Instead, you can use Series.map:

import pandas as pd

df = pd.DataFrame(["BEAR ESTOX X12 S", "BEAR ESTOX X15 S", "BEAR AXP UN 3X VON", "BULL AXP UN x5 VON"], columns=["name"])

sg = {"S":"SEG", "SG":"SEG", "SEG":"www.societegenerale.com"}
vontobel = {"VON":"Vontobel","Vontobel":"www.vontobel.com"}
issuer = sg.copy()
issuer.update(vontobel)  # Combine both into one dictionary

# Split last word in string to new column
df["issuer_spl"] = df.name.str.split().str.get(-1)

# Two exact lookups: abbreviation -> issuer, then issuer -> website
df['issuer'] = df['issuer_spl'].map(issuer)
df['website'] = df['issuer'].map(issuer)
print(df)

yields

                 name issuer_spl    issuer                  website
0    BEAR ESTOX X12 S          S       SEG  www.societegenerale.com
1    BEAR ESTOX X15 S          S       SEG  www.societegenerale.com
2  BEAR AXP UN 3X VON        VON  Vontobel         www.vontobel.com
3  BULL AXP UN x5 VON        VON  Vontobel         www.vontobel.com

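This works provided the values in issuer_spl are keys in issuer. Note that map requires exact equality, whereas df.issuer_spl.str.contains(w) matches whenever w is a substring of a value in issuer_spl.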
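The difference between an exact lookup and a substring match can be seen in a plain-Python model of the two map calls. This is a sketch, not pandas itself: dict.get plays the role of Series.map with a dict, and None stands in for the NaN that pandas produces for values with no matching key. The token "XS" is hypothetical; it contains "S" as a substring, so str.contains('S') would match it, but an exact lookup does not.

```python
sg = {"S": "SEG", "SG": "SEG", "SEG": "www.societegenerale.com"}
vontobel = {"VON": "Vontobel", "Vontobel": "www.vontobel.com"}
issuer = sg.copy()
issuer.update(vontobel)

issuer_spl = ["S", "S", "VON", "VON", "XS"]  # "XS" is a hypothetical token

# First hop: abbreviation -> issuer name (exact key match only).
issuer_col = [issuer.get(v) for v in issuer_spl]
# Second hop: issuer name -> website; None simply stays None.
website_col = [issuer.get(v) for v in issuer_col]

for row in zip(issuer_spl, issuer_col, website_col):
    print(row)
# ('S', 'SEG', 'www.societegenerale.com')
# ('S', 'SEG', 'www.societegenerale.com')
# ('VON', 'Vontobel', 'www.vontobel.com')
# ('VON', 'Vontobel', 'www.vontobel.com')
# ('XS', None, None)
```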

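Alternatively, if you can define a rule that tells which values in the sg and vontobel dicts are websites, you can split the dicts into two separate data structures, issuer and website. For example, if website values always start with www or end with .com, you could use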

issuer = dict()
website = dict()
for dct in [sg, vontobel]:
    for key, val in dct.items():
        if val.startswith('www') or val.endswith('.com'):
            website[key] = val
        else:
            issuer[key] = val

to separate the issuer data from the website data.

In [291]: issuer
Out[291]: {'S': 'SEG', 'SG': 'SEG', 'VON': 'Vontobel'}

In [292]: website
Out[292]: {'SEG': 'www.societegenerale.com', 'Vontobel': 'www.vontobel.com'}

Then you can build the desired DataFrame without relying on exact matches:

import pandas as pd

df = pd.DataFrame(["BEAR ESTOX X12 S", "BEAR ESTOX X15 S", "BEAR AXP UN 3X VON", "BULL AXP UN x5 VON", "BEAR DAX X3 SG 2"], columns=["name"])

sg = {"S":"SEG", "SG":"SEG", "SEG":"www.societegenerale.com"}
vontobel = {"VON":"Vontobel","Vontobel":"www.vontobel.com"}

# Split the dicts into issuer abbreviations and websites
issuer = dict()
website = dict()
for dct in [sg, vontobel]:
    for key, val in dct.items():
        if val.startswith('www') or val.endswith('.com'):
            website[key] = val
        else:
            issuer[key] = val

# Last two words of the name; expand=False returns a Series, not a one-column DataFrame
df["issuer_spl"] = df.name.str.extract(r'(\S+\s+\S+)$', expand=False)

# Substring match for the issuer, then an exact lookup for the website
for i in issuer:
    df.loc[df.issuer_spl.str.contains(i), "issuer"] = issuer[i]
df['website'] = df['issuer'].map(website)
print(df)

yields

                 name issuer_spl    issuer                  website
0    BEAR ESTOX X12 S      X12 S       SEG  www.societegenerale.com
1    BEAR ESTOX X15 S      X15 S       SEG  www.societegenerale.com
2  BEAR AXP UN 3X VON     3X VON  Vontobel         www.vontobel.com
3  BULL AXP UN x5 VON     x5 VON  Vontobel         www.vontobel.com
4    BEAR DAX X3 SG 2       SG 2       SEG  www.societegenerale.com
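The same steps can be traced in plain Python with the standard re module. The sketch below is a model of this second approach, not a replacement for it: re.search with the same pattern plays the role of str.extract, the `in` operator plays the role of str.contains, and dict.get plays the role of Series.map (returning None instead of NaN). Because 'S' and 'SG' map to the same issuer, the order of the substring loop does not matter for this data; it would matter again if two keys with different values could both match one token.

```python
import re

issuer = {"S": "SEG", "SG": "SEG", "VON": "Vontobel"}
website = {"SEG": "www.societegenerale.com", "Vontobel": "www.vontobel.com"}

names = ["BEAR ESTOX X12 S", "BEAR ESTOX X15 S", "BEAR AXP UN 3X VON",
         "BULL AXP UN x5 VON", "BEAR DAX X3 SG 2"]

rows = []
for name in names:
    m = re.search(r"(\S+\s+\S+)$", name)   # last two whitespace-separated words
    spl = m.group(1) if m else None
    found = None
    for key, value in issuer.items():       # like the df.loc loop: last match wins
        if spl is not None and key in spl:
            found = value
    rows.append((name, spl, found, website.get(found)))

for row in rows:
    print(row)
```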

Answer 2

I believe you can also solve it this way:

Use df.issuer.str.endswith instead of df.issuer.str.contains:

for w in issuer:
    df.loc[df.issuer.str.endswith(w).fillna(False), "website"] = issuer[w]
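A plain-Python model of this loop suggests why it no longer depends on key order: 'SEG' ends with 'SEG' but not with 'S' or 'SG', so only one key matches. This is a sketch; website_for is a hypothetical helper, and str.endswith on a plain string stands in for the pandas method. The approach would still break if some key were a suffix of a different issuer's name, for example a hypothetical key 'G', since 'SEG' ends with 'G'.

```python
issuer = {"S": "SEG", "SG": "SEG", "SEG": "www.societegenerale.com",
          "VON": "Vontobel", "Vontobel": "www.vontobel.com"}


def website_for(issuer_value, key_order):
    website = None
    for w in key_order:
        if issuer_value.endswith(w):  # suffix test instead of substring test
            website = issuer[w]
    return website


orders = [["SEG", "S", "SG", "VON", "Vontobel"],
          ["S", "SG", "SEG", "VON", "Vontobel"]]
for order in orders:
    # Same result for both orders:
    # www.societegenerale.com www.vontobel.com
    print(website_for("SEG", order), website_for("Vontobel", order))
```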

Value as key in a Python dictionary
