Using a value as a key in the same Python dictionary

Date: 2015-06-24 20:29:51

Tags: python python-2.7 dictionary

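I suspect this is not the most elegant solution, but I'm looking for a way to use a value as a key in the same dictionary. The reason is that I want to keep several pieces of information in one place.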

My current solution only seems to work for one of the two dictionaries, and I can't figure out why.

Pandas DataFrame:

import pandas as pd
df = pd.DataFrame(["BEAR ESTOX X12 S", "BEAR ESTOX X15 S", "BEAR AXP UN 3X VON", "BULL AXP UN x5 VON"], columns=["name"])

Two dictionaries:

sg = {"S":"SEG", "SG":"SEG", "SEG":"www.societegenerale.com"}
vontobel = {"VON":"Vontobel","Vontobel":"www.vontobel.com"}

issuer = sg.copy()
issuer.update(vontobel) #Combine both to one dictionary

Then I did:

#Split last word in string to new column
df["issuer_spl"] = df.name.str.split().str.get(-1)

#Copy to column "issuer" and substitute abbreviations via dictionary
for i in issuer:
    df.loc[df.issuer_spl.str.contains(i), "issuer"] = issuer[i]

#Another pass in the dictionary, copying and substituting to column "website" 
for w in issuer:
    df.loc[df.issuer.str.contains(w).fillna(False), "website"] = issuer[w]

This produces the output:

name                    issuer_spl      issuer          website
"BEAR ESTOX X12 S"      "S"             "SEG"           "SEG"
"BEAR ESTOX X15 S"      "S"             "SEG"           "SEG"
"BULL AXP UN X3 VON"    "VON"           "Vontobel"      "www.vontobel.com"         
"BEAR AXP UN X3 VON"    "VON"           "Vontobel"      "www.vontobel.com"

What am I doing wrong, such that the second lookup yields the website "www.vontobel.com" but not "www.societegenerale.com"?

Is there another, more organized way to do this, for example a dictionary that can also act as a "list" whose entries are accessed via [i]?

Desired output:

name                    issuer_spl      issuer       website
"BEAR ESTOX X12 S"      "S"             "SEG"        "www.societegenerale.com"
"BEAR ESTOX X15 S"      "S"             "SEG"        "www.societegenerale.com"
"BULL AXP UN X3 VON"    "VON"           "Vontobel"   "www.vontobel.com"         
"BEAR AXP UN X3 VON"    "VON"           "Vontobel"   "www.vontobel.com"

2 answers:

Answer 0 (score: 1)

If you add a print statement:

for w in issuer:
    print(w)
    df.loc[df.issuer.str.contains(w).fillna(False), "website"] = issuer[w]

you might see:

SEG
S
SG
VON
Vontobel

This shows that w is bound to 'S' after it has been bound to 'SEG'. Since

In [220]: df.issuer.str.contains('SEG')
Out[220]: 
0     True
1     True
2    False
3    False
Name: issuer, dtype: bool

In [221]: df.issuer.str.contains('S')
Out[221]: 
0     True
1     True
2    False
3    False
Name: issuer, dtype: bool

the statement

df.loc[df.issuer.str.contains(w).fillna(False), "website"] = issuer[w]

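ends up setting the value in the first two rows to issuer['S'], which equals 'SEG', because w is bound to 'S' after 'SEG'. The later pass for 'SG' does not match the issuer value 'SEG', so the website written during the 'SEG' pass is never restored.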

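Note that in Python 2.7 the order in which dict keys are iterated is not specified by the language; dict keys are unordered. In Python 3.3 through 3.5, string hash randomization means the order can change with each run of the program, so there your code may sometimes "work" and sometimes not. Since Python 3.7, dicts preserve insertion order, so the loop would visit the keys in the order they were added.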
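
To make the order dependence concrete, here is a minimal sketch that runs the same loop with two explicit key orders. The helper name fill_website and the small stand-in DataFrame are assumptions made for illustration; they are not part of the question's code.

```python
import pandas as pd

issuer = {"S": "SEG", "SG": "SEG", "SEG": "www.societegenerale.com",
          "VON": "Vontobel", "Vontobel": "www.vontobel.com"}


def fill_website(key_order):
    """Run the question's second loop, visiting the keys in key_order."""
    # Stand-in for the question's frame after the first loop has run.
    df = pd.DataFrame({"issuer": ["SEG", "SEG", "Vontobel", "Vontobel"]})
    # Start from an empty object column so string assignment is unambiguous.
    df["website"] = pd.Series(index=df.index, dtype="object")
    for w in key_order:
        df.loc[df["issuer"].str.contains(w), "website"] = issuer[w]
    return df["website"].tolist()


# 'S' visited after 'SEG': the substring match on 'S' overwrites the website.
bad = fill_website(["SEG", "S", "SG", "VON", "Vontobel"])
# 'SEG' visited after 'S': the website is written last and survives.
good = fill_website(["S", "SG", "SEG", "VON", "Vontobel"])
print(bad)
print(good)
```

The first call reproduces the output shown in the question, while the second gives the desired websites, so the result depends only on the order in which the keys are visited.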

Instead, you could use Series.map:

import pandas as pd
df = pd.DataFrame(["BEAR ESTOX X12 S", "BEAR ESTOX X15 S", "BEAR AXP UN 3X VON", "BULL AXP UN x5 VON"], columns=["name"])

sg = {"S":"SEG", "SG":"SEG", "SEG":"www.societegenerale.com"}
vontobel = {"VON":"Vontobel","Vontobel":"www.vontobel.com"}

issuer = sg.copy()
issuer.update(vontobel) #Combine both to one dictionary

#Split last word in string to new column
df["issuer_spl"] = df.name.str.split().str.get(-1)

df['issuer'] = df['issuer_spl'].map(issuer)
df['website'] = df['issuer'].map(issuer)

print(df)

yields

                 name issuer_spl    issuer                  website
0    BEAR ESTOX X12 S          S       SEG  www.societegenerale.com
1    BEAR ESTOX X15 S          S       SEG  www.societegenerale.com
2  BEAR AXP UN 3X VON        VON  Vontobel         www.vontobel.com
3  BULL AXP UN x5 VON        VON  Vontobel         www.vontobel.com

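This works provided the values in issuer_spl are keys in issuer. Note that this requires strict equality, whereas df.issuer_spl.str.contains(w) matches whenever w is a substring of the value in issuer_spl.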
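
As a small illustration of the strict-equality behaviour, the sketch below maps a few tokens through a dict. The tokens "X12 S" and "SEG2" are made-up values, chosen only because they contain a key as a substring without being equal to one.

```python
import pandas as pd

issuer = {"S": "SEG", "SG": "SEG", "VON": "Vontobel"}

# Hypothetical tokens: the last two contain a key but are not equal to one.
tokens = pd.Series(["S", "VON", "X12 S", "SEG2"])

# Series.map looks each value up as an exact dict key; misses become NaN.
mapped = tokens.map(issuer)

# Collect the tokens that found no exact match.
unmatched = tokens[mapped.isna()]
print(mapped)
print(unmatched.tolist())
```

Checking for NaN after the map is a cheap way to spot names whose last word is not a known abbreviation.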

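Alternatively, if you can define a rule that distinguishes which values in the sg and vontobel dicts represent websites, then you can split the dicts into two separate data structures, issuer and website. For example, if the website values always start with www or end with .com, you could use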

issuer = dict()
website = dict()
for dct in [sg, vontobel]:
    for key, val in dct.items():
        if val.startswith('www') or val.endswith('.com'):
            website[key] = val
        else:
            issuer[key] = val

to separate the issuer data from the website data:

In [291]: issuer
Out[291]: {'S': 'SEG', 'SG': 'SEG', 'VON': 'Vontobel'}

In [292]: website
Out[292]: {'SEG': 'www.societegenerale.com', 'Vontobel': 'www.vontobel.com'}

Then you can build the desired DataFrame without relying on exact matches:

import pandas as pd
df = pd.DataFrame(["BEAR ESTOX X12 S", "BEAR ESTOX X15 S", "BEAR AXP UN 3X VON", 
                   "BULL AXP UN x5 VON", "BEAR DAX X3 SG 2"], columns=["name"])

sg = {"S":"SEG", "SG":"SEG", "SEG":"www.societegenerale.com"}
vontobel = {"VON":"Vontobel","Vontobel":"www.vontobel.com"}

issuer = dict()
website = dict()
for dct in [sg, vontobel]:
    for key, val in dct.items():
        if val.startswith('www') or val.endswith('.com'):
            website[key] = val
        else:
            issuer[key] = val

df["issuer_spl"] = df.name.str.extract(r'(\S+\s+\S+)$')

for i in issuer:
    df.loc[df.issuer_spl.str.contains(i), "issuer"] = issuer[i]

df['website'] = df['issuer'].map(website)
print(df)

yields

                 name issuer_spl    issuer                  website
0    BEAR ESTOX X12 S      X12 S       SEG  www.societegenerale.com
1    BEAR ESTOX X15 S      X15 S       SEG  www.societegenerale.com
2  BEAR AXP UN 3X VON     3X VON  Vontobel         www.vontobel.com
3  BULL AXP UN x5 VON     x5 VON  Vontobel         www.vontobel.com
4    BEAR DAX X3 SG 2       SG 2       SEG  www.societegenerale.com
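
On the question of a more organized structure, one possible layout is a list with one record per issuer, flattened into an alias lookup. This is only a sketch: the names ISSUERS, by_alias and the "aliases" field are assumptions, not something from the question, and it goes back to exact matching on the last word of the name.

```python
import pandas as pd

# Hypothetical layout: one record per issuer, reachable as ISSUERS[i].
ISSUERS = [
    {"issuer": "SEG", "website": "www.societegenerale.com",
     "aliases": ["S", "SG", "SEG"]},
    {"issuer": "Vontobel", "website": "www.vontobel.com",
     "aliases": ["VON", "Vontobel"]},
]

# Flatten to alias -> record so every lookup is one exact dict access.
by_alias = {alias: rec for rec in ISSUERS for alias in rec["aliases"]}

df = pd.DataFrame(["BEAR ESTOX X12 S", "BEAR ESTOX X15 S", "BEAR AXP UN 3X VON",
                   "BULL AXP UN x5 VON"], columns=["name"])
df["issuer_spl"] = df["name"].str.split().str.get(-1)

# Derive one plain dict per target column and map the abbreviation through it.
df["issuer"] = df["issuer_spl"].map({a: r["issuer"] for a, r in by_alias.items()})
df["website"] = df["issuer_spl"].map({a: r["website"] for a, r in by_alias.items()})
print(df)
```

Because issuer and website live in separate fields, no value has to double as a key, and the result does not depend on iteration order.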

Answer 1 (score: 0)

I believe you could also solve it this way:

Use df.issuer.str.endswith instead of df.issuer.str.contains.

for w in issuer:
    df.loc[df.issuer.str.endswith(w).fillna(False), "website"] = issuer[w]
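
A quick check of why this helps with the data in the question, and where it could still go wrong. The key "G" in the last line is hypothetical and only there to show the limitation.

```python
import pandas as pd

issuer_col = pd.Series(["SEG", "Vontobel"])

# 'S' is a substring of 'SEG' but not its suffix, so endswith skips it.
contains_s = issuer_col.str.contains("S").tolist()
endswith_s = issuer_col.str.endswith("S").tolist()

# The full key still matches as a suffix.
endswith_seg = issuer_col.str.endswith("SEG").tolist()

# Hypothetical key: any key that happens to be a suffix would still collide.
endswith_g = issuer_col.str.endswith("G").tolist()

print(contains_s, endswith_s, endswith_seg, endswith_g)
```

So endswith avoids the collision for these particular keys, but it is still a partial match and depends on no key being a suffix of another issuer name.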