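I don't think this is the most elegant solution, but I'm looking for a way to use values as keys within the same dictionary, so that I can keep several pieces of information in one place.
My current solution only seems to work for one of the two dictionaries, and I can't figure out why.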
Pandas DataFrame:
import pandas as pd
df = pd.DataFrame(["BEAR ESTOX X12 S", "BEAR ESTOX X15 S", "BEAR AXP UN 3X VON", "BULL AXP UN x5 VON"], columns=["name"])
Two dictionaries:
sg = {"S":"SEG", "SG":"SEG", "SEG":"www.societegenerale.com"}
vontobel = {"VON":"Vontobel","Vontobel":"www.vontobel.com"}
issuer = sg.copy()
issuer.update(vontobel) #Combine both to one dictionary
Then I did:
#Split last word in string to new column
df["issuer_spl"] = df.name.str.split().str.get(-1)
#Copy to column "issuer" and substitute abbreviations via dictionary
for i in issuer:
    df.loc[df.issuer_spl.str.contains(i), "issuer"] = issuer[i]
#Another pass in the dictionary, copying and substituting to column "website"
for w in issuer:
    df.loc[df.issuer.str.contains(w).fillna(False), "website"] = issuer[w]
Which produces this output:
name issuer_spl issuer website
"BEAR ESTOX X12 S" "S" "SEG" "SEG"
"BEAR ESTOX X15 S" "S" "SEG" "SEG"
"BULL AXP UN X3 VON" "VON" "Vontobel" "www.vontobel.com"
"BEAR AXP UN X3 VON" "VON" "Vontobel" "www.vontobel.com"
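What am I doing wrong such that the key resolves to the website "www.vontobel.com" but not to "www.societegenerale.com"?
Is there another, more organized way to do this, e.g. a dictionary that can also be accessed like a "list" via [i]?
Desired output: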
name issuer_spl issuer website
"BEAR ESTOX X12 S" "S" "SEG" "www.societegenerale.com"
"BEAR ESTOX X15 S" "S" "SEG" "www.societegenerale.com"
"BULL AXP UN X3 VON" "VON" "Vontobel" "www.vontobel.com"
"BEAR AXP UN X3 VON" "VON" "Vontobel" "www.vontobel.com"
Answer 0 (score: 1)
If you add a print statement:
for w in issuer:
    print(w)
    df.loc[df.issuer.str.contains(w).fillna(False), "website"] = issuer[w]
you might see:
SEG
S
SG
VON
Vontobel
which shows that w is bound to 'S' after it has been bound to 'SEG'.
Since
In [220]: df.issuer.str.contains('SEG')
Out[220]:
0 True
1 True
2 False
3 False
Name: issuer, dtype: bool
In [221]: df.issuer.str.contains('S')
Out[221]:
0 True
1 True
2 False
3 False
Name: issuer, dtype: bool
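the statement
df.loc[df.issuer.str.contains(w).fillna(False), "website"] = issuer[w]
ends up setting the website value of the first two rows to issuer['S'], which equals 'SEG', because w is bound to 'S' after 'SEG' and the later assignment overwrites the earlier one.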
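The overwrite described above can be reproduced without pandas. The following is only a minimal sketch: assign_websites is a hypothetical helper (not part of the original code) that mimics the substring loop with plain Python's in operator, and the two key orders are passed in explicitly so the effect does not depend on how the dict happens to iterate.

```python
issuer = {"S": "SEG", "SG": "SEG", "SEG": "www.societegenerale.com",
          "VON": "Vontobel", "Vontobel": "www.vontobel.com"}

def assign_websites(issuer_column, key_order):
    """Hypothetical helper mimicking the loop: every matching key
    overwrites whatever an earlier key wrote into the same row."""
    websites = [None] * len(issuer_column)
    for w in key_order:
        for idx, name in enumerate(issuer_column):
            if w in name:  # substring test, same idea as str.contains(w)
                websites[idx] = issuer[w]
    return websites

rows = ["SEG", "SEG", "Vontobel", "Vontobel"]

# 'S' is visited after 'SEG', so it clobbers the website with 'SEG'.
unlucky = assign_websites(rows, ["SEG", "S", "SG", "VON", "Vontobel"])
# 'SEG' is visited after 'S', so the website survives.
lucky = assign_websites(rows, ["S", "SG", "SEG", "VON", "Vontobel"])

print(unlucky)
print(lucky)
```

Only the order in which the keys are visited differs between the two calls, which is why the original loop is fragile.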
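Note that in older versions of Python the order in which dict keys are iterated was not specified by the language; dict keys were unordered. In Python 3 before 3.7, string hashing is randomized, so the order could change with each run of the program, and your code might sometimes "work" and sometimes not. Since Python 3.7 dicts preserve insertion order, but code whose result depends on that order is still fragile.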
Instead, you could use Series.map:
import pandas as pd
df = pd.DataFrame(["BEAR ESTOX X12 S", "BEAR ESTOX X15 S", "BEAR AXP UN 3X VON", "BULL AXP UN x5 VON"], columns=["name"])
sg = {"S":"SEG", "SG":"SEG", "SEG":"www.societegenerale.com"}
vontobel = {"VON":"Vontobel","Vontobel":"www.vontobel.com"}
issuer = sg.copy()
issuer.update(vontobel) #Combine both to one dictionary
#Split last word in string to new column
df["issuer_spl"] = df.name.str.split().str.get(-1)
df['issuer'] = df['issuer_spl'].map(issuer)
df['website'] = df['issuer'].map(issuer)
print(df)
yields
name issuer_spl issuer website
0 BEAR ESTOX X12 S S SEG www.societegenerale.com
1 BEAR ESTOX X15 S S SEG www.societegenerale.com
2 BEAR AXP UN 3X VON VON Vontobel www.vontobel.com
3 BULL AXP UN x5 VON VON Vontobel www.vontobel.com
This works if the values in issuer_spl are keys in issuer. Note that this requires strict equality, whereas df.issuer_spl.str.contains(w) matches whenever w is a substring of a value in issuer_spl.
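To make the difference concrete, here is a small sketch contrasting the two lookups. The three-element Series is made up for illustration; "SG 2" is chosen because it is not itself a key in issuer.

```python
import pandas as pd

issuer = {"S": "SEG", "SG": "SEG", "VON": "Vontobel"}
issuer_spl = pd.Series(["S", "VON", "SG 2"])  # illustrative values only

# Series.map looks up each whole value as a key; values without a key become NaN.
mapped = issuer_spl.map(issuer)

# str.contains performs a substring search, so "SG 2" matches "SG".
contains_sg = issuer_spl.str.contains("SG")

print(mapped)
print(contains_sg)
```

So map is the safer choice when the column already holds exact keys, while contains is more forgiving but can match more than intended.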
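Alternatively, if you can define a rule that distinguishes which values in the sg and vontobel dicts represent websites, then you could process the dicts into two separate data structures, issuer and website.
For example, if website values always start with www or end with .com, then you could use
issuer = dict()
website = dict()
for dct in [sg, vontobel]:
    for key, val in dct.items():
        if val.startswith('www') or val.endswith('.com'):
            website[key] = val
        else:
            issuer[key] = val
to separate the issuer data from the website data.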
In [291]: issuer
Out[291]: {'S': 'SEG', 'SG': 'SEG', 'VON': 'Vontobel'}
In [292]: website
Out[292]: {'SEG': 'www.societegenerale.com', 'Vontobel': 'www.vontobel.com'}
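If you control how the data is defined, another option is to avoid the startswith/endswith heuristic altogether and store one record per issuer. The sketch below is an assumption, not something from the original dictionaries: the issuers layout and its "aliases" and "website" field names are hypothetical. It derives the same two lookup tables shown above.

```python
# Hypothetical layout: one record per issuer, keyed by its canonical name.
issuers = {
    "SEG": {"aliases": ["S", "SG"], "website": "www.societegenerale.com"},
    "Vontobel": {"aliases": ["VON"], "website": "www.vontobel.com"},
}

# abbreviation -> canonical issuer name
issuer = {alias: name
          for name, record in issuers.items()
          for alias in record["aliases"]}

# canonical issuer name -> website
website = {name: record["website"] for name, record in issuers.items()}

print(issuer)
print(website)
```

This keeps all information about an issuer in one place, which is what the question asks for, while the two derived dicts stay unambiguous.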
Then you can build the desired DataFrame without relying on exact matches:
import pandas as pd
df = pd.DataFrame(["BEAR ESTOX X12 S", "BEAR ESTOX X15 S", "BEAR AXP UN 3X VON",
"BULL AXP UN x5 VON", "BEAR DAX X3 SG 2"], columns=["name"])
sg = {"S":"SEG", "SG":"SEG", "SEG":"www.societegenerale.com"}
vontobel = {"VON":"Vontobel","Vontobel":"www.vontobel.com"}
issuer = dict()
website = dict()
for dct in [sg, vontobel]:
    for key, val in dct.items():
        if val.startswith('www') or val.endswith('.com'):
            website[key] = val
        else:
            issuer[key] = val
# Keep the last two whitespace-separated tokens; expand=False returns a Series
df["issuer_spl"] = df.name.str.extract(r'(\S+\s+\S+)$', expand=False)
for i in issuer:
    df.loc[df.issuer_spl.str.contains(i), "issuer"] = issuer[i]
df['website'] = df['issuer'].map(website)
print(df)
yields
name issuer_spl issuer website
0 BEAR ESTOX X12 S X12 S SEG www.societegenerale.com
1 BEAR ESTOX X15 S X15 S SEG www.societegenerale.com
2 BEAR AXP UN 3X VON 3X VON Vontobel www.vontobel.com
3 BULL AXP UN x5 VON x5 VON Vontobel www.vontobel.com
4 BEAR DAX X3 SG 2 SG 2 SEG www.societegenerale.com
Answer 1 (score: 0)
I believe you can also solve it this way:
use df.issuer.str.endswith instead of df.issuer.str.contains.
for w in issuer:
    df.loc[df.issuer.str.endswith(w).fillna(False), "website"] = issuer[w]
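A quick way to check this suggestion is to run the loop with the key order that broke the contains version. This is only a sketch: the issuer column is filled in by hand, the website column is created up front, and the key order is hard-coded for the demonstration.

```python
import pandas as pd

issuer = {"S": "SEG", "SG": "SEG", "SEG": "www.societegenerale.com",
          "VON": "Vontobel", "Vontobel": "www.vontobel.com"}
df = pd.DataFrame({"issuer": ["SEG", "SEG", "Vontobel", "Vontobel"]})
df["website"] = None  # create the target column before the loop

# Deliberately visit 'S' after 'SEG', the order that broke str.contains.
for w in ["SEG", "S", "SG", "VON", "Vontobel"]:
    mask = df["issuer"].str.endswith(w)  # no missing values here, so no fillna
    df.loc[mask, "website"] = issuer[w]

print(df)
```

It holds up here because 'SEG' does not end with 'S' or 'SG'. It would presumably still misfire for an issuer name that happens to end with another key, so the Series.map approach remains the more robust one.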