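I want to use a dictionary to rename columns based on the first three characters of each column name.
This is the code I currently have: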
new_names = {"aud":"alc_aud","whe":"clu_whe", "per":"pre_per",
             "pol":"cou_pol","spec":"coc_spec","dark":"daw_dark"}
for x,y in new_names.items():
    if df.columns.str.startswith(x):
        df.columns = df.columns.str.replace(x,y)
I get the following error:
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
Answer 0 (score: 1)
Use:
import pandas as pd

df = pd.DataFrame({'aud1':list('abcdef'),
                   'spe2':[4,5,4,5,5,4],
                   'C':[7,8,9,4,2,3],
                   'F':list('aaabbb')})
print (df)
  aud1  spe2  C  F
0    a     4  7  a
1    b     5  8  a
2    c     4  9  a
3    d     5  4  b
4    e     5  2  b
5    f     4  3  b
new_names = {"aud":"alc_aud","whe":"clu_whe", "per":"pre_per",
             "pol":"cou_pol","spec":"coc_spec","dark":"daw_dark"}
First truncate the dictionary keys to their first 3 characters:
new_names = {k[:3] :v for k, v in new_names.items()}
print (new_names)
{'aud': 'alc_aud', 'whe': 'clu_whe', 'per': 'pre_per',
'pol': 'cou_pol', 'spe': 'coc_spec', 'dar': 'daw_dark'}
Then take the first 3 letters of each column name by indexing with str[:3], and substitute them via the dict with replace:
df.columns = df.columns.to_series().str[:3].replace(new_names)
print (df)
  alc_aud  coc_spec  C  F
0       a         4  7  a
1       b         5  8  a
2       c         4  9  a
3       d         5  4  b
4       e         5  2  b
5       f         4  3  b
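One caveat with the str[:3] plus replace approach: a column name that matches no key is still cut down to its first three characters. That is invisible in the example because 'C' and 'F' are already short. The sketch below uses a hypothetical column named 'height' (not part of the original data) to show the effect, along with a map/fillna variant that keeps unmatched names intact.

```python
import pandas as pd

# Hypothetical frame: 'height' matches no key in new_names
df = pd.DataFrame({'aud1': [1, 2], 'height': [3, 4]})
new_names = {'aud': 'alc_aud', 'spe': 'coc_spec'}

s = df.columns.to_series()

# replace() swaps matching values only, but the slice has already shortened every name
truncated = s.str[:3].replace(new_names)

# map() yields NaN for unmatched prefixes; fillna(s) restores the full original name
safe = s.str[:3].map(new_names).fillna(s)

print(truncated.tolist())  # ['alc_aud', 'hei']
print(safe.tolist())       # ['alc_aud', 'height']
```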
Another solution uses get with a list comprehension, which returns the original name when there is no match:
df.columns = [new_names.get(x[:3], x) for x in df.columns]
print (df)
  alc_aud  coc_spec  C  F
0       a         4  7  a
1       b         5  8  a
2       c         4  9  a
3       d         5  4  b
4       e         5  2  b
5       f         4  3  b
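The same get-based lookup can also be passed to DataFrame.rename as a callable, which returns a new frame instead of assigning to df.columns in place. This is only a sketch of an alternative spelling, using a shortened version of the example data:

```python
import pandas as pd

df = pd.DataFrame({'aud1': list('ab'), 'spe2': [4, 5], 'C': [7, 8]})
new_names = {'aud': 'alc_aud', 'spe': 'coc_spec'}

# rename() calls the function once per column label;
# labels whose 3-character prefix is not a key are returned unchanged
renamed = df.rename(columns=lambda c: new_names.get(c[:3], c))

print(renamed.columns.tolist())  # ['alc_aud', 'coc_spec', 'C']
```

Because rename does not modify df unless the result is assigned back, this form is convenient inside method chains.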
EDIT: A solution that works with prefixes of any length:
df = pd.DataFrame({'aud1':list('abcdef'),
                   'specd2':[4,5,4,5,5,4],
                   'podfds':[7,8,9,4,2,3],
                   'aaper':list('aaabbb')})
print (df)
  aud1  specd2  podfds aaper
0    a       4       7     a
1    b       5       8     a
2    c       4       9     a
3    d       5       4     b
4    e       5       2     b
5    f       4       3     b
new_names = {"aud":"alc_aud","whe":"clu_whe", "per":"pre_per",
             "po":"cou_pol","spec":"coc_spec","dark":"daw_dark"}
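First use extract to capture the dictionary key that each column name starts with, then translate the captured key with map, and finally use fillna to restore the original name where nothing matched: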
pat = '|'.join([r'^{}'.format(x) for x in new_names])
s = df.columns.to_series()
df.columns = s.str.extract('('+ pat + ')', expand=False).map(new_names).fillna(s)
print (df)
  alc_aud  coc_spec  cou_pol aaper
0       a         4        7     a
1       b         5        8     a
2       c         4        9     a
3       d         5        4     b
4       e         5        2     b
5       f         4        3     b
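As for the ValueError in the question: df.columns.str.startswith(x) returns a boolean array with one entry per column, and an if statement cannot reduce that to a single True or False. A plain-Python sketch that tests each name individually avoids the problem and needs no regex. The helper name rename_by_prefix is hypothetical, and trying longer keys first is an assumption about the desired behavior when keys overlap.

```python
import pandas as pd

def rename_by_prefix(columns, mapping):
    """Hypothetical helper: return new names for columns starting with a mapping key."""
    # Try longer prefixes first so 'spec' would win over a shorter overlapping key
    prefixes = sorted(mapping, key=len, reverse=True)
    out = []
    for col in columns:
        # str.startswith on a single string yields one bool, so it is safe in a condition
        match = next((p for p in prefixes if col.startswith(p)), None)
        out.append(mapping[match] if match is not None else col)
    return out

df = pd.DataFrame(columns=['aud1', 'specd2', 'podfds', 'aaper'])
new_names = {"aud": "alc_aud", "whe": "clu_whe", "per": "pre_per",
             "po": "cou_pol", "spec": "coc_spec", "dark": "daw_dark"}

df.columns = rename_by_prefix(df.columns, new_names)
print(df.columns.tolist())  # ['alc_aud', 'coc_spec', 'cou_pol', 'aaper']
```

If the regex version is preferred and the keys might contain characters such as '.' or '(', wrapping each key in re.escape when building the pattern is a reasonable precaution.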