How to rename columns based on the first three characters of the column name

Posted: 2018-09-02 17:40:31

Tags: python dataframe

I want to use a dictionary to rename columns based on the first three characters of their names.

Here is the code I currently have:

new_names = {"aud":"alc_aud","whe":"clu_whe", "per":"pre_per",
                "pol":"cou_pol","spec":"coc_spec","dark":"daw_dark"}

for x,y in new_names.items():
    if df.columns.str.startswith(x):
       df.columns = df.columns.str.replace(x,y)

I get the following error:

ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
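Why the error happens: df.columns.str.startswith(x) tests every column at once and returns one boolean per column, so the if statement cannot reduce it to a single True or False. A minimal sketch, assuming a small hypothetical DataFrame and replacing the whole column name (as the answer below does) rather than only the prefix, first reproduces the error and then tests each name individually. rename_by_prefix is a hypothetical helper, not a pandas function:

```python
import pandas as pd

# Hypothetical sample data for illustration only
df = pd.DataFrame({'aud1': [1, 2], 'spec2': [3, 4], 'C': [5, 6]})

new_names = {"aud": "alc_aud", "whe": "clu_whe", "per": "pre_per",
             "pol": "cou_pol", "spec": "coc_spec", "dark": "daw_dark"}

# Reproduce the problem: startswith on the whole Index yields one boolean
# per column, which `if` cannot interpret as a single truth value
try:
    if df.columns.str.startswith("aud"):
        pass
except ValueError as err:
    print("ValueError:", err)

def rename_by_prefix(name):
    # Check one column name at a time, so `if` receives a plain bool
    for prefix, replacement in new_names.items():
        if name.startswith(prefix):
            return replacement
    return name  # no prefix matched: keep the original name

df.columns = [rename_by_prefix(c) for c in df.columns]
print(list(df.columns))  # ['alc_aud', 'coc_spec', 'C']
```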

1 answer:

Answer 0 (score: 1)

Use:

import pandas as pd

df = pd.DataFrame({'aud1':list('abcdef'),
                   'spe2':[4,5,4,5,5,4],
                   'C':[7,8,9,4,2,3],
                   'F':list('aaabbb')})

print (df)
  aud1   spe2  C  F
0    a      4  7  a
1    b      5  8  a
2    c      4  9  a
3    d      5  4  b
4    e      5  2  b
5    f      4  3  b

new_names = {"aud":"alc_aud","whe":"clu_whe", "per":"pre_per",
                "pol":"cou_pol","spec":"coc_spec","dark":"daw_dark"}

First, shorten the dictionary keys to their first 3 characters:

new_names = {k[:3] :v for k, v in new_names.items()}

print (new_names)
{'aud': 'alc_aud', 'whe': 'clu_whe', 'per': 'pre_per', 
     'pol': 'cou_pol', 'spe': 'coc_spec', 'dar': 'daw_dark'}
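One caveat with shortening the keys: if two keys share the same first 3 characters, the dict comprehension silently keeps only the last one. A small sketch with hypothetical keys "spec" and "spel" shows the effect and one way to detect such collisions with collections.Counter before renaming:

```python
from collections import Counter

# Hypothetical mapping: both keys shorten to 'spe'
new_names = {"spec": "coc_spec", "spel": "hypothetical_spel"}

truncated = {k[:3]: v for k, v in new_names.items()}
print(truncated)  # {'spe': 'hypothetical_spel'}  ('coc_spec' was overwritten)

# Count how many keys map to each 3-character prefix
counts = Counter(k[:3] for k in new_names)
collisions = [p for p, n in counts.items() if n > 1]
print(collisions)  # ['spe']
```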

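Then take the first 3 characters of each column name with str[:3] and replace them using the dict. This truncates every name, so it only suits frames where the names that match no key are at most 3 characters long, like C and F here: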

df.columns = df.columns.to_series().str[:3].replace(new_names)
print (df)
  alc_aud  coc_spec  C  F
0       a         4  7  a
1       b         5  8  a
2       c         4  9  a
3       d         5  4  b
4       e         5  2  b
5       f         4  3  b
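To make the truncation caveat concrete, here is a sketch with a hypothetical column 'total' that matches no key: the str[:3].replace version leaves it as 'tot'. Mapping the prefixes and filling unmatched names from the original Index (the same map/fillna idea the edit below uses) keeps the full name:

```python
import pandas as pd

new_names = {'aud': 'alc_aud', 'whe': 'clu_whe', 'per': 'pre_per',
             'pol': 'cou_pol', 'spe': 'coc_spec', 'dar': 'daw_dark'}

# Hypothetical frame: 'total' is longer than 3 characters and matches no key
df = pd.DataFrame({'aud1': [1], 'total': [2]})
s = df.columns.to_series()  # index and values are both the column names

# Every name is cut to 3 characters before replace, so 'total' becomes 'tot'
truncated = s.str[:3].replace(new_names)
print(list(truncated))  # ['alc_aud', 'tot']

# map gives NaN where no key matched; fillna restores the original name
fixed = s.str[:3].map(new_names).fillna(s)
print(list(fixed))  # ['alc_aud', 'total']
```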

Another solution uses dict.get in a list comprehension, which falls back to the original column name when its first 3 characters match no key:

df.columns = [new_names.get(x[:3], x) for x in df.columns]
print (df)
  alc_aud  coc_spec  C  F
0       a         4  7  a
1       b         5  8  a
2       c         4  9  a
3       d         5  4  b
4       e         5  2  b
5       f         4  3  b
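The same lookup can also be passed to DataFrame.rename as a callable, which applies it to each column label and returns a new DataFrame instead of assigning to df.columns in place. A sketch reusing the sample data and the shortened keys from above:

```python
import pandas as pd

new_names = {'aud': 'alc_aud', 'whe': 'clu_whe', 'per': 'pre_per',
             'pol': 'cou_pol', 'spe': 'coc_spec', 'dar': 'daw_dark'}

df = pd.DataFrame({'aud1': list('abcdef'),
                   'spe2': [4, 5, 4, 5, 5, 4],
                   'C': [7, 8, 9, 4, 2, 3],
                   'F': list('aaabbb')})

# rename calls the function once per column label; unmatched labels come back unchanged
renamed = df.rename(columns=lambda c: new_names.get(c[:3], c))
print(list(renamed.columns))  # ['alc_aud', 'coc_spec', 'C', 'F']
```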

Edit: a solution for dictionary keys of any length:

df = pd.DataFrame({'aud1':list('abcdef'),
                   'specd2':[4,5,4,5,5,4],
                   'podfds':[7,8,9,4,2,3],
                   'aaper':list('aaabbb')})

print (df)
  aud1  specd2  podfds aaper
0    a       4       7     a
1    b       5       8     a
2    c       4       9     a
3    d       5       4     b
4    e       5       2     b
5    f       4       3     b

new_names = {"aud":"alc_aud","whe":"clu_whe", "per":"pre_per",
                "po":"cou_pol","spec":"coc_spec","dark":"daw_dark"}

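First extract the dictionary key that each column name starts with, then map it through the dictionary, and finally use fillna to restore the original names that did not match: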

pat = '|'.join([r'^{}'.format(x) for x in new_names])
s  = df.columns.to_series()
df.columns = s.str.extract('('+ pat + ')', expand=False).map(new_names).fillna(s)
print (df)
  alc_aud  coc_spec  cou_pol aaper
0       a         4        7     a
1       b         5        8     a
2       c         4        9     a
3       d         5        4     b
4       e         5        2     b
5       f         4        3     b
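Two refinements of the regex approach may be worth considering, sketched here under the assumption that keys can contain regex metacharacters or be prefixes of each other. re.escape makes each key match literally, and sorting keys longest first makes the alternation prefer the most specific key, since the regex engine tries alternatives left to right. The keys "po" and "pol" together, the value "cou_po" and the column "polx" are hypothetical:

```python
import re
import pandas as pd

# Hypothetical mapping where 'po' is a prefix of 'pol'
new_names = {"aud": "alc_aud", "po": "cou_po", "pol": "cou_pol", "spec": "coc_spec"}

df = pd.DataFrame({'aud1': [1], 'polx': [2], 'podfds': [3], 'aaper': [4]})

# Escape metacharacters and try longer keys first,
# so 'polx' matches 'pol' rather than the shorter 'po'
keys = sorted(new_names, key=len, reverse=True)
pat = '^(' + '|'.join(re.escape(k) for k in keys) + ')'

s = df.columns.to_series()
df.columns = s.str.extract(pat, expand=False).map(new_names).fillna(s)
print(list(df.columns))  # ['alc_aud', 'cou_pol', 'cou_po', 'aaper']
```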