Question

I have a dataframe like this:

df = pd.DataFrame({'c1': list('aba'), 'c2': list('aaa'), 'ignore_me': list('bbb'), 'c3': list('baa')})

  c1 c2 ignore_me c3
0  a  a         b  b
1  b  a         b  a
2  a  a         b  a

and a dictionary like this:

d = {'a': "foo", 'b': 'bar'}

I now want to map the values through d for the columns that match the regex ^c\d+$.

I can do

df.filter(regex='^c\d+$').apply(lambda x: x.map(d))

    c1   c2   c3
0  foo  foo  bar
1  bar  foo  foo
2  foo  foo  foo

but then all the columns that do not match the regex are missing.

So I can do this instead:

tempdf = df.filter(regex='^c\d+$')

df.loc[:, tempdf.columns] = tempdf.apply(lambda x: x.map(d))

which gives the desired output

    c1   c2 ignore_me   c3
0  foo  foo         b  bar
1  bar  foo         b  foo
2  foo  foo         b  foo

Is there a smarter solution that avoids the temporary dataframe?

Answer 1

Absolutely, use str.contains.

df.columns.str.contains(r'^c\d+$') # use raw strings, it's good hygiene
# array([ True,  True, False,  True])

Pass the mask to loc:

df.loc[:, df.columns.str.contains(r'^c\d+$')] = df.apply(lambda x: x.map(d))

If you want to be more efficient, map only the masked columns:

m = df.columns.str.contains(r'^c\d+$')
df.loc[:, m] = df.loc[:, m].apply(lambda x: x.map(d))

df

    c1   c2 ignore_me   c3
0  foo  foo         b  bar
1  bar  foo         b  foo
2  foo  foo         b  foo
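To make the efficiency remark concrete, here is a small sketch on the question's data. Calling apply on the whole frame also maps ignore_me, and that work is thrown away once only the masked columns are kept, whereas selecting with the mask first maps just the three matching columns. The names mapped_all and mapped_some are illustrative and not part of the answer.

```python
import pandas as pd

df = pd.DataFrame({
    'c1': list('aba'),
    'c2': list('aaa'),
    'ignore_me': list('bbb'),
    'c3': list('baa'),
})
d = {'a': 'foo', 'b': 'bar'}

# Boolean mask over the column labels: True where the label matches the regex.
m = df.columns.str.contains(r'^c\d+$')

# Mapping the whole frame touches every column, ignore_me included.
mapped_all = df.apply(lambda x: x.map(d))
print(mapped_all.columns.tolist())   # ['c1', 'c2', 'ignore_me', 'c3']

# Selecting with the mask first maps only the matching columns.
mapped_some = df.loc[:, m].apply(lambda x: x.map(d))
print(mapped_some.columns.tolist())  # ['c1', 'c2', 'c3']

# Write the mapped columns back; ignore_me is left as it was.
df.loc[:, m] = mapped_some
print(df)
```

On a frame with many non-matching columns, the second form avoids the extra mapping work.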

Answer 2

Maybe not the smartest way, but I think it's pretty neat...:

# Your code
import pandas as pd

df = pd.DataFrame({'c1': list('aba'), 'c2': list('aaa'), 'ignore_me': list('bbb'), 'c3': list('baa')})
d = {'a': "foo", 'b': 'bar'}

# Solution
import re # cs95 provided a better solution to pick columns!

# Pre-compile the regex object in case there is a huge list of columns....
regex = re.compile(r'^c\d+$')

# Python 3's `filter` returns a lazy iterator, so wrap it in `list` to get the columns
cols = list(filter(regex.search, df.columns))
# output ['c1', 'c2', 'c3']


# PICK one of the following...:

# The normal way
df[cols] = df[cols].apply(lambda x: x.map(d))

# OR use `applymap` (deprecated since pandas 2.1 in favor of `DataFrame.map`)
df[cols] = df[cols].applymap(lambda x: d[x])

# OR if you prefer not to see `lambda` at all!
df[cols] = df[cols].applymap(d.get)

df
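The three options above are interchangeable only while every value in the selected columns is a key of d. The sketch below shows how they differ when a key is missing. It is my own illustration, not part of the answer: the value 'z' is deliberately absent from d, and Series.map with a callable stands in for the element-wise applymap call.

```python
import pandas as pd

d = {'a': 'foo', 'b': 'bar'}
s = pd.Series(['a', 'b', 'z'])  # 'z' is deliberately missing from d

# Mapping with the dict itself: a missing key becomes a missing value.
via_map = s.map(d)

# Mapping with d.get: the lookup returns None for a missing key.
via_get = s.map(d.get)

# Indexing with d[x]: a missing key raises KeyError.
try:
    s.map(lambda x: d[x])
    raised = False
except KeyError:
    raised = True

print(via_map.isna().tolist())  # [False, False, True]
print(via_get.isna().tolist())  # [False, False, True]
print(raised)                   # True
```

If unmapped values should fail loudly, the d[x] form is the one to pick. The other two let them through as missing values.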

Answer 3

Try replace?

df.filter(regex=r'^c\d+$').apply(lambda x: x.replace(d))

You may also find np.where useful for this kind of filtering problem.
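On its own, the replace expression above still returns only the matching columns. A minimal sketch of one way to keep the rest of the frame is to assign the result back to those columns. The cols variable and the extra Series with 'z' are assumptions of mine for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'c1': list('aba'),
    'c2': list('aaa'),
    'ignore_me': list('bbb'),
    'c3': list('baa'),
})
d = {'a': 'foo', 'b': 'bar'}

# Labels of the columns matching the regex.
cols = df.columns[df.columns.str.contains(r'^c\d+$')]

# Replace values in the matching columns only and assign them back.
df[cols] = df[cols].replace(d)
print(df)

# Unlike map, replace leaves values that are not keys of d untouched.
leftover = pd.Series(['a', 'z']).replace(d).tolist()
print(leftover)  # ['foo', 'z']
```

Here replace and map give the same result because every value is a key of d. They differ only for values outside the dictionary.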

How do I map values in place?
