Question

I have a large dataframe that serves as a map between integers and names:

from io import StringIO
import pandas as pd

gene_int_map = pd.read_table(StringIO("""Gene       Int
Mt-nd1   2
Cers2   4
Nampt   10
Madd    20
Zmiz1   21
Syt1        26
Syt5    30
Syt7        32
Cdca7   34
Ablim2  42
Elp5    43
Clic1   98
Ece2    100"""), sep=r"\s+")

Then I have another dataframe, and I want to convert its Gene column to the ints given in the map (the names in to_convert can be overwritten):

to_convert = pd.read_table(StringIO("""Gene    Term
Mt-nd1  GO:0005739
Mt-nd1  GO:0005743
Mt-nd1  GO:0016021
Mt-nd1  GO:0030425
Mt-nd1  GO:0043025
Mt-nd1  GO:0070469
Mt-nd1  GO:0005623
Mt-nd1  GO:0005622
Mt-nd1  GO:0005737
Madd    GO:0016021
Madd    GO:0045202
Madd    GO:0005886
Zmiz1   GO:0005654
Zmiz1   GO:0043231
Cdca7   GO:0005622
Cdca7   GO:0005623
Cdca7   GO:0005737
Cdca7   GO:0005634
Cdca7   GO:0005654"""), sep=r"\s+")

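Like I said, what I want to do is replace the names in to_convert with the integer values from gene_int_map.

I'm sure this is very simple, but no permutation of the merge options seems to do it. I couldn't get any boolean masks to work either.

PS. I'd also like to replace the values in a single-column dataframe with the ints from gene_int_map: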

simple_series = pd.read_table(StringIO("""Gene
Ablim2
Elp5
Clic1
Ece2"""))

It would be great if the answer were general enough to cover this case too.

Answer 1

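Call set_index on the 'Gene' column of gene_int_map and select 'Int', which gives a Series of ints indexed by gene name. Pass that as the argument to map, called on the 'Gene' column of your other df: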

In [119]:
to_convert['Gene'].map(gene_int_map.set_index('Gene')['Int'])

Out[119]:
0      2
1      2
2      2
3      2
4      2
5      2
6      2
7      2
8      2
9     20
10    20
11    20
12    21
13    21
14    34
15    34
16    34
17    34
18    34
Name: Gene, dtype: int64

This also works for your simple_series:

In [120]:
simple_series['Gene'].map(gene_int_map.set_index('Gene')['Int'])

Out[120]:
0     42
1     43
2     98
3    100
Name: Gene, dtype: int64
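
The calls above only return the mapped Series. Here is a minimal sketch of writing it back into the frame and checking for names that have no entry in the map. The frames are cut-down stand-ins for the ones in the question, and the row with "Unknown1" / "GO:0000000" is a made-up placeholder added only to show the missing-key case:

```python
import pandas as pd

# Cut-down stand-ins for the frames in the question.
gene_int_map = pd.DataFrame({"Gene": ["Mt-nd1", "Madd", "Zmiz1", "Cdca7"],
                             "Int": [2, 20, 21, 34]})
# The last row is hypothetical: "Unknown1" is deliberately absent from the map.
to_convert = pd.DataFrame({"Gene": ["Mt-nd1", "Madd", "Cdca7", "Unknown1"],
                           "Term": ["GO:0005739", "GO:0016021",
                                    "GO:0005622", "GO:0000000"]})

# Series of ints indexed by gene name, as in the answer above.
lookup = gene_int_map.set_index("Gene")["Int"]

# Names missing from the lookup come back as NaN.
mapped = to_convert["Gene"].map(lookup)
unmapped = to_convert.loc[mapped.isna(), "Gene"].tolist()

# Overwrite the Gene column; the nullable "Int64" dtype keeps the values
# as integers even though one entry is missing.
converted = to_convert.assign(Gene=mapped.astype("Int64"))
print(unmapped)
print(converted)
```

If every name is guaranteed to be in the map, the plain assignment to_convert['Gene'] = to_convert['Gene'].map(lookup) should be enough and the dtype cast can be dropped.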

Answer 2

Perhaps you could create a dictionary, like:

dictionary = dict(zip(gene_int_map.Gene, gene_int_map.Int))

Then replace the values as @EdChum suggested (using map):

to_convert['Gene'].map(dictionary)

Building the dictionary once up front should speed up the mapping.
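
As a quick sanity check, here is a small sketch that applies both the dictionary and the set_index Series to the single-column case from the question and confirms they give the same values. It makes no claim about which is faster; the names via_dict and via_series are only illustrative:

```python
import pandas as pd

# The rows of gene_int_map that the single-column frame needs.
gene_int_map = pd.DataFrame({"Gene": ["Ablim2", "Elp5", "Clic1", "Ece2"],
                             "Int": [42, 43, 98, 100]})
simple_series = pd.DataFrame({"Gene": ["Ablim2", "Elp5", "Clic1", "Ece2"]})

# Plain dict lookup, as in this answer.
dictionary = dict(zip(gene_int_map["Gene"], gene_int_map["Int"]))
# Series lookup, as in the first answer.
lookup = gene_int_map.set_index("Gene")["Int"]

via_dict = simple_series["Gene"].map(dictionary)
via_series = simple_series["Gene"].map(lookup)

# Both routes should produce the same integers.
same = via_dict.tolist() == via_series.tolist()
print(via_dict.tolist(), same)
```

To compare speed on the real data, timing both calls (for example with %timeit in IPython) would be more reliable than guessing.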

How to use a dataframe as a map to change values in another dataframe
