Given the DataFrames df and df2:
>>> df = pd.DataFrame([[1, 'a', 'b'], [1, 'c', 'd'],
...                    [2, 'c', 'd'], [1, 'f', 'o'],
...                    [2, 'b', 'a']], columns=['x', 'y', 'z'])
>>> df2 = pd.DataFrame([[1, 'apple'], [2, 'orange'],
...                     [3, 'pear']], columns=['x', 'fruit'])
>>> df
   x  y  z
0  1  a  b
1  1  c  d
2  2  c  d
3  1  f  o
4  2  b  a
>>> df2
   x   fruit
0  1   apple
1  2  orange
2  3    pear
How can I create a new column in df containing the values of the fruit column, matched on the shared x column?
Desired output:
>>> df
   x  y  z   fruit
0  1  a  b   apple
1  1  c  d   apple
2  2  c  d  orange
3  1  f  o   apple
4  2  b  a  orange
I tried this and it works, but I am sure there is a simpler way to do it:
>>> df['fruit'] = [list(df2[df2['x'] == row['x']]['fruit'])[0] for idx, row in df.iterrows()]
>>> df
   x  y  z   fruit
0  1  a  b   apple
1  1  c  d   apple
2  2  c  d  orange
3  1  f  o   apple
4  2  b  a  orange
Note that the DataFrames above are not indexed. If they are indexed, the attempt above fails:
>>> df = df.set_index('x')
>>> df2 = df2.set_index('x')
>>> df
   y  z   fruit
x
1  a  b   apple
1  c  d   apple
2  c  d  orange
1  f  o   apple
2  b  a  orange
>>> df2
    fruit
x
1   apple
2  orange
3    pear
>>> df['fruit'] = [list(df2[df2['x'] == row['x']]['fruit'])[0] for idx, row in df.iterrows()]
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/local/lib/python2.7/site-packages/pandas/core/frame.py", line 2062, in __getitem__
return self._getitem_column(key)
File "/usr/local/lib/python2.7/site-packages/pandas/core/frame.py", line 2069, in _getitem_column
return self._get_item_cache(key)
File "/usr/local/lib/python2.7/site-packages/pandas/core/generic.py", line 1534, in _get_item_cache
values = self._data.get(item)
File "/usr/local/lib/python2.7/site-packages/pandas/core/internals.py", line 3590, in get
loc = self.items.get_loc(item)
File "/usr/local/lib/python2.7/site-packages/pandas/core/indexes/base.py", line 2395, in get_loc
return self._engine.get_loc(self._maybe_cast_indexer(key))
File "pandas/_libs/index.pyx", line 132, in pandas._libs.index.IndexEngine.get_loc (pandas/_libs/index.c:5239)
File "pandas/_libs/index.pyx", line 154, in pandas._libs.index.IndexEngine.get_loc (pandas/_libs/index.c:5085)
File "pandas/_libs/hashtable_class_helper.pxi", line 1207, in pandas._libs.hashtable.PyObjectHashTable.get_item (pandas/_libs/hashtable.c:20405)
File "pandas/_libs/hashtable_class_helper.pxi", line 1215, in pandas._libs.hashtable.PyObjectHashTable.get_item (pandas/_libs/hashtable.c:20359)
KeyError: 'x'
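The KeyError arises because set_index('x') moves x out of the columns and into the index, so df2['x'] and row['x'] no longer exist. A minimal sketch of one way to do the lookup through the index instead, assuming pandas is imported as pd and the same data as above (Index.map is a swapped-in technique, not something the attempt above used):

```python
import pandas as pd

df = pd.DataFrame([[1, 'a', 'b'], [1, 'c', 'd'], [2, 'c', 'd'],
                   [1, 'f', 'o'], [2, 'b', 'a']],
                  columns=['x', 'y', 'z']).set_index('x')
df2 = pd.DataFrame([[1, 'apple'], [2, 'orange'], [3, 'pear']],
                   columns=['x', 'fruit']).set_index('x')

# After set_index, 'x' is the index, not a column.
print('x' in df.columns)  # False

# Look each index label of df up in df2's fruit Series.
# The result is assigned positionally, one value per row of df.
df['fruit'] = df.index.map(df2['fruit'])
print(df)
```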
Answer 0 (score: 3)
Use merge:
df.merge(df2, on='x')
Output:
   x  y  z   fruit
0  1  a  b   apple
1  1  c  d   apple
2  1  f  o   apple
3  2  c  d  orange
4  2  b  a  orange
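The output shown here is grouped by x rather than in df's original row order. If that order matters, or if df might contain x values that are missing from df2, a left join is one option. A minimal sketch, assuming the same df and df2 as in the question; how='left' and validate='many_to_one' are additions that the answer above does not use:

```python
import pandas as pd

df = pd.DataFrame([[1, 'a', 'b'], [1, 'c', 'd'], [2, 'c', 'd'],
                   [1, 'f', 'o'], [2, 'b', 'a']],
                  columns=['x', 'y', 'z'])
df2 = pd.DataFrame([[1, 'apple'], [2, 'orange'], [3, 'pear']],
                   columns=['x', 'fruit'])

# how='left' keeps every row of df, in df's original order.
# validate='many_to_one' raises a MergeError if x is duplicated in df2,
# which would otherwise silently multiply rows.
result = df.merge(df2, on='x', how='left', validate='many_to_one')
print(result)
```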
Answer 1 (score: 3)
Or use map:
df = pd.DataFrame([[1, 'a', 'b'], [1, 'c', 'd'],
                   [2, 'c', 'd'], [1, 'f', 'o'],
                   [2, 'b', 'a']], columns=['x', 'y', 'z'])
df2 = pd.DataFrame([[1, 'apple'], [2, 'orange'],
                    [3, 'pear']], columns=['x', 'fruit'])
df['fruit'] = df.x.map(df2.set_index('x').fruit)
df
Out[257]:
   x  y  z   fruit
0  1  a  b   apple
1  1  c  d   apple
2  2  c  d  orange
3  1  f  o   apple
4  2  b  a  orange
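map looks each x value up in a Series indexed by x, so a value with no match in df2 becomes NaN rather than raising. A minimal sketch of that behavior; the key 4, the name lookup, and the 'unknown' placeholder are hypothetical and not part of the question's data:

```python
import pandas as pd

# 4 is a hypothetical key with no entry in df2.
df = pd.DataFrame({'x': [1, 2, 4]})
df2 = pd.DataFrame([[1, 'apple'], [2, 'orange'], [3, 'pear']],
                   columns=['x', 'fruit'])

# Build the lookup Series once: index is x, values are fruit.
lookup = df2.set_index('x')['fruit']

mapped = df['x'].map(lookup)               # unmatched keys become NaN
df['fruit'] = mapped.fillna('unknown')     # optional placeholder for misses
print(df)
```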
If you have already called set_index(), merge on the index instead:
df = df.set_index('x')
df2 = df2.set_index('x')
df.merge(df2, left_index=True, right_index=True)
Out[260]:
   y  z   fruit
x
1  a  b   apple
1  c  d   apple
1  f  o   apple
2  c  d  orange
2  b  a  orange
Answer 2 (score: 1)
For completeness:
df.join(df2.set_index('x'), on='x')
   x  y  z   fruit
0  1  a  b   apple
1  1  c  d   apple
2  2  c  d  orange
3  1  f  o   apple
4  2  b  a  orange
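The three approaches in this thread can be compared side by side. A rough sketch, assuming the question's original unindexed df and df2; the names via_merge, via_map and via_join are illustrative, and how='left' is added to the merge so that all three keep df's row order:

```python
import pandas as pd

df = pd.DataFrame([[1, 'a', 'b'], [1, 'c', 'd'], [2, 'c', 'd'],
                   [1, 'f', 'o'], [2, 'b', 'a']],
                  columns=['x', 'y', 'z'])
df2 = pd.DataFrame([[1, 'apple'], [2, 'orange'], [3, 'pear']],
                   columns=['x', 'fruit'])

# merge: column-to-column left join on x.
via_merge = df.merge(df2, on='x', how='left')

# map: look each x up in a Series indexed by x; assign returns a copy.
via_map = df.assign(fruit=df['x'].map(df2.set_index('x')['fruit']))

# join: match df's x column against df2's index.
via_join = df.join(df2.set_index('x'), on='x')

# All three should yield the same fruit column for this data.
same = (via_merge['fruit'].tolist()
        == via_map['fruit'].tolist()
        == via_join['fruit'].tolist())
print(same)
```

For a single looked-up column, map avoids building a second frame; merge and join are more convenient when several columns come from df2.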