How to replace index values in cells with column values from a Pandas DataFrame

Time: 2018-07-19 06:26:14

Tags: python pandas knn

I have a dataset with unique IDs and a few attributes. I ran a k-d tree in Python to get, for each id, the indices of its three nearest neighbors, as shown in the image below: [image]

The "index" in the image above is the default index that comes with a Pandas DataFrame. I would like the output to have the format shown in the image below: [image]

This is easy to do with VLOOKUP in Excel, but how can I do it in Python?

4 answers:

Answer 0: (score: 2)

Use replace with a Series:

df = df.replace(df['id'])
#or convert to dict (first solution)
#df = df.replace(df['id'].to_dict())
print (df)
   id neighbor1 neighbor2 neighbor3
0  u1        u1        u4        u3
1  u2        u2        u3        u2
2  u3        u3        u1        u2
3  u4        u4        u1        u2
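To make the snippet above reproducible, here is a minimal sketch. The sample frame is an assumption reconstructed from the outputs in this thread, and the names `index_to_id` and `result` are only illustrative. It restricts the replacement to the neighbor columns, which avoids touching `id` if that column ever held values that collide with index labels.

```python
import pandas as pd

# Hypothetical sample data: the neighbor columns hold row labels of the
# default RangeIndex, as a k-d tree query would return them.
df = pd.DataFrame({
    "id": ["u1", "u2", "u3", "u4"],
    "neighbor1": [0, 1, 2, 3],
    "neighbor2": [3, 2, 0, 0],
    "neighbor3": [2, 1, 1, 1],
})

# Mapping from index label to id: {0: 'u1', 1: 'u2', 2: 'u3', 3: 'u4'}
index_to_id = df["id"].to_dict()

cols = ["neighbor1", "neighbor2", "neighbor3"]

# Work on a copy so the original integer columns stay available.
result = df.copy()
result[cols] = df[cols].replace(index_to_id)
print(result)
```

Values that have no key in the mapping are left unchanged by `replace`, so a stray index would stay visible as a number rather than disappear.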

Another solution:

cols = ['neighbor1', 'neighbor2', 'neighbor3']
df[cols] = df[cols].applymap(df['id'].to_dict().get)
print (df)
   id neighbor1 neighbor2 neighbor3
0  u1        u1        u4        u3
1  u2        u2        u3        u2
2  u3        u3        u1        u2
3  u4        u4        u1        u2
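Note that `DataFrame.applymap` was deprecated in pandas 2.1 in favor of `DataFrame.map`. A variant that should behave the same across versions is to apply `Series.map` column by column. This is only a sketch on assumed sample data; `lookup`, `mapped` and `probe` are illustrative names.

```python
import pandas as pd

# Hypothetical sample data matching the shape used in this thread.
df = pd.DataFrame({
    "id": ["u1", "u2", "u3", "u4"],
    "neighbor1": [0, 1, 2, 3],
    "neighbor2": [3, 2, 0, 0],
    "neighbor3": [2, 1, 1, 1],
})
cols = ["neighbor1", "neighbor2", "neighbor3"]

# A Series works as a lookup table: its index holds the keys.
lookup = df["id"]

# Map every neighbor column through the lookup, one column at a time.
mapped = df[cols].apply(lambda col: col.map(lookup))
print(mapped)

# Unlike replace, map yields NaN for keys that are missing from the lookup.
probe = pd.Series([0, 9]).map(lookup)
print(probe)
```

The NaN behavior can be useful as a sanity check: a missing value after mapping means a neighbor index pointed at a row that does not exist.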

If you need a more dynamic solution:

# select columns whose names start with 'neighbor'
cols = df.filter(regex='^neighbor').columns
print (cols)
Index(['neighbor1', 'neighbor2', 'neighbor3'], dtype='object')

df[cols] = df[cols].replace(df['id'])
print (df)
   id neighbor1 neighbor2 neighbor3
0  u1        u1        u4        u3
1  u2        u2        u3        u2
2  u3        u3        u1        u2
3  u4        u4        u1        u2

# create a boolean mask of column names starting with 'neighbor'
mask = df.columns.str.startswith('neighbor')
print (mask)
[False  True  True  True]

df.loc[:, mask] = df.loc[:, mask].replace(df['id'])
print (df)
   id neighbor1 neighbor2 neighbor3
0  u1        u1        u4        u3
1  u2        u2        u3        u2
2  u3        u3        u1        u2
3  u4        u4        u1        u2
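Since a k-d tree query returns row positions, another option is plain NumPy integer indexing, which can be faster on large frames. The sketch below assumes the neighbor values are valid 0-based positions into the frame; the sample data and the names `ids`, `positions` and `out` are illustrative only.

```python
import pandas as pd

# Hypothetical sample data; neighbor values are 0-based row positions.
df = pd.DataFrame({
    "id": ["u1", "u2", "u3", "u4"],
    "neighbor1": [0, 1, 2, 3],
    "neighbor2": [3, 2, 0, 0],
    "neighbor3": [2, 1, 1, 1],
})

# Pick the neighbor columns dynamically by name prefix.
cols = [c for c in df.columns if c.startswith("neighbor")]

ids = df["id"].to_numpy()          # 1-D array of ids, ordered by position
positions = df[cols].to_numpy()    # 2-D integer array of row positions

# Integer-array indexing looks up every position in a single step and
# keeps the 2-D shape of `positions`.
out = df.copy()
out[cols] = pd.DataFrame(ids[positions], index=df.index, columns=cols)
print(out)
```

This indexes by position rather than by label, so it only matches the `replace` approach when the frame still has its default index in the original order.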

Answer 1: (score: 1)

Use:

In [289]: cols = ['neighbor1', 'neighbor2', 'neighbor3']

In [290]: df[cols].replace(df.set_index('index')['id'].to_dict())
Out[290]:
  neighbor1 neighbor2 neighbor3
0        u1        u4        u3
1        u2        u3        u2
2        u3        u1        u2
3        u4        u1        u2

In [291]: df[cols] = df[cols].replace(df.set_index('index')['id'].to_dict())

In [292]: df
Out[292]:
   index  id neighbor1 neighbor2 neighbor3
0      0  u1        u1        u4        u3
1      1  u2        u2        u3        u2
2      2  u3        u3        u1        u2
3      3  u4        u4        u1        u2
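The session above relies on an `index` column, which presumably came from `reset_index()`. Keying the lookup on an explicit column matters when the row labels are not 0, 1, 2, and so on. The following is a rough sketch under that assumption; the labels 10 to 40 and all variable names are made up for illustration.

```python
import pandas as pd

# Hypothetical frame whose row labels are not the default 0..n-1 range;
# the neighbor columns refer to these labels.
df = pd.DataFrame(
    {
        "id": ["u1", "u2", "u3", "u4"],
        "neighbor1": [10, 20, 30, 40],
        "neighbor2": [40, 30, 10, 10],
        "neighbor3": [30, 20, 20, 20],
    },
    index=[10, 20, 30, 40],
)

# reset_index() moves the labels into a regular column named 'index'.
df = df.reset_index()

cols = ["neighbor1", "neighbor2", "neighbor3"]

# Build the label -> id mapping from the explicit 'index' column.
label_to_id = df.set_index("index")["id"].to_dict()
df[cols] = df[cols].replace(label_to_id)
print(df)
```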

Answer 2: (score: 1)

Try this:

print(df.replace(df['id'].to_dict()))

Input:

       id  neighbor1  neighbor2  neighbor3
index                                     
0      u1          0          3          2
1      u2          1          2          1
2      u3          2          0          1
3      u4          3          0          1

Output:

       id neighbor1 neighbor2 neighbor3
index                                  
0      u1        u1        u4        u3
1      u2        u2        u3        u2
2      u3        u3        u1        u2
3      u4        u4        u1        u2

Answer 3: (score: 1)

import numpy as np 
import matplotlib.pyplot as plt 
import sympy                        # for evaluating number of primes <= n 

def f(n):
    arr = []
    for i in range(1,n+1):
        arr.append(sympy.primepi(i))
        #print('For',i, 'value', arr[i-1])
    return arr

ar = f(100)

t1 = np.arange(1,101,1,dtype = int)
plt.plot(t1, ar ,'bo')           # instead of 'bo' what I need to use to make it like 1st picture?
plt.axis([0,110,0,25]) 
plt.show()