I am trying to look up the values of one matrix in a second matrix and replace them with the values mapped there.
ds1 = [[ 4, 13, 6, 9],
[ 7, 12, 5, 7],
[ 7, 0, 4, 22],
[ 9, 8, 12, 0]]
ds2 = [[ 4, 1],
[ 5, 3],
[ 6, 1],
[ 7, 2],
[ 8, 2],
[ 9, 3],
[12, 1],
[13, 2],
[22, 3]]
output = [[1, 2, 1, 3],
[2, 1, 3, 2],
[2, 0, 1, 3],
[3, 2, 1, 0]]
Here is the code:
out = ds1.copy()
_,C = np.where(ds1.ravel()[:,None] == ds2[:,0])
newvals = ds2[C,1]
valid = np.in1d(ds1.ravel(),ds2[:,0])
out.ravel()[valid] = newvals
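output is the result of replacing each value in ds1 with the value that ds2 maps that key to (first column of ds2 is the key, second column is the replacement). Values with no key in ds2, such as 0, stay unchanged.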
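As a self-contained check of the lookup described above, here is a minimal sketch that runs the same broadcasting approach on the small example matrices. It assumes the keys in the first column of ds2 are unique, and it uses np.isin in place of np.in1d (an assumption about the installed NumPy version; both compute the same mask here).

```python
import numpy as np

ds1 = np.array([[4, 13, 6, 9],
                [7, 12, 5, 7],
                [7, 0, 4, 22],
                [9, 8, 12, 0]])
ds2 = np.array([[4, 1], [5, 3], [6, 1], [7, 2], [8, 2],
                [9, 3], [12, 1], [13, 2], [22, 3]])

out = ds1.copy()
flat = ds1.ravel()

# Compare every element of ds1 with every key of ds2: shape (16, 9)
matches = flat[:, None] == ds2[:, 0]

# For each matching element, C holds the row of ds2 that carries its key
_, C = np.where(matches)
newvals = ds2[C, 1]

# Mask of the elements of ds1 that have a key in ds2 (0 has none)
valid = np.isin(flat, ds2[:, 0])

# out is contiguous, so ravel() returns a view and this writes in place
out.ravel()[valid] = newvals
print(out)
```

Because np.where returns matches in row-major order, newvals lines up with the True entries of valid only when each key appears once in ds2.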
I then did the same thing with the actual matrices:
ds1 = pd.read_table('https://gist.githubusercontent.com/karimkhanp/9527bad750fbe75e072c/raw/ds1', sep=' ', header=None)
ds2 = pd.read_table('https://gist.githubusercontent.com/karimkhanp/1692f1f76718c35e939f/raw/6f6b348ab0879b702e1c3c5e362e9d2062e9e9bc/ds2', header=None, sep=' ')
and I got:
_,C = np.where(ds1.ravel()[:,None] == ds2[:,0])
File "/usr/local/lib/python2.7/dist-packages/pandas/core/generic.py", line 1947, in __getattr__
(type(self).__name__, name))
AttributeError: 'DataFrame' object has no attribute 'ravel'
I also tried converting them to numpy arrays:
ds1 = np.array(ds1)
ds2 = np.array(ds2)
_,C = np.where(ds1.values.ravel()[:,None] == ds2.values[:,0])
which gives:
AttributeError                            Traceback (most recent call last)
<ipython-input-39-6a80d7cd7f81> in <module>()
----> 1 _,C = np.where(ds1.values.ravel()[:,None] == ds2.values[:,0])

AttributeError: 'numpy.ndarray' object has no attribute 'values'
Any suggestions or help would be much appreciated.
Answer 0 (score: 2)
values is an attribute of a pandas DataFrame, not of a numpy ndarray. So in your second approach, do not convert ds1 and ds2 to numpy arrays. Just remove these two lines
ds1 = np.array(ds1)
ds2 = np.array(ds2)
and
_,C = np.where(ds1.values.ravel()[:,None] == ds2.values[:,0])
should work.
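To make the distinction concrete, here is a minimal sketch of the two consistent ways to write the lookup. The small in-memory frames df1 and df2 are hypothetical stand-ins for the real files, used only so the example runs without network access.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for the DataFrames returned by pd.read_table
df1 = pd.DataFrame([[4, 13], [7, 0]])
df2 = pd.DataFrame([[4, 1], [7, 2], [13, 2]])

# Option A: keep the DataFrames and reach the underlying array via .values
_, C_a = np.where(df1.values.ravel()[:, None] == df2.values[:, 0])

# Option B: convert once, then index the ndarrays directly (no .values)
a1 = np.array(df1)
a2 = np.array(df2)
_, C_b = np.where(a1.ravel()[:, None] == a2[:, 0])

print(C_a, C_b)
```

Either option works; the errors in the question come from mixing them, that is, calling ravel on a DataFrame or values on an ndarray.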
-----------------Here is a test on my machine-------------------
My script is
#!/usr/bin/env python
# -*- coding: utf-8 -*-
import sys
import pandas as pd
import numpy as np
ds1 = pd.read_table('https://gist.githubusercontent.com/karimkhanp/9527bad750fbe75e072c/raw/ds1', sep=' ', header=None)
ds2 = pd.read_table('https://gist.githubusercontent.com/karimkhanp/1692f1f76718c35e939f/raw/6f6b348ab0879b702e1c3c5e362e9d2062e9e9bc/ds2', header=None, sep=' ')
print ds1.shape, ds2.shape
_,C = np.where(ds1.values.ravel()[:,None] == ds2.values[:,0])
print C
and the output is
(1000, 1001) (4000, 2)
[ 10 35 60 ..., 3869 3938 3987]
My environment is cygwin with Python 2.7.9.
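One caveat about the broadcast comparison in the script above: it builds a temporary boolean array with ds1.size * len(ds2) entries, which for the shapes printed (1000 x 1001 and 4000 rows) is about 4 billion elements. If memory becomes a problem, a lookup based on np.searchsorted is one alternative. This is a different technique from the one in the answer, shown only as a sketch; the function name remap is hypothetical, and it assumes the keys in the first column of ds2 are unique.

```python
import numpy as np

def remap(values, table):
    """Replace entries of `values` found in table[:, 0] by table[:, 1].

    Entries without a matching key are left unchanged.
    Assumes the keys in table[:, 0] are unique.
    """
    # Sort the lookup table by key so binary search can be used
    order = np.argsort(table[:, 0])
    keys = table[order, 0]
    vals = table[order, 1]

    flat = values.ravel()
    # Candidate position of each element among the sorted keys
    pos = np.searchsorted(keys, flat)
    # Elements larger than every key would index past the end; clip them
    pos = np.clip(pos, 0, len(keys) - 1)
    # An element has a key only if the candidate key equals it
    found = keys[pos] == flat

    out = flat.copy()
    out[found] = vals[pos[found]]
    return out.reshape(values.shape)

ds1 = np.array([[4, 13, 6, 9],
                [7, 12, 5, 7],
                [7, 0, 4, 22],
                [9, 8, 12, 0]])
ds2 = np.array([[4, 1], [5, 3], [6, 1], [7, 2], [8, 2],
                [9, 3], [12, 1], [13, 2], [22, 3]])

result = remap(ds1, ds2)
print(result)
```

With DataFrames loaded as in the script, the call would be remap(ds1.values, ds2.values). The extra memory used is proportional to ds1.size rather than to ds1.size * len(ds2).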