I have a simple piece of code to find similar rows in a dataset.
import numpy as np

h=0
count=0
#227690
similarIndexes=np.zeros((143,))  # indexes of rows that match the previous row
len(data)
for i in np.arange(len(data)):
    if(data[i-1,2]==data[i,2]):
        similarIndexes[h]=int(i)
        h=h+1
        count=count+1
        print("similar found in -->", i," there are--->", count)
It works fine when data is a numpy.ndarray, but if data is a pandas object, I get the following error:
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 7, in smilarData
  File "/usr/lib/python2.7/dist-packages/pandas/core/frame.py", line 1658, in __getitem__
    return self._getitem_column(key)
  File "/usr/lib/python2.7/dist-packages/pandas/core/frame.py", line 1665, in _getitem_column
    return self._get_item_cache(key)
  File "/usr/lib/python2.7/dist-packages/pandas/core/generic.py", line 1005, in _get_item_cache
    values = self._data.get(item)
  File "/usr/lib/python2.7/dist-packages/pandas/core/internals.py", line 2874, in get
    _, block = self._find_block(item)
  File "/usr/lib/python2.7/dist-packages/pandas/core/internals.py", line 3186, in _find_block
    self._check_have(item)
  File "/usr/lib/python2.7/dist-packages/pandas/core/internals.py", line 3193, in _check_have
    raise KeyError('no item named %s' % com.pprint_thing(item))
KeyError: u'no item named (-1, 2)'
What should I do to be able to use this code? If converting the pandas object to a numpy array would help, how do I do that?
Answer 0 (score: 1)
To convert a pandas DataFrame to a numpy array:
import numpy as np
np.array(dataFrame)
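As a minimal sketch of how the loop from the question could run on the converted array: the sample frame, its column names, and the `similar_indexes` list below are made up for illustration and are not from the original post.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; the third column (position 2) is the one compared.
df = pd.DataFrame({"x": [1, 2, 3, 4, 5],
                   "y": [9, 8, 7, 6, 5],
                   "z": [10, 10, 20, 30, 30]},
                  columns=["x", "y", "z"])

data = np.array(df)  # plain 2-D ndarray, so data[i, 2] is positional indexing

similar_indexes = []
# Start at 1 so that data[i - 1] is the previous row, not the last row.
for i in range(1, len(data)):
    if data[i - 1, 2] == data[i, 2]:
        similar_indexes.append(i)

print(similar_indexes)
```

Collecting matches in a list avoids having to guess the size of a preallocated result array up front.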
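Answer 1 (score: 1)
I can't comment on Adrienne's answer yet, so I'd like to add that DataFrames have a built-in method for converting a df to an array, i.e. a matrix: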
>>> df = pd.DataFrame({"a":range(5),"b":range(5,10)})
>>> df
   a  b
0  0  5
1  1  6
2  2  7
3  3  8
4  4  9
>>> mat = df.as_matrix()  # removed in pandas 1.0; use df.values or df.to_numpy() there
>>> mat
array([[0, 5],
       [1, 6],
       [2, 7],
       [3, 8],
       [4, 9]])
>>> col = [x[0] for x in mat] # to get certain columns
>>> col
[0, 1, 2, 3, 4]
To find duplicate rows, you can also do:
>>> df2
   a  b
0  0  5
1  1  6
2  2  7
3  3  8
4  4  9
5  0  5
>>> df2[df2.duplicated()]
   a  b
5  0  5
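Note that `duplicated()` flags rows that repeat any earlier row, whereas the loop in the question compares each row only with the one directly above it. A rough sketch of both checks, using a made-up frame (the data and the variable names are assumptions for illustration):

```python
import pandas as pd

# Hypothetical sample data, chosen so the two checks give different results.
df2 = pd.DataFrame({"a": [0, 1, 1, 3, 0],
                    "b": [5, 6, 6, 8, 5]},
                   columns=["a", "b"])

# Rows that repeat an earlier row in every column.
full_duplicates = df2[df2.duplicated()]

# Rows whose value in "b" equals the value in the row directly above.
# shift() moves the column down by one row; the first row compares against NaN.
b = df2["b"]
same_as_previous = df2[b.eq(b.shift())]

print(full_duplicates)
print(same_as_previous)
```

The `shift()` comparison is the vectorized counterpart of checking `data[i-1, 2] == data[i, 2]` in a loop.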
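Answer 2 (score: 0)
I agree with the previous answers, but if you want to work with pandas objects directly, DataFrames have their own methods for accessing items. In your code you should write, for example: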
if(data.iloc[i-1,2]==data.iloc[i,2]):
See the documentation for more.
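To illustrate the difference, here is a small sketch with a made-up frame (the data, column names, and helper variables are assumptions). On a DataFrame, `data[i, 2]` is treated as a lookup of a column labelled `(i, 2)`, which is why the question's code raises a KeyError, while `data.iloc[i, 2]` selects by row and column position:

```python
import pandas as pd

# Hypothetical sample data; column position 2 is "z".
data = pd.DataFrame({"x": [1, 2, 3, 4],
                     "y": [5, 6, 7, 8],
                     "z": [7, 7, 9, 9]},
                    columns=["x", "y", "z"])

# Plain square brackets look for a column named (0, 2), which does not exist.
raised_key_error = False
try:
    data[0, 2]
except KeyError:
    raised_key_error = True

# .iloc indexes by integer position: [row, column].
similar_indexes = []
for i in range(1, len(data)):
    if data.iloc[i - 1, 2] == data.iloc[i, 2]:
        similar_indexes.append(i)

print(raised_key_error, similar_indexes)
```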