Question

I'm trying to create a new column in a DataFrame that equals the current row's index minus the index of a row the user selected earlier. Suppose we have this DataFrame:

     A    B C
0  foo  bar
1  bar  foo
2  foo  bar

Our user selects row 1. I want column C to contain:

     A    B  C
0  foo  bar -1
1  bar  foo  0
2  foo  bar  1

I already know this can be done by iterating over the DataFrame like this:

for index, row in df.iterrows():
    df.loc[index, 'C'] = index - USER_SELECTED_INDEX

However, this is slow. So slow, in fact, that it is unusable.

My question is: how can I use df.apply to speed this up? How do I pass the current row's index to the function I am applying? I'd like to do something along those lines.

Answer 1

Maybe try this.

df.assign(C=df.index-1)
Out[28]: 
     A    B  C
0  foo  bar -1
1  bar  foo  0
2  foo  bar  1
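
The 1 in the snippet above is the row the user selected in the question. For what it's worth, here is a minimal self-contained sketch of the same vectorized subtraction with the selection kept in a variable. The helper name add_offset_column is hypothetical (not part of pandas), and a default integer index is assumed.

```python
import pandas as pd

USER_SELECTED_INDEX = 1  # row picked by the user, as in the question

df = pd.DataFrame({"A": ["foo", "bar", "foo"], "B": ["bar", "foo", "bar"]})


def add_offset_column(frame, selected_index):
    """Hypothetical helper: return a copy with C = row index - selected_index."""
    # Subtracting a scalar from the index is vectorized,
    # so there is no Python-level loop over the rows.
    return frame.assign(C=frame.index - selected_index)


result = add_offset_column(df, USER_SELECTED_INDEX)
print(result)
```

Since assign returns a new DataFrame, df itself is left without a C column; reassign the result if you want to keep it.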

Answer 2

I found the answer I was looking for. For anyone interested:

def applyCol(row):
    return row.name - USER_SELECTED_INDEX #row.name resolves to the index

df['C'] = df.apply(applyCol, axis=1)

Happy coding!
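
For anyone who wants to run this end to end, here is a small sketch of the same row.name idea that passes the selected index through the args parameter of DataFrame.apply instead of relying on a global. The function name offset_from and the sample data are assumptions made for illustration.

```python
import pandas as pd

df = pd.DataFrame({"A": ["foo", "bar", "foo"], "B": ["bar", "foo", "bar"]})


def offset_from(row, selected_index):
    # With axis=1 each row arrives as a Series whose .name is its index label.
    return row.name - selected_index


# args forwards extra positional arguments to the function after the row.
df["C"] = df.apply(offset_from, axis=1, args=(1,))
print(df)
```

This still calls a Python function once per row, so on a large frame the plain index subtraction from Answer 1 is likely to be faster.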

Pandas DataFrame: apply a function to create a new column based on a selected row
