I have the following DataFrame:
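The first column contains the user_id, and each row represents one action performed by that user. Each user_id appears in either the "Actor1" or the "Actor2" column.
First, I want to create a new column that is assigned the value 1 if the user_id is found in the "Actor1" column, and 0 otherwise.
Second, I want to create a new column that, for each user_id, stores the value of the other "Actor" column, i.e. the actor that user interacted with.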
The example data can be constructed as follows:
import pandas as pd

data = [
    (27450, 27450, 29420, "10/10/2016"),
    (29420, 36142, 29420, "10/10/2016"),
    (11, 11, 27450, "10/10/2016")]
# Create the base DataFrame
df = pd.DataFrame(data, columns=("User_id", "Actor1", "Actor2", "Time"))
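The question states that every user_id shows up in either Actor1 or Actor2. Assuming this is meant row by row (each user appears in one of the two actor columns of their own row), a quick sanity check on the sample data might look like this sketch:

```python
import pandas as pd

data = [(27450, 27450, 29420, "10/10/2016"),
        (29420, 36142, 29420, "10/10/2016"),
        (11, 11, 27450, "10/10/2016")]
df = pd.DataFrame(data, columns=["User_id", "Actor1", "Actor2", "Time"])

# True for each row whose User_id matches at least one of the two actor columns
in_either = (df["User_id"] == df["Actor1"]) | (df["User_id"] == df["Actor2"])
print(in_either.all())  # True for the sample data
```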
What is the most efficient, pythonic way to do this?
Many thanks in advance!
Answer 0 (score: 2)
import numpy as np
import pandas as pd

data = [(27450, 27450, 29420, "10/10/2016"),
        (29420, 36142, 29420, "10/10/2016"),
        (11, 11, 27450, "10/10/2016")]
df = pd.DataFrame(data, columns=("User_id", "Actor1", "Actor2", "Time"))

# True where the row's user is the one listed in Actor1
mask = (df['User_id'] == df['Actor1'])
df['first actor'] = mask.astype(int)
# The "other" actor is Actor2 when the user is Actor1, otherwise Actor1
df['other actor'] = np.where(mask, df['Actor2'], df['Actor1'])
print(df)
which yields
User_id Actor1 Actor2 Time first actor other actor
0 27450 27450 29420 10/10/2016 1 29420
1 29420 36142 29420 10/10/2016 0 36142
2 11 11 27450 10/10/2016 1 27450
First, create a boolean mask that is True where User_id equals Actor1:
In [51]: mask = (df['User_id'] == df['Actor1']); mask
Out[51]:
0 True
1 False
2 True
dtype: bool
Converting mask to ints creates the first column:
In [52]: mask.astype(int)
Out[52]:
0 1
1 0
2 1
dtype: int64
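Then use np.where to choose between two values. np.where(mask, A, B) returns an array whose ith value is A[i] if mask[i] is True, and B[i] otherwise. Thus,
np.where(mask, df['Actor2'], df['Actor1'])
takes the value from Actor2 where mask is True, and from Actor1 everywhere else: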
In [53]: np.where(mask, df['Actor2'], df['Actor1'])
Out[53]: array([29420, 36142, 27450])
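If you prefer to stay within pandas, Series.where can express the same selection: it keeps the calling Series' values where the condition is True and takes them from the other argument where it is False. This is a minimal sketch of that swapped-in variant, not the approach used in this answer; it rebuilds the sample df so it runs on its own:

```python
import pandas as pd

data = [(27450, 27450, 29420, "10/10/2016"),
        (29420, 36142, 29420, "10/10/2016"),
        (11, 11, 27450, "10/10/2016")]
df = pd.DataFrame(data, columns=["User_id", "Actor1", "Actor2", "Time"])

mask = df["User_id"] == df["Actor1"]
df["first actor"] = mask.astype(int)
# Keep Actor2 where mask is True; fall back to Actor1 where mask is False
df["other actor"] = df["Actor2"].where(mask, df["Actor1"])
print(df[["User_id", "first actor", "other actor"]])
```

Both forms are vectorized; the pandas version simply avoids the round trip through a NumPy array and keeps the result aligned to df's index.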
Answer 1 (score: 0)
Here's my solution. I'm assuming that if the user_id appears in the Actor1 column, then it won't also appear in Actor2 of that same row...
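# 1 if the row's User_id appears anywhere in the Actor1 column (column-wide test)
df["Col1"] = [1 if i in df["Actor1"].values else 0 for i in df["User_id"].values]
# Take Actor2 when Col1 is 1, otherwise Actor1
df["Col2"] = [df.iloc[i]["Actor2"] if j == 1 else df.iloc[i]["Actor1"] for i, j in enumerate(df["Col1"].values)]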
Output:
User_id Actor1 Actor2 Time Col1 Col2
0 27450 27450 29420 10/10/2016 1 29420
1 29420 36142 29420 10/10/2016 0 36142
2 11 11 27450 10/10/2016 1 27450
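Col1 in this answer checks whether a row's User_id appears anywhere in the Actor1 column, not whether it equals Actor1 in the same row. On the sample data the two readings agree, but they can diverge. The sketch below adds one hypothetical row (made up for illustration, not from the question) in which user 27450 is Actor2 while 29420 is Actor1, and compares Series.isin, the vectorized form of the membership test, with a row-wise comparison:

```python
import pandas as pd

data = [(27450, 27450, 29420, "10/10/2016"),
        (29420, 36142, 29420, "10/10/2016"),
        (11, 11, 27450, "10/10/2016"),
        # hypothetical extra row: user 27450 is Actor2 here, while 29420 is Actor1
        (27450, 29420, 27450, "11/10/2016")]
df = pd.DataFrame(data, columns=["User_id", "Actor1", "Actor2", "Time"])

# Column-wide membership test (same logic as Col1 in the answer above)
membership = df["User_id"].isin(df["Actor1"]).astype(int)
# Row-wise test: does this row's User_id equal this row's Actor1?
row_wise = (df["User_id"] == df["Actor1"]).astype(int)

print(membership.tolist())  # [1, 1, 1, 1]
print(row_wise.tolist())    # [1, 0, 1, 0]
```

If each row describes a single action, the row-wise comparison is the one that matches the question's intent.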