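My DataFrame users contains several columns. My goal is to add a column, uses_name, that is True when a user's password is the same as their first or last name.
For example, the twelfth row has user_name milford.hubbard. Its uses_name value should be True, because password and last_name are the same.
To do this, I created two columns, first_name and last_name, using regular expressions. When creating uses_name, I ran into a problem with the | operator. I read more about boolean indexing in the pandas docs but did not find an answer.
My code: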
import pandas as pd
users = pd.read_csv('datasets/users.csv')
# Extracting first and last names into their own columns
users['first_name'] = users['user_name'].str.extract(r'(^\w+)', expand=False)
users['last_name'] = users['user_name'].str.extract(r'(\w+$)', expand=False)
# Flagging the users with passwords that matches their names
users['uses_name'] = users['password'].isin(users['first_name'] | users['last_name'])
# Counting and printing the number of users using names as passwords
print(users['uses_name'].count())
# Taking a look at the 12 first rows
print(users.head(12))
When I run this, I get an error:
TypeError: unsupported operand type(s) for |: 'str' and 'bool'
The first 12 rows of the users DataFrame, including the newly created first_name and last_name columns:
id user_name password first_name last_name
0 1 vance.jennings joobheco vance jennings
1 2 consuelo.eaton 0869347314 consuelo eaton
2 3 mitchel.perkins fabypotter mitchel perkins
3 4 odessa.vaughan aharney88 odessa vaughan
4 5 araceli.wilder acecdn3000 araceli wilder
5 6 shawn.harrington 5278049 shawn harrington
6 7 evelyn.gay master evelyn gay
7 8 noreen.hale murphy noreen hale
8 9 gladys.ward lwsves2 gladys ward
9 10 brant.zimmerman 1190KAREN5572497 brant zimmerman
10 11 leanna.abbott aivlys24 leanna abbott
11 12 milford.hubbard hubbard milford hubbard
Answer 0 (score: 3)
You can concatenate them, since both are Series:
users['password'].isin(pd.concat([users['first_name'],users['last_name']]))
Since you changed the question, here is an updated version that compares row by row:
users[['first_name', 'last_name']].eq(users['password'], axis=0).any(axis=1)
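The two snippets in this answer are not equivalent, and a small sketch may make the difference clearer. The sample rows below are hypothetical and chosen only to show the effect: the isin version pools every name in the frame, so a password that matches some other user's name is also flagged, while the eq version compares each password only with the names on its own row.

```python
import pandas as pd

# Hypothetical sample data, not taken from the question's CSV.
users = pd.DataFrame({
    'password':   ['hubbard', 'vance', 'secret'],
    'first_name': ['milford', 'odessa', 'vance'],
    'last_name':  ['hubbard', 'vaughan', 'jennings'],
})

# Pooled check: True if the password equals ANY user's first or last name.
pooled = users['password'].isin(
    pd.concat([users['first_name'], users['last_name']])
)

# Row-wise check: True only if the password equals a name on the same row.
row_wise = (
    users[['first_name', 'last_name']]
    .eq(users['password'], axis=0)
    .any(axis=1)
)

print(pooled.tolist())    # [True, True, False]
print(row_wise.tolist())  # [True, False, False]
```

The second row is the interesting one: the password vance is another user's first name, so only the pooled check flags it. Which behaviour is wanted depends on how the question is read; the row-wise form matches "the password is the same as each user's first or last name".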
Answer 1 (score: 3)
This works:
users['uses_name'] = (users['password'] == users['first_name']) | (users['password'] == users['last_name'])
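As a self-contained sketch of this approach, the block below uses three rows copied from the question's table in place of datasets/users.csv (an assumption made so it can run without the file). It also touches on the counting step: as far as I can tell, count() returns the number of non-null entries rather than the number of True values, so sum() is probably what the question intends.

```python
import pandas as pd

# Three rows from the question's table, standing in for datasets/users.csv.
users = pd.DataFrame({
    'user_name': ['vance.jennings', 'evelyn.gay', 'milford.hubbard'],
    'password':  ['joobheco', 'master', 'hubbard'],
})

# Extract first and last names, as in the question.
users['first_name'] = users['user_name'].str.extract(r'(^\w+)', expand=False)
users['last_name'] = users['user_name'].str.extract(r'(\w+$)', expand=False)

# Compare each password with the names on the same row, then OR the
# two boolean Series element-wise.
users['uses_name'] = (
    (users['password'] == users['first_name'])
    | (users['password'] == users['last_name'])
)

# count() gives the number of non-null entries (every row here),
# whereas sum() counts the True values.
print(users['uses_name'].count())  # 3
print(users['uses_name'].sum())    # 1
```

The parentheses around each comparison matter, because | binds more tightly than == in Python.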
Answer 2 (score: 2)
import numpy as np
val = np.union1d(users['first_name'], users['last_name'])
users['uses_name'] = users['password'].isin(val)
print (users)
id user_name password first_name last_name uses_name
0 1 vance.jennings joobheco vance jennings False
1 2 consuelo.eaton 0869347314 consuelo eaton False
2 3 mitchel.perkins fabypotter mitchel perkins False
3 4 odessa.vaughan aharney88 odessa vaughan False
4 5 araceli.wilder acecdn3000 araceli wilder False
5 6 shawn.harrington 5278049 shawn harrington False
6 7 evelyn.gay master evelyn gay False
7 8 noreen.hale murphy noreen hale False
8 9 gladys.ward lwsves2 gladys ward False
9 10 brant.zimmerman 1190KAREN5572497 brant zimmerman False
10 11 leanna.abbott aivlys24 leanna abbott False
11 12 milford.hubbard hubbard milford hubbard True
Answer 3 (score: 1)
I think the best way is to take the set union and pass it to isin:
users['uses_name'] = users['password'].isin(
set(users['first_name']).union(users['last_name'])
)
users
id user_name password first_name last_name uses_name
0 1 vance.jennings joobheco vance jennings False
1 2 consuelo.eaton 0869347314 consuelo eaton False
2 3 mitchel.perkins fabypotter mitchel perkins False
3 4 odessa.vaughan aharney88 odessa vaughan False
4 5 araceli.wilder acecdn3000 araceli wilder False
5 6 shawn.harrington 5278049 shawn harrington False
6 7 evelyn.gay master evelyn gay False
7 8 noreen.hale murphy noreen hale False
8 9 gladys.ward lwsves2 gladys ward False
9 10 brant.zimmerman 1190KAREN5572497 brant zimmerman False
10 11 leanna.abbott aivlys24 leanna abbott False
11 12 milford.hubbard hubbard milford hubbard True
Note that | is an element-wise OR in pandas. It is meant for boolean (or integer) data and is not meaningful for string columns.
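To illustrate the point, here is a minimal sketch with made-up Series. It assumes object-dtype string columns (set explicitly below), which is how pandas has traditionally stored text. The exact wording of the error message can differ between pandas versions, so the sketch only prints it.

```python
import pandas as pd

a = pd.Series([True, False, False])
b = pd.Series([False, False, True])

# On boolean Series, | combines the values element-wise.
print((a | b).tolist())  # [True, False, True]

# On string Series there is no meaningful OR, so a TypeError is raised.
first = pd.Series(['vance', 'evelyn'], dtype=object)
last = pd.Series(['jennings', 'gay'], dtype=object)

raised = False
try:
    first | last
except TypeError as exc:
    raised = True
    print('TypeError:', exc)
```

This is why the comparison has to happen first: password == first_name yields a boolean Series, and only those boolean results should be combined with |.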