Question

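I have a df with a column (account_no) whose values include empty strings, strings of spaces, and strings made of a single repeated digit. For those values, I want to create a new column (valid_account_no) and set it to False in the corresponding rows. In addition, if the length of any account_no is <= 4, valid_account_no should also be set to False. The df looks like this: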

 id    account_no    valid_account_no
 1                   False
 2     999999        False
 3     1234          False
 4     123456        True

Here is my code:

# the regex matches account nos that are spaces or a single repeated digit;
# negate it so that those rows get False in valid_account_no
df['valid_account_no'] = ~df['account_no'].str.match(r"\b(\d)\1+\b| +")

# if length of any account nos are <= 4 or the account nos are empty
# set values of column valid account no to False
invalid_account_indices = df[(df['account_no'].str.len() <= 4) |
                             (df['account_no'] == '')].index
df.loc[invalid_account_indices, 'valid_account_no'] = False

I would like to know whether there is a better way to achieve this, in the sense of making it more concise and efficient.

Answer 1

Your approach is fine as it is. Another way is to use simple boolean algebra, i.e.

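m1 = df['account_no'].str.match(r"\b(\d)\1+\b| +")  # repeated digit or spaces
m2 = df['account_no'].str.len() <= 4                # too short
m3 = df['account_no'] == ''                         # empty (already covered by m2)
# a row is valid only if none of the invalid conditions hold
df['valid_account_no'] = ~(m1 | m2 | m3)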
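
To check the mask logic end to end, here is a minimal self-contained sketch. The sample frame is an assumption: it mirrors the table in the question, plus one hypothetical all-spaces row added for illustration. Each mask flags an invalid row, so the combined mask is negated to produce valid_account_no. The names `acct`, `repeated_or_spaces` and `too_short` are illustrative only.

```python
import pandas as pd

# Sample data mirroring the question's table; row 5 (six spaces) is a
# hypothetical addition to exercise the " +" branch of the regex.
df = pd.DataFrame({
    "id": [1, 2, 3, 4, 5],
    "account_no": ["", "999999", "1234", "123456", "      "],
})

acct = df["account_no"]

# Each mask is True where the account number is INVALID.
# str.match anchors at the start of the string only, so " +" flags any
# value that begins with one or more spaces.
repeated_or_spaces = acct.str.match(r"\b(\d)\1+\b| +")
# A length check of <= 4 already covers the empty string (length 0).
too_short = acct.str.len() <= 4

# A row is valid only when no invalid condition holds.
df["valid_account_no"] = ~(repeated_or_spaces | too_short)
print(df)
```

If the column can contain missing values, `str.match` and `str.len` yield missing results for those rows, so they would need to be handled before negating (for example by filling them first).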
