Question

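I have been trying to write a function that inserts a space into a string based on the length of the strings in a column.

I have a dataframe with two columns, one containing postcodes and the other containing the lengths of those postcodes. It looks like this: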

    new_pstl_cd length
  1 SS55HA       6
  2 BD108EG      7
  3 LS15HU       6
  4 W19PX        5

I would like to insert a space so that the column becomes:

   new_pstl_cd
 1 SS5 5HA
 2 BD10 8EG
 3 LS1 5HU
 4 W1 9PX

I tried if / elif code that tests the length column, without success.

How can I write a function that uses the length to do this? Note that in this case it will always be the last 3 characters that are split off.

Answer 1

You don't even need length:

df['new_pstl_cd'] = df.new_pstl_cd.str[:-3] + ' ' + df.new_pstl_cd.str[-3:]

Output:

  new_pstl_cd  length
1     SS5 5HA       6
2    BD10 8EG       7
3     LS1 5HU       6
4      W1 9PX       5
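
To try this end to end, here is a minimal self-contained sketch. The sample frame is rebuilt from the four postcodes in the question; the name df follows the answer, and the 1-based index is only an assumption made to match the printed output.

```python
import pandas as pd

# Sample data rebuilt from the question (index chosen to match the printout).
df = pd.DataFrame(
    {"new_pstl_cd": ["SS55HA", "BD108EG", "LS15HU", "W19PX"],
     "length": [6, 7, 6, 5]},
    index=[1, 2, 3, 4],
)

# Split every postcode before its last three characters and rejoin with a space.
df["new_pstl_cd"] = df["new_pstl_cd"].str[:-3] + " " + df["new_pstl_cd"].str[-3:]

print(df)
```

Because the slice counts from the end of each string, the same line handles postcodes of length 5, 6 and 7 without looking at the length column.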

Answer 2

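The problem

The if / elif code you tried reflects a common misunderstanding of how pandas works.

if accepts only a single Boolean value, i.e. True or False. postcode_test['length'] == 6 is a pandas Series object, i.e. a collection of many Boolean values. As the error message you surely saw explains, it doesn't even make sense to use it as the input to if.

You need to find a way to apply an operation to some rows and not others, but you can't use if. pandas actually offers several ways to do this.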
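
To see the failure in isolation, here is a small sketch. The Series named lengths is a hypothetical stand-in for postcode_test['length'], filled with the lengths from the question.

```python
import pandas as pd

lengths = pd.Series([6, 7, 6, 5])  # stand-in for postcode_test['length']
is_six = lengths == 6              # a Series of booleans, one per row

error_message = None
try:
    if is_six:                     # pandas refuses to collapse this to one bool
        pass
except ValueError as exc:
    error_message = str(exc)

print(is_six.tolist())
print(error_message)
```

The comparison itself works fine and yields one Boolean per row; it is only the attempt to treat the whole Series as a single True or False that raises.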

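Possible solutions

As always, don't copy and paste code you don't understand. Refer to the library documentation for help. This code is untested; it may contain bugs and comes with no warranty.

Boolean subsetting

The simplest approach is probably boolean subsetting, which is more or less what you were originally trying to do.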

postcode_len_6 = postcode_test['length'] == 6
postcode_len_5 = postcode_test['length'] == 5

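postcode_len_6 and postcode_len_5 are Series objects with Boolean elements. Their elements correspond to the elements of the postcode_test['length'] Series, which in turn correspond to the rows of the postcode_test DataFrame.

You can use these Series objects as described in the indexing and selecting data guide: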

postcode_test.loc[postcode_len_6, 'new_pstl_cd'] = \
    postcode_test.loc[postcode_len_6, 'new_pstl_cd'].str[:3] + ' '+ \
    postcode_test.loc[postcode_len_6, 'new_pstl_cd'].str[3:]

postcode_test.loc[postcode_len_5, 'new_pstl_cd'] = \
    postcode_test.loc[postcode_len_5, 'new_pstl_cd'].str[:2] + ' '+ \
    postcode_test.loc[postcode_len_5, 'new_pstl_cd'].str[2:]
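
The two assignments above only touch rows of length 5 and 6, so the 7-character postcode from the question (BD108EG) would be left as it is. Here is a sketch of one way to cover it with the same boolean-subsetting idea. The loop and the tuple of lengths are my own assumptions, not part of the original code; the split point n - 3 comes from the question's note that the last 3 characters are always split off.

```python
import pandas as pd

# Sample data rebuilt from the question.
postcode_test = pd.DataFrame(
    {"new_pstl_cd": ["SS55HA", "BD108EG", "LS15HU", "W19PX"],
     "length": [6, 7, 6, 5]}
)

# One boolean mask per length; the split point is always length - 3.
for n in (5, 6, 7):
    rows = postcode_test["length"] == n
    codes = postcode_test.loc[rows, "new_pstl_cd"]
    postcode_test.loc[rows, "new_pstl_cd"] = (
        codes.str[: n - 3] + " " + codes.str[n - 3:]
    )

print(postcode_test)
```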

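Using the "mask" method

This can sometimes be unintuitive, so make sure you have read the user's guide and the API documentation.

As before, first find the rows with length 5 or 6: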

postcode_len_6 = postcode_test['length'] == 6
postcode_len_5 = postcode_test['length'] == 5

However, instead of the long-winded .loc incantation, you can use .mask:

postcode_test['new_pstl_cd'] = postcode_test['new_pstl_cd']\
    .mask(postcode_len_6, lambda s: s.str[:3]+' '+s.str[3:])

postcode_test['new_pstl_cd'] = postcode_test['new_pstl_cd']\
    .mask(postcode_len_5, lambda s: s.str[:2]+' '+s.str[2:])
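
Here is a runnable sketch of the same idea on a standalone Series, with the two .mask calls chained. The names codes, lengths and spaced are hypothetical, and the lengths are computed with .str.len() rather than read from the length column.

```python
import pandas as pd

codes = pd.Series(["SS55HA", "BD108EG", "LS15HU", "W19PX"])
lengths = codes.str.len()

# mask replaces values where the condition is True and keeps the rest;
# the callable receives the Series and returns the replacement values.
spaced = (
    codes
    .mask(lengths == 5, lambda s: s.str[:2] + " " + s.str[2:])
    .mask(lengths == 6, lambda s: s.str[:3] + " " + s.str[3:])
)

print(spaced.tolist())
```

Rows that match neither condition pass through unchanged, which is why the 7-character postcode keeps its original form here.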

Mapping a plain Python function

Another solution is to write a "scalar-valued" Python function that operates on a single string only.

def make_new_postcode(p):
    if len(p) == 5:
        p = p[:2] + ' ' + p[2:]
    elif len(p) == 6:
        p = p[:3] + ' ' + p[3:]
    return p

postcode_test['new_pstl_cd'] = postcode_test['new_pstl_cd'].map(make_new_postcode)

Applying a plain Python function

If you really want to reuse the existing "length" column, you can also apply a function row by row:

def make_new_postcode(row):
    l = row['length']
    p = row['new_pstl_cd']
    if l == 5:
        p = p[:2] + ' ' + p[2:]
    elif l == 6:
        p = p[:3] + ' ' + p[3:]
    return p

postcode_test['new_pstl_cd'] = postcode_test[['length', 'new_pstl_cd']].apply(make_new_postcode, axis=1, result_type='reduce')

Is there a way to use pandas to insert a space between the characters of a string based on its length?
