Question

I am currently rearranging a DataFrame based on the characters in the first column. I used the line below to rearrange the data.

df['RegionName'] = df.loc[df.text.str.contains('(', regex=False), 'text'].str.extract(r'(.*?)\s*[\(\[]+.*[\n]*', expand=False)

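The problem I've run into is that, after the initial rearrangement, the final step needs to select the remaining data. I believe I need an if/else statement whose else branch will let me complete that last step. In my attempts, I keep getting an error saying that the truth value of my boolean statement is ambiguous. How can I use the code above in an if/else statement to accomplish my task?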

Thanks!

Answer 1

It seems you need:

# if you need values only where mask is True, and NaN everywhere else
mask = df.text.str.contains('(', regex=False)
df.loc[mask, 'RegionName'] = df.loc[mask, 'text'].str.extract(r'(.*?)\s*[\(\[]+.*[\n]*', 
                                                               expand=False)

Or:

# if you need processed values only where mask is True, and the original data everywhere else
mask = df.text.str.contains('(', regex=False)
df['RegionName'] = df['text'].mask(mask, df['text'].str.extract(r'(.*?)\s*[\(\[]+.*[\n]*', 
                                                             expand=False))

Or:

# if you need processed values only where mask is True, and another value such as 'aaa' or df['col'] elsewhere
mask = df.text.str.contains('(', regex=False)
df['RegionName'] = np.where(mask, df['text'].str.extract(r'(.*?)\s*[\(\[]+.*[\n]*', expand=False),
                            'aaa')
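
To see how the three options differ, here is a minimal sketch that runs all of them side by side. The sample rows and the column names `only_masked`, `keep_original` and `fallback` are made up for illustration and are not from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data, not taken from the question
df = pd.DataFrame({'text': ['Alabama (state)', 'Auburn', 'Alaska [edit]']})

pattern = r'(.*?)\s*[\(\[]+.*[\n]*'
mask = df['text'].str.contains('(', regex=False)
extracted = df['text'].str.extract(pattern, expand=False)

# Option 1: write only the masked rows, the other rows stay NaN
df.loc[mask, 'only_masked'] = extracted[mask]

# Option 2: replace the masked rows, keep the original text elsewhere
df['keep_original'] = df['text'].mask(mask, extracted)

# Option 3: take the extracted value where mask is True, a fixed fallback elsewhere
df['fallback'] = np.where(mask, extracted, 'aaa')

print(df)
```

Only the first row contains `(`, so it is the only row where the three columns agree; the other rows show NaN, the original text, and `'aaa'` respectively.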

For a better understanding:

import numpy as np
import pandas as pd

df = pd.DataFrame({'text':[' (1', '(', '4', '[7', '{8', '{7', ' [1']})
print (df)
  text
0   (1
1    (
2    4
3   [7
4   {8
5   {7
6   [1

mask1 = df.text.str.contains('(', regex=False)
mask2 = df.text.str.contains('{', regex=False)
mask3 = df.text.str.contains('[', regex=False)

df['d'] = np.where(mask1, 1, 
          np.where(mask2, 3,
          np.where(mask3, 2, 4)))
print (df)
  text  d
0   (1  1
1    (  1
2    4  4
3   [7  2
4   {8  3
5   {7  3
6   [1  2
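
If the nested `np.where` calls get hard to read, `np.select` is one possible alternative: it takes a list of conditions and a matching list of choices, and the first condition that is True wins, just like an if / elif / else chain. A sketch using the same sample data (the names `conditions` and `choices` are only for illustration):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'text': [' (1', '(', '4', '[7', '{8', '{7', ' [1']})

# Conditions are checked in order, the first True one decides the value
conditions = [
    df['text'].str.contains('(', regex=False),
    df['text'].str.contains('{', regex=False),
    df['text'].str.contains('[', regex=False),
]
choices = [1, 3, 2]

# default plays the role of the final else branch
df['d'] = np.select(conditions, choices, default=4)
print(df)
```

This should give the same `d` column as the nested `np.where` version above.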

Another, more complicated sample:

df = pd.DataFrame({'text':[' (1', '(', '4', '[ur', '{dFd', '{fGf', ' [io']})
print (df)

mask1 = df.text.str.contains('(', regex=False)
mask2 = df.text.str.contains('{', regex=False)
mask3 = df.text.str.contains('[', regex=False)

df['parsed'] = np.where(mask1, df.text.str.extract(r'(\d+)', expand=False), 
               np.where(mask2, df.text.str.extract(r'([A-Z]+)', expand=False),
               np.where(mask3, df.text.str.extract('([uo])+', expand=False), 4)))
print (df)

   text parsed
0    (1      1
1     (    NaN
2     4      4
3   [ur      u
4  {dFd      F
5  {fGf      G
6   [io      o
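
As for the "ambiguous" error mentioned in the question: a plain Python `if` needs a single True or False, but a mask is a whole Series of booleans, so pandas refuses to guess. That is why the solutions above use `loc`, `mask` and `np.where`, which work row by row. A small sketch with made-up data that reproduces the error:

```python
import pandas as pd

# Hypothetical sample data, not taken from the question
df = pd.DataFrame({'text': ['a (b)', 'c']})
mask = df['text'].str.contains('(', regex=False)

try:
    if mask:  # a Series has no single truth value, so this raises
        pass
except ValueError as err:
    message = str(err)
    print(message)

# Reduce explicitly if one True/False for the whole column is really what you want
any_match = bool(mask.any())  # True: at least one row contains '('
all_match = bool(mask.all())  # False: not every row contains '('
```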

Structuring a DataFrame using if / else
