Changing ages with a regex in pandas

Time: 2019-08-22 21:04:54

Tags: python regex string pandas replace

I am trying to use a regular expression to change all ages above 90. For example, 90-year-old should be changed to >90, but 82-year-old, or any age below 90, should not. I have gotten close to what I want, but 82-year-old is still changed to >90 when it should not be.

import pandas as pd

dataframe = pd.DataFrame({'Data': ['A 90-year-old or 96-year-old and 110-year-old is 90 days ',
                                   'For all 82-year-old is the 94-year-old why 28A ',
                                   'But the fact is 101-year-old 109-year-old cool 100'],
                          'ID': [1, 2, 3]})

# tried this regex
dataframe['New'] = dataframe['Data'].str.replace(r'\d+(-year-old)', r'>90')
dataframe

   Data                                                        ID  New
0  A 90-year-old or 96-year-old and 110-year-old is 90 days    1   A >90 or >90 and >90 is 90 days
1  For all 82-year-old is the 94-year-old why 28A              2   For all >90 is the >90 why 28A
2  But the fact is 101-year-old 109-year-old cool 100          3   But the fact is >90 >90 cool 100

How do I change the regex in this line of code

dataframe['New'] = dataframe['Data'].str.replace(r'\d+(-year-old)', r'>90')

so that only 90-year-old and above (e.g. 91-year-old, 98-year-old, 105-year-old, etc.) are changed to >90?

1 answer:

Answer 0: (score: 0)

You can express this with a regex that covers both cases, 9[1-9] and \d{3,}:

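# regex=True is needed on pandas 2.0+, where str.replace treats the pattern as a literal string by default
dataframe['New'] = dataframe['Data'].str.replace(r'(9[1-9]|\d{3,})(-year-old)', r'>90', regex=True)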

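The first part, 9[1-9], matches every value from 91 to 99, and the second part, \d{3,}, matches every number with three or more digits (an age such as 1234 is of course very unlikely).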
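To see which ages each branch of the pattern accepts, here is a minimal sketch using only the standard `re` module. The sample list and the names `AGE_OVER_90`, `samples`, and `matched` are made up for illustration and are not part of the original question.

```python
import re

# Same pattern as in the answer: either 91-99, or any run of three or more digits,
# followed by the literal suffix "-year-old".
AGE_OVER_90 = re.compile(r'(9[1-9]|\d{3,})(-year-old)')

# Hypothetical sample ages on both sides of the boundary.
samples = ['82-year-old', '90-year-old', '91-year-old', '99-year-old', '105-year-old']

# fullmatch requires the whole string to match, so each sample is tested in isolation.
matched = [s for s in samples if AGE_OVER_90.fullmatch(s)]
print(matched)  # ['91-year-old', '99-year-old', '105-year-old']
```

Note that 90-year-old is not matched here, because 9[1-9] excludes a trailing 0 and 90 has only two digits.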

For the given sample data, we obtain:

>>> dataframe['Data'].str.replace(r'(9[1-9]|\d{3,})(-year-old)', r'>90', regex=True)
0    A 90-year-old or >90 and >90 is 90 days  
1      For all 82-year-old is the >90 why 28A 
2             But the fact is >90 >90 cool 100
Name: Data, dtype: object

If you want to include 90 as well, you can change the regex to:

>>> dataframe['Data'].str.replace(r'(9\d|\d{3,})(-year-old)', r'>90', regex=True)
0          A >90 or >90 and >90 is 90 days  
1    For all 82-year-old is the >90 why 28A 
2           But the fact is >90 >90 cool 100
Name: Data, dtype: object
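As an alternative to encoding the numeric range in the pattern itself, one could capture the number and compare it as an integer in a replacement function. This is only a sketch of a different technique, not the answer's method. The names `AGE` and `cap_age` are hypothetical, and the cutoff of 90 (inclusive) is an assumption taken from the question's examples.

```python
import re

# Capture any run of digits directly followed by "-year-old".
AGE = re.compile(r'\b(\d+)-year-old')

def cap_age(match):
    """Return '>90' when the captured age is 90 or more, else the original text."""
    return '>90' if int(match.group(1)) >= 90 else match.group(0)

rows = ['A 90-year-old or 96-year-old and 110-year-old is 90 days ',
        'For all 82-year-old is the 94-year-old why 28A ',
        'But the fact is 101-year-old 109-year-old cool 100']

# re.sub calls cap_age once per match and substitutes whatever it returns.
results = [AGE.sub(cap_age, row) for row in rows]
for line in results:
    print(line)
```

With pandas, the same function can be passed as the replacement, for example `dataframe['Data'].str.replace(r'\b(\d+)-year-old', cap_age, regex=True)`, since `Series.str.replace` accepts a callable when the pattern is treated as a regex. Changing `>= 90` to `> 90` excludes 90 itself.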