Question

I need to extract the date from a series of strings like:

'MIHAI MĂD2Ă3.07.1958'

or

'CLAUDIU-MIHAI17.12.1999'

How can I do this?

What I tried:

import math
import re
from datetime import datetime

for index, row in DF.iterrows():
    try:
        # math.isnan raises TypeError for non-float values, e.g. a date that is already filled in
        if math.isnan(row['Data_Nasterii']):
            match = re.search(r'\d{2}.\d{2}.\d{4}', row['Prenume'])
            date = datetime.strptime(match.group(), '%d.%m.%Y').date()
            s = datetime.strftime(datetime.strptime(str(date), '%Y-%m-%d'), '%d-%m-%Y')
            row['Data_Nasterii'] = s
    except TypeError:
        pass

Answer 1

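In a regular expression, . (dot) does not mean a literal dot but "any character"; it has to be escaped (\.) to match an actual dot.
Also, your first group is \d{2}, but some of your days have only one digit.
I would use the following: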

re.search(r'(\d+\.\d+\.\d+)', row['Prenume'])

This means: at least one digit, followed by a dot, followed by at least one digit, and so on.
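A quick check of what this pattern finds in the two sample strings from the question (a sketch; `found` is just an illustrative name). In the first string the stray Ă sits between the two digits of the day, so the match starts at the 3 and the leading 2 is lost:

```python
import re

names = ['MIHAI MĂD2Ă3.07.1958', 'CLAUDIU-MIHAI17.12.1999']

found = []
for name in names:
    # re.search returns the first (leftmost) "digits.digits.digits" run
    match = re.search(r'(\d+\.\d+\.\d+)', name)
    found.append(match.group(1))

print(found)  # ['3.07.1958', '17.12.1999']
```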
If some stray characters are mixed into the day, you can try the following (admittedly subpar) solution:

''.join(re.search(r'(\d*)(?:[^0-9\.]*)(\d*\.\d+\.\d+)', row['Prenume']).groups())

This filters out at most one block of stray characters inside your "day". It is not elegant, but it works (and returns a string).
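A minimal sketch that applies this fallback pattern to the question's DataFrame, assuming the columns Prenume and Data_Nasterii from the question and that missing dates are None/NaN. The helper name `extract_birth_date` and the sample data are hypothetical. It writes results back with `.loc` rather than assigning to `row`, because pandas does not guarantee that changes made to `row` inside `iterrows()` reach the DataFrame:

```python
import re
from datetime import datetime

import pandas as pd

# Fallback pattern from this answer: optional leading day digit(s), at most
# one run of non-digit/non-dot characters, then the rest of the date.
PATTERN = re.compile(r'(\d*)(?:[^0-9.]*)(\d*\.\d+\.\d+)')


def extract_birth_date(text):
    """Hypothetical helper: return the embedded date as 'dd-mm-yyyy', or None."""
    match = PATTERN.search(text)
    if match is None:
        return None
    raw = ''.join(match.groups())  # e.g. '2' + '3.07.1958' -> '23.07.1958'
    return datetime.strptime(raw, '%d.%m.%Y').strftime('%d-%m-%Y')


# Sample data mirroring the question; the date column starts out empty.
DF = pd.DataFrame({
    'Prenume': ['MIHAI MĂD2Ă3.07.1958', 'CLAUDIU-MIHAI17.12.1999'],
    'Data_Nasterii': pd.Series([None, None], dtype=object),
})

# Fill only the rows whose date is missing, writing back through .loc.
missing = DF['Data_Nasterii'].isna()
DF.loc[missing, 'Data_Nasterii'] = DF.loc[missing, 'Prenume'].apply(extract_birth_date)
print(DF)
```

The dd-mm-yyyy output format mirrors the `strftime` call in the question.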

Answer 2

You can use the str accessor together with a regular expression.
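A minimal sketch of the str accessor approach, assuming the column is called Prenume as in the question. `Series.str.extract` returns one column per capture group, taken from the first match in each row. The two-group pattern, which glues the day back together around a stray character, is an assumption borrowed from the fallback pattern in Answer 1:

```python
import pandas as pd

DF = pd.DataFrame({'Prenume': ['MIHAI MĂD2Ă3.07.1958', 'CLAUDIU-MIHAI17.12.1999']})

# Column 0: day digits before a stray character (may be empty);
# column 1: the remainder of the date.
parts = DF['Prenume'].str.extract(r'(\d*)[^0-9.]*(\d*\.\d+\.\d+)')
raw = parts[0] + parts[1]  # '2' + '3.07.1958' -> '23.07.1958'

# Parse day-first strings into real datetimes.
DF['Data_Nasterii'] = pd.to_datetime(raw, format='%d.%m.%Y')
print(DF)
```

Rows without a match come out as NaN from `str.extract` and end up as NaT after `to_datetime`.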

Answer 3

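You need to replace the dot (.) with \., or put it inside a character class, [.]. The dot is a regex metacharacter that matches any character. If you need more thorough validation, you can refer to this!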

For example: r'[0-9]{2}[.][0-9]{2}[.][0-9]{4}' or r'\d{2}\.\d{2}\.\d{4}'

import re

text = 'CLAUDIU-MIHAI17.12.1999'
pattern = r'\d{2}\.\d{2}\.\d{4}'

if re.search(pattern, text):
    print("yes")
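To illustrate the difference, this sketch compares the unescaped pattern with both escaped forms. The second input, with dashes instead of dots, is a hypothetical string added only to show what the bare dot lets through:

```python
import re

loose = r'\d{2}.\d{2}.\d{4}'                    # unescaped '.' matches any character
escaped = r'\d{2}\.\d{2}\.\d{4}'                # '\.' matches only a literal dot
char_class = r'[0-9]{2}[.][0-9]{2}[.][0-9]{4}'  # '[.]' is also a literal dot

results = {}
for text in ['CLAUDIU-MIHAI17.12.1999', 'CLAUDIU-MIHAI17-12-1999']:
    results[text] = (
        bool(re.search(loose, text)),
        bool(re.search(escaped, text)),
        bool(re.search(char_class, text)),
    )
    print(text, results[text])
```

The loose pattern also accepts 17-12-1999, while both escaped forms reject it.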

Answer 4

Another good solution is to use dateutil.parser:

import pandas as pd
import dateutil.parser as dparser

df = pd.DataFrame({'A': ['MIHAI MĂD2Ă3.07.1958',
                         'CLAUDIU-MIHAI17.12.1999']})

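# Drop non-ASCII characters (e.g. 'Ă') so the date digits become contiguous,
# then let dateutil find the date inside the remaining text (day first).
df['userdate'] = df['A'].apply(
    lambda x: dparser.parse(x.encode('ascii', errors='ignore').decode(),
                            fuzzy=True, dayfirst=True))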

Output

                       A    userdate
0   MIHAI MĂD2Ă3.07.1958    1958-07-23
1   CLAUDIU-MIHAI17.12.1999 1999-12-17
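One caveat: by default dateutil reads an ambiguous date month-first. Both sample dates happen to be unambiguous (23 and 17 cannot be months), but a date like 05.07.1958 is not. A small sketch with a hypothetical input string:

```python
import dateutil.parser as dparser

ambiguous = 'IOAN05.07.1958'  # hypothetical name; meant as 5 July 1958

default = dparser.parse(ambiguous, fuzzy=True)                  # month first
day_first = dparser.parse(ambiguous, fuzzy=True, dayfirst=True)  # day first
print(default.date(), day_first.date())  # 1958-05-07 1958-07-05
```

Passing `dayfirst=True` matches the dd.mm.yyyy layout used in these strings.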

Extracting a date from a string that contains a name and a date
