Extracting the number after certain strings in Python

Time: 2019-11-21 07:31:59

Tags: python regex pandas data-manipulation

I have a data frame like the one below:

import pandas as pd
page = ['A','B','C','D']
URL = ['aaa.bbb3333.ccc.de12345.dddd.cccc','ccc2222.ddd.aaa.ho16589.ddd','ddd16893.aaa.de59875','aaa15875.ccc.ddd.ho13532']
df = pd.DataFrame({'page':page,'URL':URL})

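I want to create a column that extracts the number following 'de' or 'ho'. Note that the length of the number may vary, and the position of 'de' or 'ho' may vary as well.

My code is as follows: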

import re
def extract_number(df,url):
    for url in df:
        if df[url].str.contains('de', na = False) == True:
            match = re.search('de:P(\d+)')
        elif df[url].str.contains('ho', na = False) == True:
            match = re.search('ho:P(\d+)')
        else:
            match = 'not found'
        print(match)

out = extract_number(df, 'URL')

It returns the error "The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all()."

The desired output should look like this:

import pandas as pd
page = ['A','B','C','D']
URL = ['aaa.bbb.ccc.de12345.dddd.cccc','ccc.ddd.aaa.ho16589.ddd','ddd.aaa.de59875','aaa.ccc.ddd.ho13532']
ID = ['12345','16589','59875','13532']
df = pd.DataFrame({'page':page,'URL':URL,'ID':ID})

Thanks a million!

1 answer:

Answer 0: (score: 2)

Use a regex extraction with a lookbehind, so that only the digits immediately after 'de' or 'ho' are captured.
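
The answer's original snippet is not preserved here, so the following is only a minimal sketch of the lookbehind idea, assuming pandas' `Series.str.extract` is used for the extraction. The pattern `(?<=de|ho)(\d+)` is my own illustration: the lookbehind requires that the digits be preceded by 'de' or 'ho', and the capture group returns the digits themselves.

```python
import pandas as pd

# Sample data copied from the question.
page = ['A', 'B', 'C', 'D']
URL = ['aaa.bbb3333.ccc.de12345.dddd.cccc',
       'ccc2222.ddd.aaa.ho16589.ddd',
       'ddd16893.aaa.de59875',
       'aaa15875.ccc.ddd.ho13532']
df = pd.DataFrame({'page': page, 'URL': URL})

# Assumed pattern: the lookbehind (?<=de|ho) matches a position directly
# after 'de' or 'ho'; (\d+) captures the run of digits that follows.
# Both alternatives have the same length, which Python's re module
# requires for a lookbehind.
pattern = r'(?<=de|ho)(\d+)'

# expand=False returns a Series (one capture group) instead of a DataFrame.
df['ID'] = df['URL'].str.extract(pattern, expand=False)

print(df)
```

This works on the whole column at once, so no explicit loop or `if` on a Series is needed, which is what triggered the "truth value of a Series is ambiguous" error. Rows without a match get NaN in `ID`.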
