Given this DataFrame:
import numpy as np
import pandas as pd
df = pd.DataFrame({'ID':['a','b','c','d','e','f','g','h','i','j','k'],
'value':['None',np.nan,'6D','7','10D','NONE','x','10D aaa','1 D','10 D aa',7]
})
df
ID value
0 a None
1 b NaN
2 c 6D
3 d 7
4 e 10D
5 f NONE
6 g x
7 h 10D aaa
8 i 1 D
9 j 10 D aa
10 k 7
I want to extract the number wherever one exists and return 0 otherwise, for any of the messy cases shown above.
The desired result is:
ID value
0 a 0
1 b 0
2 c 6
3 d 7
4 e 10
5 f 0
6 g 0
7 h 10
8 i 1
9 j 10
10 k 7
Thanks in advance!
Answer 0 (score: 1)
Alternatively, you can follow the EAFP principle: attempt to extract the number, catch the exceptions raised when no number is present, and apply that function element-wise to the data:
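import re

def get_number(item):
    try:
        return int(re.search(r"\d+", str(item)).group(0))
    except (AttributeError, ValueError, IndexError):
        return 0

# Apply to 'value' only; df.applymap would also turn every ID into 0
df['value'] = df['value'].apply(get_number)
print(df)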
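To make the EAFP idea above concrete, here is a minimal sketch without pandas. It assumes that a failed match is the only failure that occurs in practice: re.search returns None when there are no digits, so calling .group on it raises AttributeError. The samples list is illustrative and taken from the question's data.

```python
import re

def get_number(item):
    """Return the first run of digits in item as an int, or 0 if there is none."""
    try:
        return int(re.search(r"\d+", str(item)).group(0))
    except AttributeError:
        # re.search found no digits and returned None, so .group failed
        return 0

# Illustrative values taken from the question's 'value' column
samples = ['None', float('nan'), '6D', '10D aaa', '1 D', 7]
results = [get_number(s) for s in samples]
print(results)  # [0, 0, 6, 10, 1, 7]
```

Converting with str(item) first is what lets NaN and plain integers pass through the same code path as strings.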
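Answer 1 (score: 1)
import numpy as np
import pandas as pd
df = pd.DataFrame({'ID':['a','b','c','d','e','f','g','h','i','j','k'],
'value':['None',np.nan,'6D','7','10D','NONE','x','10D aaa','1 D','10 D aa',7]
})
df = df.fillna(0)
# Strip every non-digit character; values left empty become 0
digits = df['value'].astype(str).str.replace(r'\D+', '', regex=True)
df['value'] = digits.replace('', '0').astype(int)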
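One caveat of a replace-based approach: removing every non-digit joins separate digit groups, so a value such as '1a2' would become 12. If only the first number is wanted, a possible variant (a sketch, not part of the original answer) uses Series.str.extract with a capture group. The names values, first_digits and extracted are hypothetical.

```python
import numpy as np
import pandas as pd

# The question's 'value' column as a standalone Series
values = pd.Series(['None', np.nan, '6D', '7', '10D', 'NONE', 'x',
                    '10D aaa', '1 D', '10 D aa', 7])

# Capture only the first run of digits; rows without digits become NaN
first_digits = values.astype(str).str.extract(r'(\d+)', expand=False)

# Convert to numbers, then fill the rows that had no digits with 0
extracted = pd.to_numeric(first_digits).fillna(0).astype(int)
print(extracted.tolist())
```

On the sample data both approaches agree, because each value holds at most one group of digits.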
Answer 2 (score: 1)
Here is my approach using re.findall and apply:
import re
df['value'] = df['value'].apply(lambda x: 0 if not re.findall(r'\d+', str(x)) else int(re.findall(r'\d+', str(x))[0]))