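I have a column of strings in a pandas DataFrame that contains values like "AU/4347001", but also some less well-organized strings such as "Who would have thought this would be so 4347009 difficult".
So there is no consistent pattern to where or how this run of digits appears in the string. It can be at the beginning, in the middle, or at the end, and there is no way to know how many other characters surround the number.
Ideally, I would like to return another column of the same length that contains only the number.
Is this possible?
Any help is much appreciated. Thanks!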
Answer 0 (score: 1)
You can use extract with a capture group for digits (\d+):
import pandas as pd
data = ["AU/4347001",
"Who would have thought this would be so 4347009 difficult",
"Another with a no numbers",
"131242143"]
df = pd.DataFrame(data=data, columns=['txt'])
result = df.assign(res=df.txt.str.extract(r'(\d+)', expand=False)).fillna('')
print(result)
Output
txt res
0 AU/4347001 4347001
1 Who would have thought this would be so 434700... 4347009
2 Another with a no numbers
3 131242143 131242143
Note that the example above uses fillna to fill the rows where no group of digits was found (here, with an empty string).
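To make the role of fillna concrete, here is a minimal sketch of what extract returns before and after filling. The sample strings and the names digits and missing_mask are made up for illustration, and expand=False is used on the assumption that a plain Series is wanted rather than a one-column DataFrame.

```python
import pandas as pd

# Illustrative data only; the middle row has no digits at all
df = pd.DataFrame({"txt": ["AU/4347001", "no digits here", "131242143"]})

# expand=False returns a Series instead of a one-column DataFrame
digits = df["txt"].str.extract(r"(\d+)", expand=False)

# Rows without a match are missing (NaN) until they are filled
missing_mask = digits.isna()

# Replace the missing values with an empty string, as in the answer
df["res"] = digits.fillna("")
print(df)
```

Keeping the mask around can be handy if the rows without a number need to be inspected separately before they are blanked out.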
Answer 1 (score: 1)
You can use extract:
import pandas as pd
df = pd.DataFrame({'text': ["Who would have thought this would be so 4347009 difficult",
                            "24 is me"]})
df['new_col'] = df['text'].str.extract(r'(\d+)')
text new_col
0 Who would have thought this would be so 434700... 4347009
1 24 is me 24
Answer 2 (score: 1)
Here is our test dataframe:
import pandas as pd
### Create an example Pandas Dataframe
df = pd.DataFrame(data=['something123', 'some456thing', '789somthing',
                        'Lots of numbers 82849585 make a long sentence'], columns=['strings'])
### Create a function for identifying, joining and then turning the string to an integer
def get_numbers(string):
    return int(''.join([s for s in string if s.isdigit()]))
### Now let's apply the get_numbers function to the strings column
df.loc[:, 'strings_wo_numbers'] = df.loc[:, 'strings'].apply(get_numbers)
Note: this joins all the digits in the string, i.e. "10 olives and 5 apples" becomes 105 rather than 10, 5.
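If keeping the numbers separate matters, one possible variation is to collect each run of digits on its own. The sketch below swaps in re.findall for the join-based approach; get_number_list is a hypothetical helper name and the sample strings are invented. It also returns an empty list for strings without digits, whereas the join-based function would raise a ValueError there because int('') fails.

```python
import re

import pandas as pd


def get_number_list(string):
    """Hypothetical helper: return every run of digits as a separate int."""
    return [int(match) for match in re.findall(r"\d+", string)]


# Illustrative data only
df = pd.DataFrame({"strings": ["10 olives and 5 apples",
                               "something123",
                               "no digits"]})

# Each cell becomes a list, so multiple numbers are not merged together
df["numbers"] = df["strings"].apply(get_number_list)
print(df)
```

The trade-off is that the new column holds lists rather than scalars, so it may need a further step (for example taking the first element) depending on what is wanted downstream.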
Answer 3 (score: 0)
Use str.findall
df.text.str.findall(r'\d+').str[0]
0 4347009
1 24
Name: text, dtype: object
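As a rough sketch of how this behaves on rows without any digits: findall gives an empty list for such a row, and taking .str[0] of it yields a missing value rather than an error. The third sample string and the names first and nums are assumptions added for illustration, and the pd.to_numeric step is an optional extra that the answer itself does not use.

```python
import pandas as pd

# Illustrative data only; the last row contains no digits
s = pd.Series(["Who would have thought this would be so 4347009 difficult",
               "24 is me",
               "nothing to see"])

# findall returns a list of all digit runs per row; .str[0] picks the first
first = s.str.findall(r"\d+").str[0]

# Optional: convert the digit strings to numbers; missing rows stay NaN
nums = pd.to_numeric(first)
print(nums)
```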