Question

I have a column of strings in a pandas DataFrame that contains values like "AU/4347001", but there are also some less tidy strings, such as "Who would have thought this would be so 4347009 difficult".

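So there is no consistent pattern to where or how the run of digits appears in a string. It can be at the beginning, in the middle or at the end, and there is no way to know exactly how many other characters surround it.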

Ideally, I would like to return another column of the same length that contains only the numbers.

Is this possible?

Many thanks for your help!

Answer 1

You can use str.extract with a capturing group for the digits, (\d+):

import pandas as pd

data = ["AU/4347001",
        "Who would have thought this would be so 4347009 difficult",
        "Another with a no numbers",
        "131242143"]

df = pd.DataFrame(data=data, columns=['txt'])
# Use a raw string so "\d" reaches the regex engine instead of being treated as a string escape
result = df.assign(res=df.txt.str.extract(r'(\d+)')).fillna('')
print(result)

Output

                                                 txt        res
0                                         AU/4347001    4347001
1  Who would have thought this would be so 434700...    4347009
2                          Another with a no numbers           
3                                          131242143  131242143

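Note that the example above uses fillna to fill the rows where no group of digits was found (here, with an empty string); str.extract returns NaN for those rows.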
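If you need the extracted values as integers rather than strings, one option is to convert them with pd.to_numeric and the nullable "Int64" dtype, so rows without digits stay missing instead of becoming an empty string. This is a minimal sketch reusing the sample data from this answer; the choice of "Int64" is an assumption about what you want for missing rows.

```python
import pandas as pd

# Same sample column as above; the third entry contains no digits
df = pd.DataFrame({'txt': ["AU/4347001",
                           "Who would have thought this would be so 4347009 difficult",
                           "Another with a no numbers",
                           "131242143"]})

# expand=False returns a Series; rows with no match become NaN
digits = df['txt'].str.extract(r'(\d+)', expand=False)

# Convert to numbers, then to the nullable integer dtype so missing rows stay <NA>
df['res'] = pd.to_numeric(digits).astype('Int64')
print(df)
```

pd.to_numeric goes through float64 when NaN is present, so digit runs longer than about 15 digits could lose precision; for IDs like these it is fine, but very long numbers are safer kept as strings.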

Answer 2

You can use str.extract:

import pandas as pd

df = pd.DataFrame({'text': ["Who would have thought this would be so 4347009 difficult",
                            "24 is me"]})

df['new_col'] = df['text'].str.extract(r'(\d+)')

    text                                                new_col
0   Who would have thought this would be so 434700...   4347009
1   24 is me                                            24
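The question's title asks for a group of n digits. If the number always has a known length (seven digits in the question's examples, which is an assumption), you can require exactly that many digits and use lookarounds so the match cannot start or end inside a longer digit run. The sample strings below are hypothetical.

```python
import pandas as pd

# Hypothetical data: 12345 is too short and 123456789012 too long for n = 7
df = pd.DataFrame({'text': ["AU/4347001",
                            "Who would have thought this would be so 4347009 difficult",
                            "ref 12345 and 123456789012",
                            "x4347002y"]})

# (?<!\d) and (?!\d) reject matches that touch another digit on either side;
# unlike \b, they still match when letters sit directly next to the number
pattern = r'(?<!\d)(\d{7})(?!\d)'  # 7 = assumed length n
df['seven_digits'] = df['text'].str.extract(pattern, expand=False)
print(df)
```

Rows with no standalone seven-digit run come back as NaN, just as with the plain (\d+) pattern.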

Answer 3

Here is our test DataFrame:

import pandas as pd

### Create an example pandas DataFrame
df = pd.DataFrame(data=['something123', 'some456thing', '789something',
                        'Lots of numbers 82849585 make a long sentence'], columns=['strings'])

### Create a function that collects the digits, joins them and turns the result into an integer
def get_numbers(string):
    digits = ''.join(s for s in string if s.isdigit())
    # Return None instead of raising ValueError when a string contains no digits
    return int(digits) if digits else None

### Now let's apply the get_numbers function to the strings column
df['strings_only_numbers'] = df['strings'].apply(get_numbers)

Note: this concatenates every digit in the string, so "10 olives and 5 apples" would become 105 rather than 10 and 5.
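If you want to keep separate numbers apart instead of concatenating them, str.findall returns a list of every digit run in each row. This sketch uses the example sentence from the note plus two hypothetical rows; converting each list to ints is optional.

```python
import pandas as pd

df = pd.DataFrame({'strings': ['10 olives and 5 apples', 'some456thing', 'no digits here']})

# findall returns every match per row, so 10 and 5 stay separate values
df['all_numbers'] = df['strings'].str.findall(r'\d+')

# Optional: turn each list of digit strings into a list of ints (empty list if none)
df['all_ints'] = df['all_numbers'].apply(lambda runs: [int(r) for r in runs])
print(df)
```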

Answer 4

Use str.findall and take the first match (applied here to the df from Answer 2):

df.text.str.findall(r'\d+').str[0]
0    4347009
1         24
Name: text, dtype: object
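For rows without any digits, .str[0] on the empty list returned by findall gives NaN, so this behaves like str.extract with a single (\d+) group. A minimal check, adding one hypothetical row with no number to the data from Answer 2:

```python
import pandas as pd

df = pd.DataFrame({'text': ["Who would have thought this would be so 4347009 difficult",
                            "24 is me",
                            "no number here"]})

# First digit run per row via findall; an empty list yields NaN
via_findall = df['text'].str.findall(r'\d+').str[0]

# The same first match via extract; no match also yields NaN
via_extract = df['text'].str.extract(r'(\d+)', expand=False)

print(via_findall.tolist())
print(via_extract.tolist())
```

str.extract avoids building a list for every row, which can matter on large columns when only the first number is needed.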

Extract a group of n digits from strings in a column
