Question

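Update

I am running jupyter-notebook in a Docker container, but when I run a pandas-based cell, after a few seconds the system returns:

    Kernel Restarting: The kernel for .ipynb appears to have died. It will restart automatically.

The only option offered is to restart the kernel.

This is the code cell that triggers the message: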

import pandas as pd


def remove_typos(string):
    
    string=str(string)
    string=str(string).replace('≤', '')
    string=str(string).replace('+', '')
    
    # if "%" detected then convert to numeric format
    if "%" in string: 
        string=string.replace('%', '')
        string=float(string)/100
        
    else:
        pass
        
    return string


data = {k: v.replace([r'\+', '≤'], '', regex=True) for k, v in data.items()}
data = {k: v.applymap(remove_typos) for k, v in data.items()}

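What have I already tried?

I tried running pip install pandas in the container CLI; this returns the following message:

I tried giving the container more local memory:

I tried updating conda from the Anaconda prompt and reinstalling all packages: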

# conda config --set quiet True
# conda update --force conda

#conda install pandas

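In all cases, the result was the same.

Additional notes:

Total CPU utilization reaches 100%
The function is applied to more than 10,000 cells

Is there another way to solve this problem?

Sample data

The original DataFrames have the same format but are much larger.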

data = {'dataframe_1':pd.DataFrame({'col1': ['John', 'Ashley'], 'col2': ['+10', '-1']}), 'dataframe_2':pd.DataFrame({'col3': ['Italy', 'Brazil', 'Japan'], 'col4': ['Milan', 'Rio do Jaineiro', 'Tokio'], 'percentage':['+95%', '≤0%', '80%+']})}

Session info

{'commit_hash': '2486838d9',
 'commit_source': 'installation',
 'default_encoding': 'UTF-8',
 'ipython_path': '/usr/local/lib/python3.6/site-packages/IPython',
 'ipython_version': '7.16.1',
 'os_name': 'posix',
 'platform': 'Linux-5.10.25-linuxkit-x86_64-with-debian-10.9',
 'sys_executable': '/usr/local/bin/python',
 'sys_platform': 'linux',
 'sys_version': '3.6.13 (default, May 12 2021, 16:40:31) \n[GCC 8.3.0]'}

Answer 1

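The problem is related to the number of iterations, which needs to be reduced.

First, rename the function to convert_to_percentage(), then iterate over each key and value to replace the characters: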


############# convert_to_percentage(string) #################

# string :: a cell value; strings containing "%" represent a percentage.

def convert_to_percentage(string):

    # The '+' and '≤' characters are stripped separately, on the whole
    # DataFrame, by the replace() call further down.

    # if "%" detected then convert to numeric format;
    # the isinstance check leaves non-string cells (e.g. NaN) untouched
    if isinstance(string, str) and "%" in string:
        string = float(string.replace('%', '')) / 100

    return string

############################################################
#                                                          #
# removing typos for each string and converting to float   #
#                                                          #
############################################################

######## removing typo characters ('+' and '≤') ############

# for all job title reports

data = {k: v.replace([r'\+', '≤'], '', regex=True) for k, v in data.items()}

print('Successfully removed typos!')
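To make the effect of the regex replacement concrete, here is a minimal sketch on a small DataFrame shaped like the question's sample data. The frame below is a stand-in for illustration, not the real dataset.

```python
import pandas as pd

# Stand-in sample mirroring 'dataframe_2' from the question
df = pd.DataFrame({'col3': ['Italy', 'Brazil', 'Japan'],
                   'percentage': ['+95%', '≤0%', '80%+']})

# r'\+' escapes the plus sign, which is otherwise a regex quantifier;
# with regex=True each pattern is removed wherever it occurs in a cell
cleaned = df.replace([r'\+', '≤'], '', regex=True)

print(cleaned['percentage'].tolist())
```

Because the replacement runs once per DataFrame rather than once per cell in a Python function, it avoids most of the per-cell overhead of the original remove_typos().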

Second, replace for key, value in data.items() with for key in data:

############################################################
#                                                          #
# conversion of specific columns to percentages (%)        #
#                                                          #
############################################################

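for key in data:
    # apply() returns a new DataFrame, so assign it back; each column is
    # mapped element-wise because convert_to_percentage expects one cell
    data[key] = data[key].apply(lambda col: col.map(convert_to_percentage))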
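If the per-cell function calls are still too slow, one possible alternative (not part of the answer above, and not confirmed to resolve the kernel crash) is to convert each column with pandas string methods instead. The sketch below assumes the '+' and '≤' characters have already been stripped; the helper name percent_strings_to_fraction is hypothetical.

```python
import pandas as pd

def percent_strings_to_fraction(col):
    """Turn cells like '95%' into 0.95; leave every other cell unchanged."""
    if pd.api.types.is_numeric_dtype(col):
        return col  # numeric columns cannot contain '%'
    text = col.astype(str)
    is_pct = text.str.endswith('%')
    if not is_pct.any():
        return col
    # Non-numeric text becomes NaN here, but those cells are masked out below
    numbers = pd.to_numeric(text.str.rstrip('%'), errors='coerce') / 100
    # Keep the original value wherever the cell was not a percentage
    return numbers.astype(object).where(is_pct, col.astype(object))

# Stand-in data: the question's sample after the typo characters were removed
data = {
    'dataframe_2': pd.DataFrame({'col3': ['Italy', 'Brazil', 'Japan'],
                                 'percentage': ['95%', '0%', '80%']})
}

# DataFrame.apply passes one column (a Series) at a time by default
data = {k: v.apply(percent_strings_to_fraction) for k, v in data.items()}
print(data['dataframe_2'])
```

This does the work once per column rather than once per cell. Converted columns come back with object dtype, so a further astype(float) may be wanted on columns known to be fully numeric.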
