Question

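I have been working in a Jupyter notebook on my local machine and recently migrated it to an Ubuntu Linux VM on a remote server. I have successfully installed Python and Jupyter on the remote server and am accessing it via SSH port forwarding. I have been able to run other scripts and files without problems, but one script halts in a very strange way.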

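I have a large pandas DataFrame (40k rows x 60 columns) that I am preprocessing in preparation for a parallel operation. I slice the DataFrame with pandas .groupby and then use itertools.product to pair each slice with a categorical variable. A mocked example follows: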

import itertools as itt
import pandas as pd
import numpy as np

# Mocked Data
alphabet = 'abcdefghijklmnopqrstuvwxyz'
dates = [pd.to_datetime('{}-{}-1'.format(y,m+1)) + pd.tseries.offsets.MonthEnd(0) for y, m in itt.product(
    range(2007, 2018),
    range(0,12)
)]
dataframe = pd.DataFrame(
    index=range(0,40000),
    columns=[''.join(t) for t in list(itt.combinations(alphabet,2))[0:60]])
dataframe['ab'] = np.random.choice(dates, 40000, True)
dataframe['ac'] = np.random.choice(list(alphabet[0:6]), 40000, True)

numeric_cols = dataframe.columns[~dataframe.columns.isin(['ab', 'ac'])]
dataframe.loc[:, numeric_cols] = np.random.random((len(dataframe.index), len(numeric_cols)))

FACTORS = set(alphabet[0:15])

# Halts here! Does not finish, but no errors.
preproc = list(itt.product(
    FACTORS,
    dataframe.groupby(['ab', 'ac'])
))

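For some reason, when I execute this as a single cell in Jupyter, it never completes and simply stalls mid-operation. The system monitor shows that particular Python process using 100% CPU, but it never finishes.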

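I don't think this is caused by insufficient processing power or memory, or by a problem in the code, because I have been able to run this operation in Jupyter on my local machine, and also as a plain Python script (i.e. a .py file) on the VM with the same kernel.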

Any help or pointers would be appreciated!

The specs of the remote machine and the relevant package versions are as follows:

OS:
Linux cooVM 4.10.0-42-generic #46~16.04.1 -Ubuntu SMP x86_64 GNU/Linux

Hardware:
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
CPU(s):                8
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 44
Model name:            Intel(R) Xeon(R) CPU           E5620  @ 2.40GHz
Stepping:              2
CPU MHz:               2400.085
Hypervisor vendor:     VMware
RAM:                   16GB

Python 3.5.2

Pip packages:

Package                       Version  
----------------------------- ---------  
ipykernel                     4.7.0    
ipython                       6.2.1    
ipython-genutils              0.2.0    
ipywidgets                    7.1.0     
jupyter                       1.0.0    
jupyter-client                5.2.0    
jupyter-console               5.2.0    
jupyter-core                  4.4.0        
notebook                      5.2.2    
numpy                         1.13.3   
pandas                        0.22.0     
pip                           9.0.1       
python-apt                    1.1.0b1  
python-dateutil               2.6.1    
python-debian                 0.1.27   
python-systemd                231       
widgetsnbextension            3.1.0

Answer 1

This is an ongoing issue with Jupyter that occurs only on Windows machines. The GitHub issue is tracked here:

https://github.com/jupyter/notebook/issues/714
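
To narrow down where the cell stalls, it may help to split the question's one-liner into two timed steps. itertools.product converts each of its inputs into a tuple before it yields anything, so the groupby is fully iterated inside that single call. The sketch below is only a diagnostic under stated assumptions: the small frame, the reduced sizes and the names frame, factors, groups and pairs are stand-ins for illustration, not the asker's data.

```python
import itertools as itt
import time

import numpy as np
import pandas as pd

# Small stand-in for the question's DataFrame (sizes reduced on purpose).
rng = np.random.RandomState(0)
dates = pd.to_datetime(['2007-01-31', '2007-02-28', '2007-03-31'])
frame = pd.DataFrame({
    'ab': rng.choice(dates, 1000),
    'ac': rng.choice(list('abcdef'), 1000),
    'ad': rng.random_sample(1000),
})
factors = sorted(set('abcdefghijklmno'))

# Step 1: iterate the groupby on its own, so a stall here is visible.
start = time.perf_counter()
groups = list(frame.groupby(['ab', 'ac']))
print('groupby: {} groups in {:.3f}s'.format(
    len(groups), time.perf_counter() - start))

# Step 2: pair every factor with every (key, sub-frame) tuple.
start = time.perf_counter()
pairs = list(itt.product(factors, groups))
print('product: {} pairs in {:.3f}s'.format(
    len(pairs), time.perf_counter() - start))
```

If the first print never appears in the notebook but does appear when the same code runs as a .py file, the stall is in iterating the groupby inside the kernel rather than in itertools.product.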

Different behavior between the Jupyter kernel and Python
