Question

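I have a function that plots the logs of two columns of a Pandas DataFrame. Zeros therefore cause errors and need to be removed. At the moment, the inputs to the function are two columns of a DataFrame. Is there a way to remove any rows that contain zeros? For example, an equivalent of df = df[df.ColA != 0]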

def logscatfit(x,y,title):
    xvals2 = np.arange(-2,6,1)
    a = np.log(x) #These are what I want to remove the zeros from
    b = np.log(y)
    plt.scatter(a, b, c='g', marker='x', s=35)
    slope, intercept, r_value, p_value, std_err = stats.linregress(a,b)
    plt.plot(xvals2, (xvals2*slope + intercept), color='red')
    plt.title(title)
    plt.show()
    print "Slope is:",slope, ". Intercept is:",intercept,". R-value is:",r_value,". P-value is:",p_value,". Std_err is:",std_err

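I can't think of a way to remove the zeros from a and b while keeping them the same length so that I can still plot the scatter graph. Is my only option to rewrite the function to take a DataFrame, and then remove the zeros with df1 = df[df.ColA != 0] followed by df2 = df1[df1.ColB != 0]?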

Answer 1

As I understand it, you need to remove the rows where x or y (or both) is zero.

A simple way to do that is

keepThese = (x > 0) & (y > 0)
a = x[keepThese]
b = y[keepThese]

and then continue with your code.
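
As a minimal sketch of the mask above, assuming x and y are pandas Series that share the same index (the sample values below are invented for illustration). Note that the `> 0` test also drops negative values, for which the log is undefined as well:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; the zeros stand in for the values that break np.log.
x = pd.Series([1.0, 0.0, 10.0, 100.0])
y = pd.Series([2.0, 5.0, 0.0, 50.0])

# One shared boolean mask keeps the two Series aligned and equal in length.
keep_these = (x > 0) & (y > 0)
a = np.log(x[keep_these])
b = np.log(y[keep_these])

print(len(a), len(b))
```

Because the same mask is applied to both Series, a row is dropped from both whenever either value is not positive.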

Answer 2

I like FooBar's answer for its simplicity. A more general approach is to pass the DataFrame to your function and use the .any() method.

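def logscatfit(df, x_col_name, y_col_name, title):
    two_cols = df[[x_col_name, y_col_name]]
    # mask is True for rows where either column contains a zero
    mask = two_cols.apply(lambda row: (row == 0).any(), axis=1)
    # invert the mask to keep only the rows without zeros
    df_to_use = df[~mask]
    x = df_to_use[x_col_name]
    y = df_to_use[y_col_name]

    # your code
    a = np.log(x)
    # etc.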
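
The row-wise apply can also be written as a vectorized comparison. The following is only a sketch, assuming a DataFrame with hypothetical columns ColA, ColB and ColC; the helper name drop_zero_rows is invented for illustration and is not part of pandas:

```python
import pandas as pd

def drop_zero_rows(df, cols):
    """Return the rows of df in which none of the given columns equals zero."""
    # (df[cols] != 0) is a boolean DataFrame; .all(axis=1) is True only for
    # rows where every selected column is nonzero.
    return df[(df[cols] != 0).all(axis=1)]

# Hypothetical sample data; ColC is ignored because it is not in cols.
df = pd.DataFrame({"ColA": [1, 0, 3, 4],
                   "ColB": [5, 6, 0, 8],
                   "ColC": [0, 0, 0, 0]})
cleaned = drop_zero_rows(df, ["ColA", "ColB"])
print(cleaned)
```

Only the columns named in cols are checked, so zeros elsewhere in the DataFrame do not cause rows to be dropped.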

Answer 3

Plugging FooBar's answer into your function gives:

def logscatfit(x,y,title):
    xvals2 = np.arange(-2,6,1)
    keepThese = (x > 0) & (y > 0)
    a = x[keepThese]
    b = y[keepThese]
    a = np.log(a)
    b = np.log(b)
    plt.scatter(a, b, c='g', marker='x', s=35)
    slope, intercept, r_value, p_value, std_err = stats.linregress(a,b)
    plt.plot(xvals2, (xvals2*slope + intercept), color='red')
    plt.title(title)
    plt.show()
    print "Slope is:",slope, ". Intercept is:",intercept,". R-value is:",r_value,". P-value is:",p_value,". Std_err is:",std_err

Removing entire rows that contain zeros from two Pandas Series
