Question

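I have a dataset with many categorical variables, and I would like to plot them in scatter plots without having to encode the variables. Here is my attempt: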

    fig = plt.figure(figsize=(18, 9))

    for column in df:
        if df[column].dtype != np.int64 and df[column].dtype != np.float64:
            ca = df.plot.scatter(x=df[column], y=df['log_prices'], ax=fig.add_subplot(2, 3, df[column]+1))
            plt.plot(df.iloc[:, df[column]].values,
                     sm.OLS(df.iloc[:, df['log_prices']].values,
                            sm.add_constant(df.iloc[:, df[column]].values)).fit().fittedvalues,
                     'r-')

This is the error I currently get:

     ----> 5             ca = df.plot.scatter(x=df[column],y=df['log_prices'], ax = fig.add_subplot(2,3,df_061[column]+1))

         cannot concatenate 'str' and 'int' objects

This apparently has something to do with log_prices.

Is there a simpler way?

Thanks

Answer 1

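I suggest making the following change:

    ca = df.plot.scatter(x=column, y='log_prices', ax=fig.add_subplot(2, 3, idx))

When using the df.plot methods, pass column names for the x and y parameters rather than the actual data. The data already lives in df, so you only need to tell it which columns to use. The subplot position must also be an integer, so df[column] + 1 is replaced here by a running counter idx (incremented once per loop iteration); adding 1 to a column of strings is most likely what raises the "cannot concatenate 'str' and 'int' objects" error.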

Below is a partial reproduction of your code using sample data:

    import numpy as np
    import pandas as pd
    import matplotlib.pyplot as plt

    # Sample data: five numeric columns
    df = pd.DataFrame(np.random.rand(10, 5), columns=['A', 'B', 'C', 'D', 'E'])

    fig = plt.figure(figsize=(18, 9))

    idx = 0  # running subplot position (add_subplot counts from 1)
    for column in df:
        # if df[column].dtype != np.int64 and df[column].dtype != np.float64:
        idx += 1
        # Pass column names, not the data itself
        ca = df.plot.scatter(x=column, y='A', ax=fig.add_subplot(2, 3, idx))
        # plt.plot(df.iloc[:,df[column]].values, sm.OLS(df.iloc[:,df['log_prices'].values,sm.add_constant(df.iloc[:,df[column]].values)).fit().fittedvalues,'r-')

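This code produces the following plot:

[Figure: 2x3 grid of scatter plots, one per column, each plotted against column A]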
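Applied to the original question, the same pattern can be restricted to the non-numeric columns. The following is only a sketch under assumptions: the sample DataFrame, its `city`, `condition`, and `area` columns, and the `log_prices` values are all made up for illustration, and it calls Matplotlib's `Axes.scatter` directly (which accepts string values on the x-axis) instead of `df.plot.scatter`.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from pandas.api.types import is_numeric_dtype

# Hypothetical sample data standing in for the asker's DataFrame
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "city": rng.choice(["Lyon", "Oslo", "Rome"], size=30),
    "condition": rng.choice(["new", "used"], size=30),
    "area": rng.uniform(30, 120, size=30),
    "log_prices": rng.normal(12, 0.5, size=30),
})

# Keep only the non-numeric (categorical) columns
categorical = [col for col in df.columns if not is_numeric_dtype(df[col])]

fig = plt.figure(figsize=(18, 9))
for idx, column in enumerate(categorical, start=1):
    ax = fig.add_subplot(2, 3, idx)  # idx is the integer subplot position
    # Plain lists of strings are placed on a categorical x-axis, no encoding needed
    ax.scatter(df[column].tolist(), df["log_prices"].tolist())
    ax.set_xlabel(column)
    ax.set_ylabel("log_prices")
```

The `sm.OLS` fit line from the question is left out of this sketch, since a regression on raw string categories would need some form of encoding first.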

Answer 2

    import matplotlib.pyplot as plt
    import seaborn as sns
    from pandas.api.types import is_string_dtype

    # Collect the columns that hold strings (the categorical variables)
    categorical = []
    for column in df:
        if is_string_dtype(df[column]):
            categorical.append(column)

    # One count plot per categorical column, laid out on a 2x4 grid
    fig, ax = plt.subplots(2, 4, figsize=(20, 10))
    for variable, subplot in zip(categorical, ax.flatten()):
        sns.countplot(x=df[variable], ax=subplot)
        for label in subplot.get_xticklabels():
            label.set_rotation(90)
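The column-selection loop above can also be written with `DataFrame.select_dtypes`. A minimal sketch, assuming the categorical columns are stored as strings; the sample DataFrame and its column names are invented for illustration:

```python
import pandas as pd

# Hypothetical sample data; only "color" and "size" hold strings
df = pd.DataFrame({
    "color": ["red", "blue", "red"],
    "size": ["S", "M", "L"],
    "price": [1.5, 2.0, 3.25],
})

# Exclude numeric columns; what remains are the string (categorical) columns
categorical = df.select_dtypes(exclude="number").columns.tolist()
print(categorical)  # ['color', 'size']
```

Note that excluding `"number"` also keeps boolean and datetime columns, so a dataset containing those may need a narrower filter.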

Python loop to plot only categorical variables
