Parallel coordinates plot in R (ggparcoord)

Asked: 2014-04-09 04:41:20

Tags: r plot ggplot2 ggally

I'm running into a somewhat strange situation when drawing a parallel coordinates plot with ggparcoord. The following code runs perfectly well:

# Load required packages (library() fails loudly if GGally is missing)
library(GGally)

# Load datasets
data(state)
df <- data.frame(state.x77,
                 State = state.name,
                 Abbrev = state.abb,
                 Region = state.region,
                 Division = state.division
) 

# Generate basic parallel coordinate plot
p <- ggparcoord(data = df,                 
                # Which columns to use in the plot
                columns = 1:4,                 
                # Which column to use for coloring data
                groupColumn = 11,                 
                # Allows order of vertical bars to be modified
                order = "anyClass",                
                # Do not show points
                showPoints = FALSE,                
                # Turn on alpha blending for dense plots
                alphaLines = 0.6,                
                # Turn off box shading range
                shadeBox = NULL,                
                # Will normalize each column's values to [0, 1]
                scale = "uniminmax" # try "std" also
)

# Start with a basic theme
p <- p + theme_minimal()

# Decrease amount of margin around x, y values
p <- p + scale_y_continuous(expand = c(0.02, 0.02))
p <- p + scale_x_discrete(expand = c(0.02, 0.02))

# Remove axis ticks and labels
p <- p + theme(axis.ticks = element_blank())
p <- p + theme(axis.title = element_blank())
p <- p + theme(axis.text.y = element_blank())

# Remove minor and horizontal grid lines
p <- p + theme(panel.grid.minor = element_blank())
p <- p + theme(panel.grid.major.y = element_blank())

# Darken vertical lines
p <- p + theme(panel.grid.major.x = element_line(color = "#bbbbbb"))

# Move legend to bottom
p <- p + theme(legend.position = "bottom")

# Figure out y-axis range after GGally scales the data
min_y <- min(p$data$value)
max_y <- max(p$data$value)
pad_y <- (max_y - min_y) * 0.1

# Calculate label positions for each vertical bar
lab_x <- rep(1:4, times = 2) # 2 times, 1 for min 1 for max
lab_y <- rep(c(min_y - pad_y, max_y + pad_y), each = 4)

# Get min and max values from original dataset
lab_z <- c(sapply(df[, 1:4], min), sapply(df[, 1:4], max))

# Convert to character for use as labels
lab_z <- as.character(lab_z)

# Add labels to plot
p <- p + annotate("text", x = lab_x, y = lab_y, label = lab_z, size = 3)

# Display parallel coordinate plot
print(p)

I get the following output:

[image: parallel coordinates plot]

I want to subset the data with the following statement so that fewer levels of Region are shown:

df <- df[df$Region %in% c("South", "West", "Northeast"), ]

After that, I start getting the following error:

Error in `contrasts<-`(`*tmp*`, value = contr.funs[1 + isOF[nn]]) : 
  contrasts can be applied only to factors with 2 or more levels

Why does this error occur when the number of levels I want to display is clearly more than 2? Any help would be much appreciated.

1 answer:

Answer 0 (score: 2)

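I figured out the problem. Region was already a factor, but subsetting the data frame keeps the now-empty level (North Central), so I had to convert the column to a factor again to drop it: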

df$Region <- factor(df$Region)

The code above fixes the error.
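To see why rebuilding the factor helps, here is a minimal sketch in base R only (no GGally needed). It shows that subsetting keeps the empty level, that factor() or droplevels() removes it, and that a factor with a single level triggers the same contrasts message when used as a model predictor. That ggparcoord ends up fitting a model on such a degenerate factor when order = "anyClass" is an assumption on my part, not something checked against the GGally source. The names sub, fixed_a, fixed_b, one_level and err are made up for illustration.

```r
# state.region ships with base R's datasets package: a factor with 4 levels
region <- state.region
keep   <- region %in% c("South", "West", "Northeast")
sub    <- region[keep]

# Subsetting drops the rows but keeps the empty "North Central" level
levels(sub)
table(sub)

# Either call rebuilds the factor without the unused level
fixed_a <- factor(sub)
fixed_b <- droplevels(sub)
levels(fixed_a)

# A predictor with only one level raises the same contrasts error
# (assumed to mirror what happens inside ggparcoord; not verified)
one_level <- factor(rep("a", 5))
y <- c(1.2, 0.7, 3.1, 2.4, 1.9)
err <- tryCatch(lm(y ~ one_level), error = function(e) conditionMessage(e))
err
```

On a whole data frame, df <- droplevels(df) does the same for every factor column at once, which is handy when several grouping columns (here Region and Division) are affected by the same subset.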