Question

I am using histogram from the lattice package to plot two histograms conditioned on a variable with two levels: male or female.

histogram(~ raw$Housework_Tot_Min [(raw$Housework_Tot_Min != 0) & 
(raw$Housework_Tot_Min < 1000)] | raw$Gender)

Output of code: two histograms, minutes doing housework by gender

However, these histograms do not match what I see when I look at the data directly. Plotting each gender separately with:

histogram(~ raw$Housework_Tot_Min [(raw$Housework_Tot_Min != 0) & 
(raw$Housework_Tot_Min < 1000) & (raw$Gender == "Female")])

and

histogram(~ raw$Housework_Tot_Min [(raw$Housework_Tot_Min != 0) & 
(raw$Housework_Tot_Min < 1000) & (raw$Gender == "Male")])

I get two histograms again, but they look very different

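Does anyone know why these outputs don't match? I have a bunch of binary-type panels to plot, and having to produce each one separately really defeats the purpose of using the lattice package!

My apologies if this reflects a basic misunderstanding of a simple concept; I am still a beginner with R! Thank you very much for your help.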

Answer 1

The issue relates to differing values in panel.args.common (i.e., the arguments common to all panel functions; see ?trellis.object). Here is some sample code to clarify my point.

library(lattice)

## paneled plot
hist1 <- histogram( ~ Sepal.Width | Species, data = iris)
hist1$panel.args.common

# $breaks
# [1] 1.904 2.228 2.552 2.876 3.200 3.524 3.848 4.172 4.496
# 
# $type
# [1] "percent"
#
# $equal.widths
# [1] TRUE
# 
# $nint
# [1] 8

## single plot    
hist2 <- histogram( ~ Sepal.Width, data = iris[iris$Species == "setosa", ])
hist2$panel.args.common

# $breaks
# [1] 2.216 2.540 2.864 3.188 3.512 3.836 4.160 4.484
# 
# $type
# [1] "percent"
# 
# $equal.widths
# [1] TRUE
# 
# $nint
# [1] 7

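nint (the number of histogram bins, see ?histogram) and breaks (the breakpoints of the bins) are calculated across all target panels and therefore differ between hist1 and hist2. If you want these arguments to be identical so that the two plots look alike, simply run the following lines after creating the two plots.

hist2$panel.args.common <- hist1$panel.args.common
## or vice versa, depending on the number of bins and breakpoints to use

library(gridExtra)
grid.arrange(hist1, hist2, ncol = 2)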
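As an alternative to copying panel.args.common after the fact, the breakpoints can be fixed up front by passing the same breaks vector to both calls. The following is only a sketch on the iris data; the name common_breaks and the choice of 8 equal-width bins over the full range of Sepal.Width are assumptions made for illustration.

```r
library(lattice)

# Shared breakpoints computed once from the full data
# (assumption: 8 equal-width bins spanning the whole range)
common_breaks <- seq(min(iris$Sepal.Width), max(iris$Sepal.Width),
                     length.out = 9)

# Paneled plot and single plot both receive the same breakpoints
hist_all <- histogram(~ Sepal.Width | Species, data = iris,
                      breaks = common_breaks)
hist_setosa <- histogram(~ Sepal.Width,
                         data = iris[iris$Species == "setosa", ],
                         breaks = common_breaks)

# Check that both trellis objects carry identical breakpoints
identical(hist_all$panel.args.common$breaks,
          hist_setosa$panel.args.common$breaks)
```

Because the breakpoints are set explicitly, the bins no longer depend on which subset of the data each call happens to see.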


Answer 2

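It turns out the problem was a data mismatch: the exclusions applied with brackets filtered the housework variable but not Gender. Instead of: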

histogram(~ raw$Housework_Tot_Min [(raw$Housework_Tot_Min != 0) & 
(raw$Housework_Tot_Min < 1000)] | raw$Gender)

it should be:

histogram(~ Housework_Tot_Min [(Housework_Tot_Min != 0) & (Housework_Tot_Min < 1000)] | 
        Gender [(Housework_Tot_Min != 0) & (Housework_Tot_Min < 1000)], data = raw,
      main = "Time Observed Housework by Gender",
      xlab = "Minutes spent",
      breaks = seq(from = 0, to = 400, by = 20))

Note that the exclusions now apply to both the housework time and the gender variable, which eliminates the mismatch in the data.

The correct plot is pasted below. Thanks again to everyone for the guidance.

Updated Histogram
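The same row-wise filtering can also be written with the subset argument of histogram, which drops whole rows so that the plotted variable and the conditioning variable stay aligned. This is a sketch only: the raw data frame below is a made-up stand-in for the asker's data, and keep and hist_by_gender are illustrative names.

```r
library(lattice)

# Hypothetical stand-in for the `raw` data frame from the question
set.seed(1)
raw <- data.frame(
  Housework_Tot_Min = c(rep(0, 20), round(runif(180, 1, 1200))),
  Gender = factor(sample(c("Female", "Male"), 200, replace = TRUE))
)

keep <- raw$Housework_Tot_Min != 0 & raw$Housework_Tot_Min < 1000

# The mismatch in the original call: the filtered vector is shorter
# than the unfiltered Gender vector it is conditioned on
length(raw$Housework_Tot_Min[keep]) < length(raw$Gender)

# Filter whole rows via `subset`, so both variables are filtered together
hist_by_gender <- histogram(~ Housework_Tot_Min | Gender, data = raw,
                            subset = Housework_Tot_Min != 0 &
                              Housework_Tot_Min < 1000,
                            xlab = "Minutes spent")
print(hist_by_gender)
```

Writing the condition once in subset avoids repeating the bracketed filter on every variable in the formula, which helps when many panels share the same exclusions.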

Conditional histograms using the lattice package: output plots are incorrect
