Question

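I have 5 variables (ahe, date, age, female, bachelor) and want to split the data by the "female" column. Females have the value 1 and males the value 0. I understand that the function split() can do the split for me with this code: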

split(data_wage$ahe, data_wage$female)

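What I don't understand is how to use the two split groups once that step is done. I want to draw a scatter plot of 'age' against 'ahe', once for females and once for males. Any help would be greatly appreciated!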

Answer 1

For a problem like this you can avoid split altogether, especially if you use tools like "lattice" or "ggplot2".

Here is a "lattice"-based approach:

## sample data
set.seed(1)
mydf <- data.frame(
  ahe = sample(100, 1000, TRUE),
  age = sample(18:60, 1000, TRUE),
  female = sample(c(0, 1), 1000, TRUE)
)

## Convert the female column to a factor
## Not necessary, but makes the output nicer
mydf$female <- factor(mydf$female, c(0, 1), c("male", "female"))

## Load the lattice package
library(lattice)

## Side by side
xyplot(ahe ~ age | female, data = mydf)


## all in one, with key
xyplot(ahe ~ age, groups = female, data = mydf, auto.key = TRUE)

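The answer names ggplot2 but only shows lattice. As a rough sketch, assuming the ggplot2 package is installed, the same two plots might look like the following. The sample data is the same as above, repeated so the snippet runs on its own; the object names p_facets and p_groups are just illustrative.

```r
library(ggplot2)

## same sample data as in the lattice example
set.seed(1)
mydf <- data.frame(
  ahe = sample(100, 1000, TRUE),
  age = sample(18:60, 1000, TRUE),
  female = sample(c(0, 1), 1000, TRUE)
)
mydf$female <- factor(mydf$female, c(0, 1), c("male", "female"))

## Side by side: one panel per level of female
p_facets <- ggplot(mydf, aes(x = age, y = ahe)) +
  geom_point() +
  facet_wrap(~ female)
print(p_facets)

## all in one, coloured by female (a legend is added automatically)
p_groups <- ggplot(mydf, aes(x = age, y = ahe, colour = female)) +
  geom_point()
print(p_groups)
```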

Answer 2

split() returns a list, in this case a list of two data.frames, one for males and one for females.

lapply(list, function) applies a function to each element of a list, so consider the following code:

splitList = split(data_wage, data_wage$female)
par(mfrow=c(1,2))
lapply(splitList,function(x){plot(age~ahe,data=x)})

This will give you two scatter plots, one for males and one for females.
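To tell the two panels apart, one option is to loop over the names of the list and use each name as the plot title. This is only a sketch: the data_wage below is simulated stand-in data (an assumption, since the real data isn't shown), and the variables males, females and old_par are illustrative names. The list elements are named "0" and "1" after the values of female.

```r
## Simulated stand-in for data_wage (assumption: the real data is not shown)
set.seed(1)
data_wage <- data.frame(
  ahe = sample(100, 200, TRUE),
  age = sample(18:60, 200, TRUE),
  female = sample(c(0, 1), 200, TRUE)
)

## split() names the list elements after the grouping values: "0" and "1"
splitList <- split(data_wage, data_wage$female)
names(splitList)

## Individual groups can be pulled out by name
males <- splitList[["0"]]
females <- splitList[["1"]]

## One panel per group, titled with the group value
old_par <- par(mfrow = c(1, 2))
for (nm in names(splitList)) {
  plot(age ~ ahe, data = splitList[[nm]], main = paste("female =", nm))
}
par(old_par)  # restore the previous plotting layout
```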

Answer 3

Use "subset" to create a new data frame containing only the records you want. Specify them with a logical condition, e.g. data_wage$female==1:

data_wage_female <- subset(data_wage, data_wage$female==1)
data_wage_male <- subset(data_wage, data_wage$female==0)

## now you can plot females and males separately using these subsets:

## plots females with red symbols
plot(data_wage_female$age ~ data_wage_female$ahe, col="red")

## plots males with blue symbols on the same scatter plot
points(data_wage_male$age ~ data_wage_male$ahe, col="blue")
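One thing to keep in mind with plot() followed by points(): the axis limits are set by the first call, so male points outside the range of the female data would not be visible. A possible alternative, sketched here on simulated stand-in data (an assumption, since the real data_wage isn't shown; point_cols is an illustrative name), is to plot everything in one call and colour by group.

```r
## Simulated stand-in for data_wage (assumption: the real data is not shown)
set.seed(1)
data_wage <- data.frame(
  ahe = sample(100, 200, TRUE),
  age = sample(18:60, 200, TRUE),
  female = sample(c(0, 1), 200, TRUE)
)

## Map female (0/1) to a colour: 0 -> "blue" (male), 1 -> "red" (female)
point_cols <- c("blue", "red")[data_wage$female + 1]

## A single call plots all rows, so the axes cover both groups
plot(age ~ ahe, data = data_wage, col = point_cols)
legend("topright", legend = c("male", "female"),
       col = c("blue", "red"), pch = 1)
```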

How to split a data frame based on a variable in R

3 answers: