Converting biographical data to panel data

Time: 2013-12-17 03:27:31

Tags: r panel-data

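I have biographical data on more than 1,600 people. The data include their gender, birth year, hometown, and so on, as well as their career trajectory from the year they started working. I am trying to turn this into a panel dataset so that I can see how their workplaces have changed since they started working. I have the following questions about this dataset: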

1) How can I convert it into a panel dataset? The format I would ideally like for each person (id) is:

  id gender hometown year job
1  1      1       NY 1990   3
1  1      1       NY 1991   3
1  1      1       NY 1992   3
1  1      1       NY 1993   3
1  1      1       NY 1994   5

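2) If a person held overlapping positions, how should I preserve that information? For example, a person may hold job 3 and job 5 at the same time. Later on I would like to use only the higher-ranked of the two jobs, but for now I want to keep as much information as possible.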

1 answer:

Answer 0: (score: 1)

OK, let's give this a try.

First, select a subset of the data.

> (D = head(origin[, c("id", "name1", "gender", "job1", "job1s", "job1e",
            "job2", "job10")]))
  id                name1 gender job1 job1s job1e job2 job10
1  1 Abulaiti Abureduxiti      1 2305  1980  1991 2303    NA
2  2  Aisihaiti Kelimubai      1 2307  1972  1987 2307    NA
3  3          Ai Zhisheng      1 4509  1996  1997 1075 10103
4  4         An Pingsheng      1 3555  1975  1977 3561  2191
5  5            An Zhiwen      1 2063  1977  1979 1127  2507
6  6             An Ziwen      1 4509  1954  1966 4007  2517

Next, we reshape the data into the long format that I think you are after.

> library(reshape2)
> (D = melt(D, id.vars = c("id", "name1", "gender")))
   id                name1 gender variable value
1   1 Abulaiti Abureduxiti      1     job1  2305
2   2  Aisihaiti Kelimubai      1     job1  2307
3   3          Ai Zhisheng      1     job1  4509
4   4         An Pingsheng      1     job1  3555
5   5            An Zhiwen      1     job1  2063
6   6             An Ziwen      1     job1  4509
7   1 Abulaiti Abureduxiti      1    job1s  1980
8   2  Aisihaiti Kelimubai      1    job1s  1972
9   3          Ai Zhisheng      1    job1s  1996
10  4         An Pingsheng      1    job1s  1975
11  5            An Zhiwen      1    job1s  1977
12  6             An Ziwen      1    job1s  1954
13  1 Abulaiti Abureduxiti      1    job1e  1991
14  2  Aisihaiti Kelimubai      1    job1e  1987
15  3          Ai Zhisheng      1    job1e  1997
16  4         An Pingsheng      1    job1e  1977
17  5            An Zhiwen      1    job1e  1979
18  6             An Ziwen      1    job1e  1966
19  1 Abulaiti Abureduxiti      1     job2  2303
20  2  Aisihaiti Kelimubai      1     job2  2307
21  3          Ai Zhisheng      1     job2  1075
22  4         An Pingsheng      1     job2  3561
23  5            An Zhiwen      1     job2  1127
24  6             An Ziwen      1     job2  4007
25  1 Abulaiti Abureduxiti      1    job10    NA
26  2  Aisihaiti Kelimubai      1    job10    NA
27  3          Ai Zhisheng      1    job10 10103
28  4         An Pingsheng      1    job10  2191
29  5            An Zhiwen      1    job10  2507
30  6             An Ziwen      1    job10  2517

We can see that some of these records have a missing (NA) job field, so we drop them.

> (D = D[complete.cases(D),])
   id                name1 gender variable value
1   1 Abulaiti Abureduxiti      1     job1  2305
2   2  Aisihaiti Kelimubai      1     job1  2307
3   3          Ai Zhisheng      1     job1  4509
4   4         An Pingsheng      1     job1  3555
5   5            An Zhiwen      1     job1  2063
6   6             An Ziwen      1     job1  4509
7   1 Abulaiti Abureduxiti      1    job1s  1980
8   2  Aisihaiti Kelimubai      1    job1s  1972
9   3          Ai Zhisheng      1    job1s  1996
10  4         An Pingsheng      1    job1s  1975
11  5            An Zhiwen      1    job1s  1977
12  6             An Ziwen      1    job1s  1954
13  1 Abulaiti Abureduxiti      1    job1e  1991
14  2  Aisihaiti Kelimubai      1    job1e  1987
15  3          Ai Zhisheng      1    job1e  1997
16  4         An Pingsheng      1    job1e  1977
17  5            An Zhiwen      1    job1e  1979
18  6             An Ziwen      1    job1e  1966
19  1 Abulaiti Abureduxiti      1     job2  2303
20  2  Aisihaiti Kelimubai      1     job2  2307
21  3          Ai Zhisheng      1     job2  1075
22  4         An Pingsheng      1     job2  3561
23  5            An Zhiwen      1     job2  1127
24  6             An Ziwen      1     job2  4007
27  3          Ai Zhisheng      1    job10 10103
28  4         An Pingsheng      1    job10  2191
29  5            An Zhiwen      1    job10  2507
30  6             An Ziwen      1    job10  2517
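
One thing to note about the long format above is that job codes and years share the single `value` column. As a minimal sketch, and assuming the columns follow the pattern `job<k>`, `job<k>s`, `job<k>e` with `s` meaning start year and `e` meaning end year (my reading of the names, not something confirmed by the data), the `variable` column could be split into a spell number and a field, and then widened to one row per person and spell. The names `long`, `spell`, `field` and `spells` are made up for this illustration, and the toy values are copied from the rows for id 1 and id 3.

```r
# Toy long-format data mimicking the melt() output (ids 1 and 3, job1 only).
long <- data.frame(
  id       = c(1, 3, 1, 3, 1, 3),
  variable = c("job1", "job1", "job1s", "job1s", "job1e", "job1e"),
  value    = c(2305, 4509, 1980, 1996, 1991, 1997),
  stringsAsFactors = FALSE
)

# Spell number: the digits that follow "job".
long$spell <- as.integer(sub("^job([0-9]+)[se]?$", "\\1", long$variable))

# Field: assumed "s" = start year, "e" = end year, no suffix = job code.
suffix <- sub("^job[0-9]+", "", long$variable)
long$field <- ifelse(suffix == "s", "start",
                     ifelse(suffix == "e", "end", "job"))

# Widen to one row per id and spell with separate job/start/end columns.
spells <- reshape(long[, c("id", "spell", "field", "value")],
                  idvar = c("id", "spell"), timevar = "field",
                  direction = "wide")
names(spells) <- sub("^value\\.", "", names(spells))
```

Columns such as `job2` that have no start or end year in this subset would simply get NA in `start` and `end`.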

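Sorting out the overlapping positions is a secondary issue. If you can confirm that the above is basically what you are aiming for, we can work on the next step.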
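
As a rough sketch of what that next step might look like: assuming the data can be arranged as one row per job spell with a start and an end year (the `spells` data frame and its column names below are hypothetical, with toy values based on the example in the question), each spell can be expanded into one row per year. Overlapping jobs are all kept in `panel_all`, and a second table keeps only the highest job code per person-year. Treating a larger code as the "higher" job is an assumption; a real ranking would need a lookup table.

```r
# Toy data: one row per job spell (hypothetical layout and values).
spells <- data.frame(
  id    = c(1, 1, 2),
  job   = c(3, 5, 7),
  start = c(1990, 1992, 1985),
  end   = c(1993, 1994, 1986)
)

# Expand a single spell (row i) into one row per year it covers.
expand_spell <- function(i) {
  data.frame(id   = spells$id[i],
             year = seq(spells$start[i], spells$end[i]),
             job  = spells$job[i])
}
panel_all <- do.call(rbind, lapply(seq_len(nrow(spells)), expand_spell))

# panel_all keeps every overlapping job: for id 1, the years 1992 and
# 1993 each appear twice (job 3 and job 5).

# For analysis, keep one row per id-year: here the largest job code
# (assumption: a larger code means a higher position).
panel_max <- aggregate(job ~ id + year, data = panel_all, FUN = max)
panel_max <- panel_max[order(panel_max$id, panel_max$year), ]
```

Time-invariant attributes such as gender and hometown could then be attached with `merge()` on `id`.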