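I have biographical data on more than 1,600 people. The data include gender, year of birth, hometown, and so on, plus each person's career trajectory from the year they started working. I am trying to turn it into panel data so that I can see how their workplaces have changed since they started working. I have the following questions about this data set:
1) How do I convert it into a panel data set? The ideal format for each person (id) would be: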
id gender hometown year job
1 1 1 NY 1990 3
1 1 1 NY 1991 3
1 1 1 NY 1992 3
1 1 1 NY 1993 3
1 1 1 NY 1994 5
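2) If a person holds overlapping positions, how do I preserve that information? For example, a person can hold job 3 and job 5 at the same time. Later I want to use only the higher-ranked of the two jobs, but I also want to keep as much information as possible.
Answer 0 (score: 1)
OK, let's give it a try.
First, select a subset of the data.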
> (D = head(origin[, c("id", "name1", "gender", "job1", "job1s", "job1e",
"job2", "job10")]))
id name1 gender job1 job1s job1e job2 job10
1 1 Abulaiti Abureduxiti 1 2305 1980 1991 2303 NA
2 2 Aisihaiti Kelimubai 1 2307 1972 1987 2307 NA
3 3 Ai Zhisheng 1 4509 1996 1997 1075 10103
4 4 An Pingsheng 1 3555 1975 1977 3561 2191
5 5 An Zhiwen 1 2063 1977 1979 1127 2507
6 6 An Ziwen 1 4509 1954 1966 4007 2517
Next, we reshape the data into the long format that I think you are after.
> library(reshape2)
> (D = melt(D, id.vars = c("id", "name1", "gender")))
id name1 gender variable value
1 1 Abulaiti Abureduxiti 1 job1 2305
2 2 Aisihaiti Kelimubai 1 job1 2307
3 3 Ai Zhisheng 1 job1 4509
4 4 An Pingsheng 1 job1 3555
5 5 An Zhiwen 1 job1 2063
6 6 An Ziwen 1 job1 4509
7 1 Abulaiti Abureduxiti 1 job1s 1980
8 2 Aisihaiti Kelimubai 1 job1s 1972
9 3 Ai Zhisheng 1 job1s 1996
10 4 An Pingsheng 1 job1s 1975
11 5 An Zhiwen 1 job1s 1977
12 6 An Ziwen 1 job1s 1954
13 1 Abulaiti Abureduxiti 1 job1e 1991
14 2 Aisihaiti Kelimubai 1 job1e 1987
15 3 Ai Zhisheng 1 job1e 1997
16 4 An Pingsheng 1 job1e 1977
17 5 An Zhiwen 1 job1e 1979
18 6 An Ziwen 1 job1e 1966
19 1 Abulaiti Abureduxiti 1 job2 2303
20 2 Aisihaiti Kelimubai 1 job2 2307
21 3 Ai Zhisheng 1 job2 1075
22 4 An Pingsheng 1 job2 3561
23 5 An Zhiwen 1 job2 1127
24 6 An Ziwen 1 job2 4007
25 1 Abulaiti Abureduxiti 1 job10 NA
26 2 Aisihaiti Kelimubai 1 job10 NA
27 3 Ai Zhisheng 1 job10 10103
28 4 An Pingsheng 1 job10 2191
29 5 An Zhiwen 1 job10 2507
30 6 An Ziwen 1 job10 2517
We can see that some of these records have an empty job field (NA in the value column), so we drop them.
> (D = D[complete.cases(D),])
id name1 gender variable value
1 1 Abulaiti Abureduxiti 1 job1 2305
2 2 Aisihaiti Kelimubai 1 job1 2307
3 3 Ai Zhisheng 1 job1 4509
4 4 An Pingsheng 1 job1 3555
5 5 An Zhiwen 1 job1 2063
6 6 An Ziwen 1 job1 4509
7 1 Abulaiti Abureduxiti 1 job1s 1980
8 2 Aisihaiti Kelimubai 1 job1s 1972
9 3 Ai Zhisheng 1 job1s 1996
10 4 An Pingsheng 1 job1s 1975
11 5 An Zhiwen 1 job1s 1977
12 6 An Ziwen 1 job1s 1954
13 1 Abulaiti Abureduxiti 1 job1e 1991
14 2 Aisihaiti Kelimubai 1 job1e 1987
15 3 Ai Zhisheng 1 job1e 1997
16 4 An Pingsheng 1 job1e 1977
17 5 An Zhiwen 1 job1e 1979
18 6 An Ziwen 1 job1e 1966
19 1 Abulaiti Abureduxiti 1 job2 2303
20 2 Aisihaiti Kelimubai 1 job2 2307
21 3 Ai Zhisheng 1 job2 1075
22 4 An Pingsheng 1 job2 3561
23 5 An Zhiwen 1 job2 1127
24 6 An Ziwen 1 job2 4007
27 3 Ai Zhisheng 1 job10 10103
28 4 An Pingsheng 1 job10 2191
29 5 An Zhiwen 1 job10 2507
30 6 An Ziwen 1 job10 2517
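Sorting out overlapping positions is a secondary issue. If I know that the above is basically what you are after, then we can tackle the next step.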
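
For reference, here is a minimal base R sketch of one way the remaining steps might look: stacking the job spells, expanding each spell into one row per year, and collapsing overlapping spells. It is not taken from the original data. The toy data frame `wide` is made up, the columns `job2s` and `job2e` are assumed to follow the same pattern as `job1s` and `job1e`, and "higher job" is assumed to mean the larger numeric job code. It also assumes every spell has an end year that is not earlier than its start year.

```r
# Toy wide data (hypothetical): one row per person, one
# (job, start year, end year) triple per spell.
wide <- data.frame(
  id     = c(1, 2),
  gender = c(1, 1),
  job1   = c(3, 7),
  job1s  = c(1990, 1985),
  job1e  = c(1993, 1986),
  job2   = c(5, NA),      # person 1 holds job 5 while still in job 3
  job2s  = c(1992, NA),   # assumed column name, mirrors job1s
  job2e  = c(1994, NA)    # assumed column name, mirrors job1e
)

# Step 1: stack the spells into one row per person-spell.
spell_ids <- 1:2
spells <- do.call(rbind, lapply(spell_ids, function(k) {
  data.frame(
    id     = wide$id,
    gender = wide$gender,
    job    = wide[[paste0("job", k)]],
    start  = wide[[paste0("job", k, "s")]],
    end    = wide[[paste0("job", k, "e")]]
  )
}))
spells <- spells[complete.cases(spells), ]  # drop spells that do not exist

# Step 2: expand each spell into one row per year it covers.
n_years <- spells$end - spells$start + 1
long <- spells[rep(seq_len(nrow(spells)), times = n_years),
               c("id", "gender", "job")]
long$year <- unlist(Map(seq, spells$start, spells$end))

# Step 3: collapse overlapping spells to one row per person-year.
# job_top keeps the largest job code (assumed to be the "higher" job);
# job_all keeps every job held that year, so no information is lost.
key <- long[, c("id", "gender", "year")]
panel <- aggregate(long["job"], by = key, FUN = max)
names(panel)[names(panel) == "job"] <- "job_top"
all_jobs <- aggregate(long["job"], by = key,
                      FUN = function(j) paste(sort(j), collapse = ";"))
panel$job_all <- all_jobs$job  # same grouping, so rows line up
panel <- panel[order(panel$id, panel$year), ]
rownames(panel) <- NULL
```

With this toy input, `panel` has one row per person-year. For id 1 in 1992, where jobs 3 and 5 overlap, `job_top` is 5 and `job_all` is "3;5". If the job codes do not encode rank, `max` would need to be replaced by a lookup against whatever ranking you have.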