I have two data frames, data1 and data2. I am trying to spread the data, or create dummy variables, on one column (x2) of data1. I can do the following:
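library(dummies)
# One indicator column for each distinct value observed in data1$x2
x2dummy <- dummy(data1$x2)
# Append the indicator columns to the original data
final_out <- cbind(data1, x2dummy)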
This gives me a large data frame with 190 columns and 500 observations. However, the range of x2 values is larger than what appears in the current data frame data1. I have a kind of dictionary, a separate data frame (data2), that contains every unique value x2 can take. How can I spread data1 against data2 so that I end up with 441 dummy variable columns (the length of data2), filled in according to the values in data1?
A sample of data1:
data1 <- structure(list(y = c(440000, 550000, 990, 135000, 267000, 135000,
239000, 170000, 855000, 158000, 1200, 256000, 86000, 98700, 450000,
130000, 465000, 308000, 680000, 305000), x1 = c(240, 156, 52,
74, 85, 70, 160, 176, 386, 65, 52, 90, 87, 193, 110, 105, 126,
76, 153, 133), x2 = c(8338, 8860, 8003, 8207, 8901, 8224, 8811,
8508, 8840, 8940, 8012, 8223, 8206, 8490, 8023, 8490, 8870, 8024,
8011, 8394)), .Names = c("y", "x1", "x2"), row.names = c(NA,
20L), class = "data.frame")
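One possible approach, sketched here under assumptions rather than offered as the definitive answer, is to turn x2 into a factor whose levels come from the dictionary and then let base R's model.matrix() build the indicator columns. The names toy_data1 and toy_dict and their values are hypothetical stand-ins for data1 and data2, kept small so the result is easy to inspect.

```r
# Toy stand-ins for data1 and data2 (illustrative values, not the real data)
toy_data1 <- data.frame(y = c(10, 20, 30), x2 = c(8003, 8012, 8003))
toy_dict  <- c(4375, 8001, 8002, 8003, 8012, 8012)  # the dictionary may contain repeats

# Fix the factor levels to the dictionary so that unobserved values still get a column
lv   <- sort(unique(toy_dict))
x2_f <- factor(toy_data1$x2, levels = lv)

# One indicator column per level; "- 1" drops the intercept so no level is left out
dummies_full <- model.matrix(~ x2_f - 1)
colnames(dummies_full) <- paste0("x2_", lv)

# Append the indicator columns to the original data
final_out <- cbind(toy_data1, dummies_full)
```

Because the dictionary in the question contains repeated values (8026 appears twice, for example), taking unique() first would yield fewer than 441 columns. Any x2 value missing from the dictionary becomes NA in the factor, and model.matrix() drops that row under the default na.action, so it is worth checking coverage beforehand.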
EDIT: Adding a new, smaller data sample.
Data1: the 20-row data1 shown above.
Data2 (shortened):
data2 <- c(4375, 8001, 8002, 8003, 8004, 8005, 8006, 8007, 8008, 8009,
8010, 8011, 8012, 8013, 8014)
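EDIT:
Thanks to the community for the edit, but the shortened data2 (the universe of values) no longer covers everything in data1. For example, x2 = 8206 is in data1, but it does not appear in the data2 above, which is the data I am trying to spread against.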
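A quick way to confirm this kind of mismatch is to compare the observed values against the dictionary with setdiff(). This is only a sketch: toy_x2 and toy_dict are hypothetical vectors standing in for data1$x2 and the shortened data2.

```r
# Hypothetical stand-ins for data1$x2 and the shortened dictionary
toy_x2   <- c(8003, 8206, 8012)
toy_dict <- c(4375, 8001, 8002, 8003, 8012)

# Values that occur in the data but have no entry in the dictionary
missing_vals <- setdiff(toy_x2, toy_dict)

# TRUE only when every observed value is covered by the dictionary
fully_covered <- all(toy_x2 %in% toy_dict)
```

Here missing_vals holds 8206 and fully_covered is FALSE, which mirrors the situation described above.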
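I want to create the columns of the new data frame from all the unique values in data2, and then fill those columns using the values of the x2 column in data1.
The full data2:
data2 <- structure(list(x2_dictionary = c(4375, 8001, 8002, 8003, 8004,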
8005, 8006, 8007, 8008, 8009, 8010, 8011, 8012, 8013, 8014, 8015,
8016, 8017, 8018, 8019, 8020, 8021, 8022, 8023, 8024, 8025, 8026,
8026, 8027, 8028, 8029, 8030, 8031, 8032, 8033, 8034, 8035, 8036,
8037, 8038, 8039, 8040, 8041, 8042, 8100, 8104, 8105, 8106, 8107,
8110, 8120, 8130, 8140, 8146, 8148, 8148, 8150, 8160, 8161, 8170,
8172, 8173, 8174, 8175, 8178, 8180, 8181, 8182, 8183, 8183, 8183,
8184, 8184, 8185, 8186, 8187, 8188, 8189, 8190, 8191, 8192, 8193,
8194, 8195, 8196, 8197, 8198, 8201, 8202, 8203, 8204, 8205, 8206,
8207, 8208, 8210, 8211, 8212, 8213, 8214, 8220, 8221, 8222, 8223,
8224, 8225, 8226, 8227, 8228, 8230, 8231, 8232, 8233, 8240, 8241,
8242, 8243, 8250, 8251, 8251, 8253, 8254, 8254, 8255, 8256, 8256,
8259, 8260, 8261, 8262, 8263, 8269, 8269, 8270, 8270, 8271, 8272,
8273, 8274, 8275, 8275, 8278, 8278, 8279, 8280, 8281, 8281, 8281,
8281, 8282, 8282, 8289, 8289, 8290, 8291, 8292, 8293, 8294, 8295,
8296, 8297, 8298, 8299, 8301, 8302, 8303, 8304, 8310, 8317, 8318,
8319, 8320, 8328, 8329, 8330, 8338, 8339, 8340, 8348, 8349, 8350,
8350, 8358, 8359, 8360, 8370, 8380, 8384, 8389, 8390, 8391, 8392,
8393, 8394, 8395, 8396, 8397, 8398, 8401, 8401, 8402, 8403, 8410,
8415, 8416, 8420, 8430, 8440, 8440, 8445, 8450, 8455, 8458, 8458,
8459, 8459, 8460, 8460, 8460, 8461, 8469, 8469, 8470, 8470, 8471,
8472, 8474, 8476, 8479, 8480, 8490, 8495, 8500, 8503, 8503, 8504,
8504, 8505, 8506, 8507, 8508, 8508, 8509, 8510, 8510, 8511, 8511,
8512, 8513, 8514, 8515, 8516, 8518, 8519, 8519, 8519, 8519, 8520,
8521, 8529, 8530, 8530, 8540, 8550, 8551, 8552, 8553, 8554, 8559,
8560, 8569, 8569, 8570, 8571, 8572, 8573, 8580, 8585, 8587, 8588,
8589, 8589, 8589, 8590, 8591, 8591, 8592, 8593, 8600, 8607, 8610,
8611, 8612, 8613, 8619, 8619, 8619, 8620, 8629, 8630, 8635, 8640,
8650, 8660, 8670, 8672, 8672, 8680, 8690, 8691, 8692, 8693, 8693,
8694, 8694, 8695, 8695, 8696, 8696, 8697, 8698, 8699, 8699, 8699,
8699, 8700, 8710, 8711, 8712, 8717, 8717, 8718, 8719, 8719, 8719,
8719, 8720, 8729, 8730, 8731, 8731, 8732, 8732, 8733, 8734, 8734,
8735, 8736, 8737, 8738, 8739, 8739, 8740, 8750, 8753, 8754, 8755,
8756, 8757, 8758, 8759, 8760, 8769, 8770, 8770, 8773, 8775, 8776,
8777, 8779, 8780, 8781, 8782, 8783, 8784, 8785, 8786, 8787, 8787,
8787, 8787, 8788, 8789, 8790, 8791, 8792, 8792, 8793, 8794, 8795,
8796, 8797, 8798, 8798, 8799, 8800, 8801, 8810, 8811, 8812, 8818,
8820, 8830, 8840, 8840, 8849, 8850, 8859, 8860, 8870, 8871, 8880,
8901, 8902, 8903, 8904, 8905, 8906, 8907, 8908, 8911, 8912, 8913,
8914, 8915, 8916, 8917, 8918, 8921, 8922, 8923, 8924, 8930, 8940,
8950, 8960, 8970, 8980, 17532, 43421, 80338)), class = "data.frame", row.names = c(NA,
-441L), .Names = "x2_dictionary")
Based on the small data in data1, I will get a very sparse matrix.
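Since the result is mostly zeros, one option is to store it as a sparse matrix instead of a dense data frame. The sketch below assumes the Matrix package (a recommended package included with standard R installations) and uses hypothetical toy vectors in place of data1$x2 and data2.

```r
library(Matrix)

# Hypothetical stand-ins for data1$x2 and the dictionary
toy_x2   <- c(8003, 8012, 8003)
toy_dict <- c(4375, 8001, 8002, 8003, 8012, 8012)
lv <- sort(unique(toy_dict))

# For each row, find the column index of its value among the dictionary levels
j <- match(toy_x2, lv)
stopifnot(!anyNA(j))  # every observed value must appear in the dictionary

# Put a 1 at (row, matching column); all other entries stay implicit zeros
sparse_dummies <- sparseMatrix(
  i = seq_along(toy_x2),
  j = j,
  x = rep(1, length(toy_x2)),
  dims = c(length(toy_x2), length(lv)),
  dimnames = list(NULL, paste0("x2_", lv))
)
```

Only the nonzero entries are stored, one per row, so memory use grows with the number of observations rather than with the number of dictionary values.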