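I want to use R to split semicolon-separated text in a data frame into columns, without using any third-party packages.
I have the following data.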
> #To view the first 6 rows of the data
> head(bank1)
age.job.marital.education.default.housing.loan.contact.month.day_of_week.duration.campaign.pdays.previous.poutcome.emp.var.rate.cons.price.idx.cons.conf.idx.euribor3m.nr.employed.y
1 56;housemaid;married;basic.4y;no;no;no;telephone;may;mon;261;1;999;0;nonexistent;1.1;93.994;-36.4;4.857;5191;no
2 57;services;married;high.school;unknown;no;no;telephone;may;mon;149;1;999;0;nonexistent;1.1;93.994;-36.4;4.857;5191;no
3 37;services;married;high.school;no;yes;no;telephone;may;mon;226;1;999;0;nonexistent;1.1;93.994;-36.4;4.857;5191;no
4 40;admin.;married;basic.6y;no;no;no;telephone;may;mon;151;1;999;0;nonexistent;1.1;93.994;-36.4;4.857;5191;no
5 56;services;married;high.school;no;no;yes;telephone;may;mon;307;1;999;0;nonexistent;1.1;93.994;-36.4;4.857;5191;no
6 45;services;married;basic.9y;unknown;no;no;telephone;may;mon;198;1;999;0;nonexistent;1.1;93.994;-36.4;4.857;5191;no
Please help me split the data into multiple columns according to the column names in the header.
Thanks.
Answer 0 (score: 0)
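You can concatenate the data.frame column into a single character string and then run read.table on it again. Note, however, that splitting the header on dots yields 28 names, which does not match the 21 columns, because several of the real column names (for example emp.var.rate) contain dots themselves. In addition, the header and the observations use different separators: dots in the header, semicolons in the observations.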
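To make the mismatch concrete, here is a small sketch that counts the dot-separated pieces of the header and the semicolon-separated fields of the first row. The names `header`, `row1`, `n_header_pieces` and `n_fields` are introduced only for illustration; both strings are copied from the data shown in the question.

```r
# The header exactly as R stored it: one long column name
header <- "age.job.marital.education.default.housing.loan.contact.month.day_of_week.duration.campaign.pdays.previous.poutcome.emp.var.rate.cons.price.idx.cons.conf.idx.euribor3m.nr.employed.y"

# The first observation from the question
row1 <- "56;housemaid;married;basic.4y;no;no;no;telephone;may;mon;261;1;999;0;nonexistent;1.1;93.994;-36.4;4.857;5191;no"

# fixed = TRUE treats "." and ";" literally rather than as regular expressions
n_header_pieces <- length(strsplit(header, ".", fixed = TRUE)[[1]])
n_fields <- length(strsplit(row1, ";", fixed = TRUE)[[1]])

n_header_pieces  # 28
n_fields         # 21
```

The seven extra pieces come from names such as emp.var.rate, which a plain split on dots breaks apart.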
See the code below:
df <- structure(list(age.job.marital.education.default.housing.loan.contact.month.day_of_week.duration.campaign.pdays.previous.poutcome.emp.var.rate.cons.price.idx.cons.conf.idx.euribor3m.nr.employed.y = structure(c(4L,
6L, 1L, 2L, 5L, 3L), .Label = c("37;services;married;high.school;no;yes;no;telephone;may;mon;226;1;999;0;nonexistent;1.1;93.994;-36.4;4.857;5191;no",
"40;admin.;married;basic.6y;no;no;no;telephone;may;mon;151;1;999;0;nonexistent;1.1;93.994;-36.4;4.857;5191;no",
"45;services;married;basic.9y;unknown;no;no;telephone;may;mon;198;1;999;0;nonexistent;1.1;93.994;-36.4;4.857;5191;no",
"56;housemaid;married;basic.4y;no;no;no;telephone;may;mon;261;1;999;0;nonexistent;1.1;93.994;-36.4;4.857;5191;no",
"56;services;married;high.school;no;no;yes;telephone;may;mon;307;1;999;0;nonexistent;1.1;93.994;-36.4;4.857;5191;no",
"57;services;married;high.school;unknown;no;no;telephone;may;mon;149;1;999;0;nonexistent;1.1;93.994;-36.4;4.857;5191;no"
), class = "factor")), row.names = c(NA, -6L), class = "data.frame")
# Collapse the single column into one newline-separated string
z <- paste(df[, 1], collapse = "\n")
# Re-parse it with ";" as the separator; stringsAsFactors = TRUE gives the
# factor columns shown below on R >= 4.0.0 as well
df2 <- read.table(text = z, header = FALSE, sep = ";", stringsAsFactors = TRUE)
# Splitting the header on "." gives 28 pieces; only the first 15 are complete names.
# The last six names contain dots themselves, so they are written out
# (grouping assumed from the header and the 21 fields per row)
pieces <- unlist(strsplit(names(df), "\\."))
nms2 <- c(pieces[1:15], "emp.var.rate", "cons.price.idx", "cons.conf.idx",
          "euribor3m", "nr.employed", "y")
names(df2) <- nms2
str(df2)
Output:
'data.frame': 6 obs. of 21 variables:
$ age : int 56 57 37 40 56 45
$ job : Factor w/ 3 levels "admin.","housemaid",..: 2 3 3 1 3 3
$ marital : Factor w/ 1 level "married": 1 1 1 1 1 1
$ education : Factor w/ 4 levels "basic.4y","basic.6y",..: 1 4 4 2 4 3
$ default : Factor w/ 2 levels "no","unknown": 1 2 1 1 1 2
$ housing : Factor w/ 2 levels "no","yes": 1 1 2 1 1 1
$ loan : Factor w/ 2 levels "no","yes": 1 1 1 1 2 1
$ contact : Factor w/ 1 level "telephone": 1 1 1 1 1 1
$ month : Factor w/ 1 level "may": 1 1 1 1 1 1
$ day_of_week : Factor w/ 1 level "mon": 1 1 1 1 1 1
$ duration : int 261 149 226 151 307 198
$ campaign : int 1 1 1 1 1 1
$ pdays : int 999 999 999 999 999 999
$ previous : int 0 0 0 0 0 0
$ poutcome : Factor w/ 1 level "nonexistent": 1 1 1 1 1 1
$ emp.var.rate : num 1.1 1.1 1.1 1.1 1.1 1.1
$ cons.price.idx: num 94 94 94 94 94 ...
$ cons.conf.idx : num -36.4 -36.4 -36.4 -36.4 -36.4 -36.4
$ euribor3m : num 4.86 4.86 4.86 4.86 4.86 ...
$ nr.employed : int 5191 5191 5191 5191 5191 5191
$ y : Factor w/ 1 level "no": 1 1 1 1 1 1
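A single-column data frame like this usually means the file was read with the default comma separator. If the original file is still available, it may be simpler to read it with `sep = ";"` from the start. The sketch below is self-contained and rests on assumptions: the temporary file `tmp` stands in for the real file, the object name `bank_ok` is hypothetical, and the file's header row is assumed to be semicolon-separated like the data rows.

```r
# Hypothetical stand-in for the original file: header plus the first two rows
tmp <- tempfile(fileext = ".csv")
writeLines(c(
  "age;job;marital;education;default;housing;loan;contact;month;day_of_week;duration;campaign;pdays;previous;poutcome;emp.var.rate;cons.price.idx;cons.conf.idx;euribor3m;nr.employed;y",
  "56;housemaid;married;basic.4y;no;no;no;telephone;may;mon;261;1;999;0;nonexistent;1.1;93.994;-36.4;4.857;5191;no",
  "57;services;married;high.school;unknown;no;no;telephone;may;mon;149;1;999;0;nonexistent;1.1;93.994;-36.4;4.857;5191;no"
), tmp)

# read.csv() with sep = ";" keeps "." as the decimal mark.
# read.csv2() also splits on ";" but expects "," as the decimal mark,
# so values such as 93.994 would not be parsed as numbers.
bank_ok <- read.csv(tmp, sep = ";", stringsAsFactors = FALSE)

dim(bank_ok)            # rows and columns
names(bank_ok)[16:21]   # the dotted names survive intact

unlink(tmp)             # remove the temporary file
```

Read this way, the header and the rows are split on the same separator, so no names need to be repaired afterwards.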