I'm trying to learn R and have a question about reshaping the following dataset.
bankname,date,year,month,quarter,totalliabilities,corr1,amt1,corr2,amt2
Bank of Pittsgurgh,2/7/1950,1950,2,1,237991,#N/A,#N/A,#N/A,#N/A
Bank of Pittsgurgh,5/2/1950,1950,5,2,258865,#N/A,#N/A,#N/A,#N/A
Bank of Pittsgurgh,8/7/1950,1950,8,3,218524,#N/A,#N/A,#N/A,#N/A,#N/A
Bank of Pittsgurgh,11/6/1950,1950,11,4,237520,First Bank,17472,Third Bank,30711
The Arsenal Bank,2/2/1950,1950,2,1,218508,#N/A,#N/A,#N/A,#N/A
The Arsenal Bank,5/3/1950,1950,5,2,224110,#N/A,#N/A,#N/A,#N/A
The Arsenal Bank,8/2/1950,1950,8,3,216071,#N/A,#N/A,#N/A,#N/A
The Arsenal Bank,11/1/1950,1950,11,4,226166,National Bank,20966,Trust Company,873
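When I run the code below to reshape the data, I get the error that follows. How can I fix this? I would also like to parse the amt variables as numeric and remove the #N/A values from this dataset. How can I parse these variables?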
- First, I created an "id" variable
bank_test2$id<-as.numeric(as.factor(bank_test2$bankname))
- Then I created a unique time variable from year and quarter
bank_test2$yq<-as.factor(paste(as.character(bank_test2$year),as.character(bank_test2$quarter)))
bank_test2<-bank_test2[with(bank_test2, order(yq,id)),]
- Reshape the data
v <- outer(c("corr", "amt"), c(1:2), FUN=paste0)
bank_test2<-reshape(bank_test2, direction='long', varying=c(v), sep='')
Error in `row.names<-.data.frame`(`*tmp*`, value = paste(d[, idvar], times[1L], :
duplicate 'row.names' are not allowed
In addition: Warning message:
non-unique values when setting 'row.names': ‘1.1’, ‘2.1’
id, bankname, date, year, month, quarter, totalliabilities, node, corr, amt
1, Bank of Pittsgurgh, 2/7/1950, 1950, 2, 1, 237991, 1, #N/A, #N/A
1, Bank of Pittsgurgh, 5/2/1950, 1950, 5, 2, 258865, 1, #N/A, #N/A
1, Bank of Pittsgurgh, 8/7/1950, 1950, 8, 3, 218524, 1, #N/A, #N/A
1, Bank of Pittsgurgh, 11/6/1950, 1950, 11, 4, 237520, 1, First Bank, 21906
1, Bank of Pittsgurgh, 2/7/1950, 1950, 2, 1, 237991, 2, #N/A, #N/A
1, Bank of Pittsgurgh, 5/2/1950, 1950, 5, 2, 258865, 2, #N/A, #N/A
1, Bank of Pittsgurgh, 8/7/1950, 1950, 8, 3, 218524, 2, #N/A, #N/A
1, Bank of Pittsgurgh, 11/6/1950, 1950, 11, 4, 237520, 2, Third Bank, 4442
2, The Arsenal Bank, 2/2/1950, 1950, 2, 1, 218508, 1, #N/A, #N/A
2, The Arsenal Bank, 5/3/1950, 1950, 5, 2, 224110, 1, #N/A, #N/A
2, The Arsenal Bank, 8/2/1950, 1950, 8, 3, 216071, 1, #N/A, #N/A
2, The Arsenal Bank, 11/1/1950, 1950, 11, 4, 226166, 1, National Bank, 43224
2, The Arsenal Bank, 2/2/1950, 1950, 2, 1, 218508, 2, #N/A, #N/A
2, The Arsenal Bank, 5/3/1950, 1950, 5, 2, 224110, 2, #N/A, #N/A
2, The Arsenal Bank, 8/2/1950, 1950, 8, 3, 216071, 2, #N/A, #N/A
2, The Arsenal Bank, 11/1/1950, 1950, 11, 4, 226166, 2, Trust Company, 3682
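I want the data organized like this, using the bank id newly created from "bankname", with unique row names built from the id and time values. Then I want to remove all the #N/A values from the dataset. How should I do this?
Thanks in advance.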
Answer 0 (score: 0)
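This particular error is complaining that the row names are not unique. When no idvar is given, reshape() uses the column named "id"; the id you created is the same for every row of a bank, so the row names reshape() builds from it (id.time, e.g. "1.1" and "2.1") collide. To avoid this, pass reshape() a column that has a unique value for each row as "idvar". The best way is to add a new column holding such a unique ID to the original data frame, but you can also use any other field that is already unique. For example, totalliabilities happens to be unique in your data frame, so you could use: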
bank_test2<-reshape(bank_test2, direction='long', varying=c(v), sep='',idvar="totalliabilities")
This is obviously not the best choice of ID, but I hope it points you in the right direction.
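A minimal sketch of the "add a unique ID column" approach, assuming the data is read from CSV text like the question's (four of its rows are inlined here). Declaring "#N/A" as a missing-value marker makes amt1 and amt2 numeric on read, and a new per-row key, rowid (a hypothetical column name), is passed as idvar so every generated row name is unique. The varying columns are given as a list with explicit v.names, timevar and times instead of letting reshape() guess them from the names.

```r
# Four rows from the question; "#N/A" is read as NA, so amt1/amt2 become numeric
bank <- read.csv(text = "bankname,date,year,month,quarter,totalliabilities,corr1,amt1,corr2,amt2
Bank of Pittsgurgh,2/7/1950,1950,2,1,237991,#N/A,#N/A,#N/A,#N/A
Bank of Pittsgurgh,11/6/1950,1950,11,4,237520,First Bank,17472,Third Bank,30711
The Arsenal Bank,2/2/1950,1950,2,1,218508,#N/A,#N/A,#N/A,#N/A
The Arsenal Bank,11/1/1950,1950,11,4,226166,National Bank,20966,Trust Company,873",
                 na.strings = "#N/A", stringsAsFactors = FALSE)

# Per-bank id, as in the question (repeats within a bank)
bank$id <- as.numeric(as.factor(bank$bankname))
# Hypothetical per-row key: unique, so reshape()'s row names cannot collide
bank$rowid <- seq_len(nrow(bank))

long <- reshape(bank, direction = "long",
                varying = list(c("corr1", "corr2"), c("amt1", "amt2")),
                v.names = c("corr", "amt"),
                timevar = "node", times = 1:2,
                idvar = "rowid")

# Remove rows without a correspondent, order by bank, node and time,
# and build row names from id, year, quarter and node
long <- long[!is.na(long$amt), ]
long <- long[order(long$id, long$node, long$year, long$quarter), ]
rownames(long) <- paste(long$id, long$year, long$quarter, long$node, sep = ".")
long
```

Alternatively, if the per-bank column is given another name (say a hypothetical bankid), calling reshape() without idvar lets it fall back to its default and add an id column that numbers the original rows.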
Answer 1 (score: 0)
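I have put your data into a form that is easy to use and reproduce. I then took a subset of your data, b, and tried to put it into long format. I'm not sure whether this is the desired output.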
library(reshape2)
library(stringr)
# reproducible copy of the data (dput output)
a <- structure(list(
  bankname = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("Bank of Pittsgurgh", "The Arsenal Bank_Pittsburgh"), class = "factor"),
  date = structure(c(2L, 3L, 6L, 8L, 9L, 12L, 13L, 15L, 1L, 4L, 5L, 7L, 10L, 11L, 14L, 16L), .Label = c("1950/02/02", "1950/02/07", "1950/05/02", "1950/05/03", "1950/08/02", "1950/08/07", "1950/11/01", "1950/11/06", "1951/02/05", "1951/02/06", "1951/05/01", "1951/05/07", "1951/08/06", "1951/08/07", "1951/11/03", "1951/11/06"), class = "factor"),
  year = c(1950L, 1950L, 1950L, 1950L, 1951L, 1951L, 1951L, 1951L, 1950L, 1950L, 1950L, 1950L, 1951L, 1951L, 1951L, 1951L),
  month = c(2L, 5L, 8L, 11L, 2L, 5L, 8L, 11L, 2L, 5L, 8L, 11L, 2L, 5L, 8L, 11L),
  quarter = c(1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L),
  totalliabilities = c(237991.5469, 258865.6563, 218524, 237520.5469, 276052.1875, 255812.7031, 62426.625, 272447.375, 218508.4844, 224110.5156, 216071.9063, 226166.7969, 244241.625, 228508.0625, 254008.8594, 268540.1563),
  corr1 = structure(c(1L, 1L, 1L, 2L, 1L, 1L, 1L, 2L, 1L, 1L, 1L, 3L, 1L, 1L, 1L, 3L), .Label = c("#N/A", "First National Bank", "National Bank of Commerce"), class = "factor"),
  amt1 = structure(c(1L, 1L, 1L, 2L, 1L, 1L, 1L, 4L, 1L, 1L, 1L, 3L, 1L, 1L, 1L, 5L), .Label = c("#N/A", "17472.98047", "20966.50977", "21906.07031", "43224.62891"), class = "factor"),
  corr2 = structure(c(1L, 1L, 1L, 2L, 1L, 1L, 1L, 2L, 1L, 1L, 1L, 4L, 1L, 1L, 1L, 3L), .Label = c("#N/A", "Third National Bank", "Union Trust Company", "Unit Trust Company Of New York"), class = "factor"),
  amt2 = structure(c(1L, 1L, 1L, 2L, 1L, 1L, 1L, 4L, 1L, 1L, 1L, 5L, 1L, 1L, 1L, 3L), .Label = c("#N/A", "30711.35938", "3682.449951", "4442.399902", "873.1699829"), class = "factor"),
  X = structure(c(1L, 1L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = c("", "#N/A"), class = "factor"),
  id = c(1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2)),
  .Names = c("bankname", "date", "year", "month", "quarter", "totalliabilities", "corr1", "amt1", "corr2", "amt2", "X", "id"),
  row.names = c(NA, 16L), class = "data.frame")
b<- a[c(8,12,16),c(1,2,7,8,9,10)]
b
# paste each correspondent and its amount into one column: type1 for corr1/amt1, type2 for corr2/amt2
b$type1 <- paste0(b$corr1, "|", b$amt1)
b$type2 <- paste0(b$corr2, "|", b$amt2)
# melt the two type columns into one long column
molten <- melt(b, measure.vars = c("type1", "type2"))
molten
# split them back into a name column (X1) and an amount column (X2)
long <- data.frame(str_split_fixed(molten$value, "\\|", 2))
d <- cbind(molten, long)
d[, c(1, 9, 10)]
# bankname X1 X2
#1 Bank of Pittsgurgh First National Bank 21906.07031
#2 The Arsenal Bank_Pittsburgh National Bank of Commerce 20966.50977
#3 The Arsenal Bank_Pittsburgh National Bank of Commerce 43224.62891
#4 Bank of Pittsgurgh Third National Bank 4442.399902
#5 The Arsenal Bank_Pittsburgh Unit Trust Company Of New York 873.1699829
#6 The Arsenal Bank_Pittsburgh Union Trust Company 3682.449951
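In the data above, amt1 and amt2 are factors (see the dput), so they need converting before they can be used as numbers. A minimal sketch, using a hypothetical factor amt shaped like amt1 and assuming "#N/A" is the only non-numeric label: go through as.character(), because as.numeric() on a factor returns its internal level codes rather than the printed values.

```r
# Hypothetical factor shaped like amt1 in the dput above
amt <- factor(c("#N/A", "#N/A", "17472.98047", "21906.07031"))

# Pitfall: as.numeric() on a factor returns level codes, not the amounts
codes <- as.numeric(amt)

# Convert via the labels, turning "#N/A" into NA first so that
# as.numeric() does not warn about coercion
lab <- as.character(amt)
lab[lab == "#N/A"] <- NA
amt_num <- as.numeric(lab)

# Drop the missing entries, as the question asks
amt_num[!is.na(amt_num)]
```

For d$X2 in the answer, as.numeric(d$X2) should be enough in R 4.0 and later, where data.frame() no longer turns strings into factors by default.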