WK,MND,CS,SHP,RevCY,RevLY,TCY,TLY,ACY,ALY
"2,JAN,GER,""Victoria's Secrets"",29307,25419,841,768,2320,1755"
2,JAN,KAP,Brand Shop,2027,-,95,0,175,-0
2,JAN,KAP,Kapp‚ Drugstore West,89768,78824,3309,3052,6197,5634
2,JAN,KAP,Kapp‚ P&C Centraal,680019,640951,8709,8116,19450,18385
2,JAN,KAP,Kapp‚ Sunglasses Centraal,49216,43940,464,421,550,478
2,JAN,KAP,Kapp‚ Sunglasses Schengen,25721,26592,306,318,333,378
2,JAN,KAP,Kapp‚ Sunglasses West,50280,53089,477,510,566,_78
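I always seem to struggle with getting data into the right structure. I have the data shown above (the file has more than 10K lines), and when loading it I want each column to have a specific class.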
When I run:
RIS <- read.table("RIS.txt", sep=",", header=T, fill=T,
colClasses=c("integer", "character", "factor", "factor", rep("numeric",6)))
I get this error:
Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings, :
scan() expected 'an integer', got '"2'
I think this is because the WK column actually contains stray symbols, but the same thing may happen in other columns too.
Can anyone help me load this data correctly and "clean" the data set so that every column has the right format or class?
Answer 0 (score: 0)
You have a typical data-cleaning problem: in my experience, data preparation consumes about 80% of the project time in a typical analysis task.
Based on your data sample, try the following.
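According to ?read.table, quoting is only considered for columns read as character. Because you declared WK as "integer", the double quote wrapping your first data row is read as part of the value, which is exactly the '"2' in your error.
Use read.csv() with the argument quote="". This switches off quote handling entirely, so those stray double quotes are kept as ordinary characters; of course, you may have to remove them later. Try this: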
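A quick way to confirm that switching quoting off gives a consistent layout is count.fields(), which reports how many fields each line splits into without actually reading the data. This is a minimal sketch on a few of the sample lines; the variable names are my own, and the "‚" character from the data is written as "\u201a":

```r
# A few lines from the question's sample
lines <- c(
  "WK,MND,CS,SHP,RevCY,RevLY,TCY,TLY,ACY,ALY",
  "\"2,JAN,GER,\"\"Victoria's Secrets\"\",29307,25419,841,768,2320,1755\"",
  "2,JAN,KAP,Brand Shop,2027,-,95,0,175,-0",
  "2,JAN,KAP,Kapp\u201a Sunglasses West,50280,53089,477,510,566,_78"
)

# Fields per line with quoting disabled, as in read.csv(quote = "")
tc <- textConnection(lines)
n_noquote <- count.fields(tc, sep = ",", quote = "")
close(tc)
n_noquote

# For comparison: with '"' as the quote character, the fully quoted
# first data row is not split into 10 fields
tc <- textConnection(lines)
count.fields(tc, sep = ",", quote = "\"")
close(tc)
```

On the full file, table(count.fields("RIS.txt", sep = ",", quote = "")) should show a single count of 10 if every line has the expected layout.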
data <- "
WK,MND,CS,SHP,RevCY,RevLY,TCY,TLY,ACY,ALY
\"2,JAN,GER,\"\"Victoria's Secrets\"\",29307,25419,841,768,2320,1755\"
2,JAN,KAP,Brand Shop,2027,-,95,0,175,-0
2,JAN,KAP,Kapp‚ Drugstore West,89768,78824,3309,3052,6197,5634
2,JAN,KAP,Kapp‚ P&C Centraal,680019,640951,8709,8116,19450,18385
2,JAN,KAP,Kapp‚ Sunglasses Centraal,49216,43940,464,421,550,478
2,JAN,KAP,Kapp‚ Sunglasses Schengen,25721,26592,306,318,333,378
2,JAN,KAP,Kapp‚ Sunglasses West,50280,53089,477,510,566,_78
"
Now read the data:
x <- read.csv(text=data, quote="", header=TRUE)
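Before cleaning, it can help to list exactly which cells in the numeric columns are not plain numbers; this also checks the suspicion that columns other than WK are affected. This is a minimal sketch, assuming numeric columns should contain only optionally signed whole numbers; raw, bad and the regular expression are illustrative choices of mine. Reading every column as character keeps the raw text untouched:

```r
# Subset of the question's sample ("\u201a" is the "‚" character from the data)
lines <- c(
  "WK,MND,CS,SHP,RevCY,RevLY,TCY,TLY,ACY,ALY",
  "\"2,JAN,GER,\"\"Victoria's Secrets\"\",29307,25419,841,768,2320,1755\"",
  "2,JAN,KAP,Brand Shop,2027,-,95,0,175,-0",
  "2,JAN,KAP,Kapp\u201a Sunglasses West,50280,53089,477,510,566,_78"
)

# Read everything as character so nothing is converted or lost yet
raw <- read.csv(text = lines, quote = "", colClasses = "character")

numericCols <- c(1, 5:10)
# TRUE where a cell is not an optionally signed integer such as "95" or "-0"
bad <- sapply(raw[numericCols], function(col) !grepl("^-?[0-9]+$", col))

# Row/column positions of the offending cells, then their raw values
which(bad, arr.ind = TRUE)
as.matrix(raw[numericCols])[bad]
```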
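Start the cleaning process: strip the stray characters ("-", "_" and double quotes) from the numeric columns and convert them to numbers: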
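numericCols <- c(1, 5:10)   # WK and RevCY..ALY
# Remove "-", "_" and '"', then convert; a cell that becomes "" turns into NA.
# Note: this also drops the sign of any genuinely negative number.
x[numericCols] <- lapply(x[numericCols], function(col) as.numeric(gsub("[-_\"]", "", col)))
x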
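One caveat: the pattern [-_\"] removes every minus sign, so a real negative value such as -5 would silently become 5. A hedged alternative, assuming a lone "-" means "missing" (as it appears to in the Brand Shop row) and that "-" otherwise only occurs as a sign; clean_num is a hypothetical helper name:

```r
# Hypothetical helper: strip stray "_" and double quotes, map a lone "-" to NA,
# and keep a leading minus sign so genuine negatives survive.
clean_num <- function(v) {
  v <- gsub("[_\"]", "", as.character(v))
  v[v %in% "-"] <- NA
  as.numeric(v)
}

clean_num(c("\"2", "-", "-0", "_78", "1755\"", "-5"))
# gives 2, NA, 0, 78, 1755, -5
```

With the objects from the answer, it would be applied as x[numericCols] <- lapply(x[numericCols], clean_num).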
Result:
WK MND CS SHP RevCY RevLY TCY TLY ACY ALY
1 2 JAN GER ""Victoria's Secrets"" 29307 25419 841 768 2320 1755
2 2 JAN KAP Brand Shop 2027 NA 95 0 175 0
3 2 JAN KAP Kapp‚ Drugstore West 89768 78824 3309 3052 6197 5634
4 2 JAN KAP Kapp‚ P&C Centraal 680019 640951 8709 8116 19450 18385
5 2 JAN KAP Kapp‚ Sunglasses Centraal 49216 43940 464 421 550 478
6 2 JAN KAP Kapp‚ Sunglasses Schengen 25721 26592 306 318 333 378
7 2 JAN KAP Kapp‚ Sunglasses West 50280 53089 477 510 566 78
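To finish the job and end up with the classes the question asked for, the leftover doubled quotes in SHP can be stripped, WK made an integer, and CS and SHP turned into factors. This is a sketch, assuming SHP should be a factor (as in the question's colClasses) and that no shop name legitimately contains a double quote:

```r
# Subset of the question's sample ("\u201a" is the "‚" character from the data)
lines <- c(
  "WK,MND,CS,SHP,RevCY,RevLY,TCY,TLY,ACY,ALY",
  "\"2,JAN,GER,\"\"Victoria's Secrets\"\",29307,25419,841,768,2320,1755\"",
  "2,JAN,KAP,Brand Shop,2027,-,95,0,175,-0",
  "2,JAN,KAP,Kapp\u201a Sunglasses West,50280,53089,477,510,566,_78"
)

# Same read and numeric cleaning as in the answer
x <- read.csv(text = lines, quote = "", header = TRUE)
numericCols <- c(1, 5:10)
x[numericCols] <- lapply(x[numericCols], function(col) as.numeric(gsub("[-_\"]", "", col)))

# Remove the leftover double quotes from the shop names
x$SHP <- gsub("\"", "", x$SHP)

# Classes requested in the question: WK integer, CS and SHP factors
x$WK <- as.integer(x$WK)
x[c("CS", "SHP")] <- lapply(x[c("CS", "SHP")], factor)

str(x)
```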