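I have a CSV file that unfortunately contains a large block of corrupted data in the middle (rows 348:68914). The rows before it and the rows after it, however, contain usable data. When I try to read the file into R, read.csv only reads the first 347 rows and stops when it reaches the garbage characters that mark the corruption.
I have about 500 large files, and 15 to 20 of them have this problem. Since I am trying to process all of them automatically, I want to avoid cleaning out the corrupted data by hand; ideally I would just read each CSV into R and then delete the bad rows.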
> Arb<-read.csv(file = "A0_022916_DataLog.csv", header=TRUE, stringsAsFactors=FALSE)
> str(Arb)
'data.frame': 347 obs. of 18 variables:
$ Date : chr "2000-1-6" "2000-1-6" "2000-1-6" "2000-1-6" ...
$ Time : chr "20:49:35" "20:50:0" "20:51:0" "20:52:0" ...
$ Outside.Temp.deg.C: chr "5.84" "5.79" "5.80" "5.84" ...
I tried to work around this by setting fileEncoding, quote and check.names. I tried latin1, UTF-8 and UTF-16 as the file encoding, but none of them worked. For example, with latin1:
> Arb<-read.csv(file = "A0_022916_DataLog.csv", header=TRUE, stringsAsFactors=FALSE,
+ fileEncoding = "latin1", quote = "", check.names = FALSE)
Warning message:
In scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings, :
invalid input found on input connection 'A0_022916_DataLog.csv'
Here is a sample of my data:
Date,Time,Outside Temp deg C,Inside Temp deg C,Rain Temp deg C,Wetness Temp deg C,RainCount,Wetness Voltage,Outside Wind Speed,Inside Wind Speed,Heat Percent,Roof Is Open,Roof Vent Is Open,Vent Is Open,Fan Is On,Box Heat Is On,ClkTemp C,High Wind
1/7/2000,2:32:00,5.59,5.9,25.43,16.62,0,0,0,0,0,TRUE,FALSE,FALSE,FALSE,FALSE,17,FALSE
1/7/2000,2:33:01,5.53,6.22,25.35,16.68,0,0,1.5,0,0,TRUE,FALSE,FALSE,FALSE,FALSE,16.75,FALSE
1/7/2000,2:34:03,5.63,6.07,25.39,16.74,0,0,1.5,0,0,TRUE,FALSE,FALSE,FALSE,FALSE,16.5,FALSE
1/7/2000,2:35øéüAßj,N²pU(°øKÜÙÀÛª0Ûؽ¹Hõ<`º¦Öópù4)Øváø§ÊîØB!ôË\´-äªý§ÙÐn2P3ßȶÝ]ëᄀV§â¡7Oà~7ãÁô*ʪ¼jº¯§ÙLÅQ-ø5sæ{Ás¢qÇËQvn¾QóÞ»t.µÉ·¹EÍ=^KðB÷*óÆÐsMë×Ç]ív¾³Û_ êIy|fcT]ôjËû)©öÈɨŠAíMA×v?Wöà[~ÜP»ëGÖý±c»¤'¨{,,,,,,,,,,,,,,,
¹ÔvM,"B{máü*7L`Ë-MSYúÖqáw¬£VÚsWóú/¨þã]Ck""iØÓtµ7M¡?9ýê'¬AmÏK±Â %DÛyøÐVR`&!ÜÞô'¨oÕ+m[q´fzÃìt@°Æ_v <ër̯^Zm^ä>9Ã]Ò/ÁÅÇD³Q¼ÌÍáÄøûjxW^¢º±éhmyhn÷}nú=HÆ.ðB3^âÆeâLÊa","+gCÌíDÜ{ Ô ò+²¿%ÔÛïYZqLð`Ûvô¡XÁÔßøéüAßj",N²pU(°øKÜÙÀÛª0Ûؽ¹Hõ<`º¦Öópù4)Øváø§ÊîØB!ôË\´-äªý§ÙÐn2P3ßȶÝ]ëᄀV§â¡7Oà~7ãÁô*ʪ¼jº¯§ÙLÅQ-ø5sæ{Ás¢qÇËQvn¾QóÞ»t.µÉ·¹EÍ=^KðB÷*óÆÐsMë×Ç]ív¾³Û_ êIy|fcT]ôjËû)©öÈɨŠAíMA×v?Wöà[~ÜP»ëGÖý±c»¤'¨{,,,,,,,,,,,,,,
¹ÔvM,"B{máü*7L`Ë-MSYúÖqáw¬£VÚsWóú/¨þã]Ck""iØÓtµ7M¡?9ýê'¬AmÏK±Â %DÛyøÐVR`&!ÜÞô'¨oÕ+m[q´fzÃìt@°Æ_v <ër̯^Zm^ä>9Ã]Ò/ÁÅÇD³Q¼ÌÍáÄøûjxW^¢º±éhmyhn÷}nú=HÆ.ðB3^âÆeâLÊa","+gCÌíDÜ{ Ô ò+²¿%ÔÛïYZqLð`Ûvô¡XÁÔßøéüAßj",N²pU(°øKÜÙÀÛª0Ûؽ¹Hõ<`º¦Öópù4)Øváø§ÊîØB!ôË\´-äªý§ÙÐn2P3ßȶÝ]ëᄀV§â¡7Oà~7ãÁô*ʪ¼jº¯§ÙLÅQ-ø5sæ{Ás¢qÇËQvn¾QóÞ»t.µÉ·¹EÍ=^KðB÷*óÆÐsMë×Ç]ív¾³Û_ êIy|fcT]ôjËû)©öÈɨŠAíMA×v?Wöà[~ÜP»ëGÖý±c»¤'¨{,,,,,,,,,,,,,,
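One possible workaround, sketched under the assumption that every valid row is plain printable ASCII with exactly 18 comma-separated fields (as in the sample above), is to read the raw lines with readLines(), drop every line that fails that check, and hand the survivors to read.csv() through its text argument. The helper name read_clean_csv and its n_fields parameter are hypothetical, not part of any package.

```r
# Hypothetical helper: drop corrupted rows before parsing.
# Assumption: a valid row is printable ASCII only and has exactly
# `n_fields` comma-separated fields (18 in the sample data).
read_clean_csv <- function(path, n_fields = 18) {
  # skipNul = TRUE drops embedded NUL bytes; warn = FALSE silences
  # the embedded-nul / incomplete-final-line warnings.
  lines <- readLines(path, warn = FALSE, skipNul = TRUE)

  # Keep only lines made entirely of printable ASCII (bytes 0x20-0x7E).
  # useBytes = TRUE avoids errors on invalid multibyte sequences.
  ascii_ok <- grepl("^[\\x20-\\x7E]+$", lines, perl = TRUE, useBytes = TRUE)

  # Count commas byte-wise; a valid row has n_fields - 1 of them.
  n_commas <- nchar(gsub("[^,]", "", lines, perl = TRUE, useBytes = TRUE),
                    type = "bytes")

  keep <- ascii_ok & n_commas == n_fields - 1
  message(sprintf("Dropped %d of %d lines", sum(!keep), length(lines)))

  # The header line passes the same check, so header = TRUE still works.
  read.csv(text = lines[keep], header = TRUE, stringsAsFactors = FALSE)
}
```

For the batch of files this could be called as `lapply(list.files(pattern = "\\.csv$"), read_clean_csv)`. A corrupted line that happens to be pure ASCII with exactly 17 commas would still slip through, so an extra per-row check (for example, a date pattern on the first field) may be worth adding.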
Thanks in advance for any insight into how to solve this.