I have the following csv file:
"ID,""oldid"",""country"",""side_a"",""densdiff"
"10,32,""Afghanistan"",""Afghanistan"",""Various organizations"
In the exercises we were given several csv files that were always formatted "cleanly", e.g.
"ID","oldid","country" ...
"10","32","Afghanistan" ...
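I found that the delimiter is "," but sometimes it sits inside a string ("ID,") and sometimes there is no delimiter at all (as in: " intden"""" densdiff""", so I don't know how to handle the last two quotes).
I haven't found a good site that explains how to read "mixed-csv-formatted" input into R.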
Edit: here is the full header and the first row:
"ID,""oldid"",""country"",""side_a"",""side_b"",""cow"",""incompatibility"",""terr"",""begin"",""end"",""type"",""identity"",""radius"",""confarea"",""landarea"",""confland"",""rel_scope"",""distance"",""maxdist"",""mindist"",""disper"",""pop2000"",""resource"",""mountain"",""forest"",""border"",""mindisx"",""lnmndist"",""confarex"",""ln_abs_scope"",""ln_land_area"",""lnpop"",""lnconpro"",""duration"",""distx"",""location"",""mountx"",""frstx"",""lnmountx"",""lnfrstx"",""diamond"",""diadist"",""gold"",""golddist"",""oil"",""oildist"",""roadpave"",""roadtot"",""pavetot"",""paveland"",""roadland"",""disxsqr"",""mndisxsq"",""stabilit"",""rulelaw"",""nocorrup"",""lnd100km"",""pop100km"",""lnd100cr"",""pop100cr"",""landlock"",""ciffob95"",""coastden"",""intden"",""densdiff"""
Next line:
"10,32,""Afghanistan"",""Afghanistan"",""Various organizations"",700,2,"""",1978,2000,3,1,400,500,652,77,77,122,522,0,0.509999990463257,27,0,66,3,1,1,0,500,6.21460819244385,6.4800443649292,3.29583692550659,0.959037899971008,23,122,4.80402088165283,66,3,4.18965482711792,1.0986123085022,0,NA,0,NA,0,NA,2.79999995231628,21,13.3333330154419,0.429447859525681,3.22085881233215,14884,1,NA,NA,NA,0,0,0,0,1,NA,0,36,-36"
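Edit 2: After a lot of troubleshooting I simply downloaded the csv file again, and now it is clean. I will post a comment after asking my lecturer. Thanks for all the help :)
Answer 0 (score: 1)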
> library(readr) # read_lines() comes from the readr package
> x <- read_lines("data.csv") # Read the csv file with the stray quotes
> x # Display contents
[1] "\"ID,\"\"oldid\"\",\"\"country\"\",\"\"side_a\"\",\"\"densdiff\""
[2] "\"10,32,\"\"Afghanistan\"\",\"\"Afghanistan\"\",\"\"Various organizations\""
> x2 <- textConnection(gsub('"', "", x)) # Remove every double quote and open a text connection on the result
> x3 <- read.csv(x2, header=TRUE) # Read the connection as you would a regular file
> x3
  ID oldid     country      side_a              densdiff
1 10    32 Afghanistan Afghanistan Various organizations
Answer 1 (score: 1)
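This csv was written so that each whole line is a single field enclosed in quotes. The inner quotes are therefore escaped with an extra quote.
So it is actually a csv file generated from an already well-formed csv file (or data), in which every line has now been turned into a single field.
Ideally this should be fixed at the source.
To fix it afterwards, read each line and parse it as a single csv field. The content of the parsed field (which should now have all the extra quotes removed)
"ID,""oldid"",""country"",""side_a"",""densdiff" .."
"10,32,""Afghanistan"",""Afghanistan"",""Various organizations" .."
should then be processed again and parsed as a complete csv row.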
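The two passes described above can be sketched in R roughly as follows. This is only a minimal sketch under a few assumptions: the two sample lines are shortened versions of the rows in the question, written with the closing quotes seen in the full header, and the first pass unwraps the outer field by hand (strip the enclosing quotes, collapse each doubled quote) instead of calling a csv parser. The variable names `raw`, `inner`, and `clean` are made up for illustration.

```r
# Sample lines modeled on the question: each whole line is one quoted field,
# and the quotes that belong to the real csv are doubled.
raw <- c(
  '"ID,""oldid"",""country"",""side_a"",""densdiff"""',
  '"10,32,""Afghanistan"",""Afghanistan"",""Various organizations"""'
)

# Pass 1: undo the outer layer. Drop the enclosing quotes, then turn each
# doubled quote back into a single one.
inner <- sub('^"(.*)"$', "\\1", raw)
inner <- gsub('""', '"', inner, fixed = TRUE)

# Pass 2: the unwrapped lines are ordinary csv rows again.
clean <- read.csv(text = inner, header = TRUE, stringsAsFactors = FALSE)
str(clean)
```

With a real file, `raw` would come from something like `readLines("file.csv")`. Unlike deleting every quote, this keeps the inner quoting intact, so a string field that contains a comma, or an empty string written as `""""` in the wrapped file, should still be read as one field.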
Answer 2 (score: 1)
As David Arenburg said in the comments, you should try something like this:
> read.csv(text = gsub("\"", "", readLines("file.csv")))
  ID oldid     country      side_a              densdiff
1 10    32 Afghanistan Afghanistan Various organizations
Answer 3 (score: 0)
A proper CSV should look like this:
12,13,"abc","def"
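The following should clean it up, provided the whole file follows the format of the short sample and none of the strings contain commas: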
cat my.csv | sed 's/,"/,/' | sed 's/","/,/g' | sed 's/^"//' > mynew.csv