I'm working with the zipcode dataset and csvkit, but I'm getting nowhere. If I run csvcut -n zipcode.csv, I get a clean list of columns:
1: zip
2: city
3: state
4: latitude
5: longitude
6: timezone
7: dst
But every search I run with csvgrep just gives me an error. Here's a chunk of the data:
"99919","Thorne Bay","AK","55.677232","-132.55624","-9","1"
"99921","Craig","AK","55.456449","-133.02648","-9","1"
"99922","Hydaburg","AK","55.209339","-132.82545","-9","1"
"99923","Hyder","AK","55.941442","-130.0545","-9","1"
"99925","Klawock","AK","55.555164","-133.07316","-9","1"
"99926","Metlakatla","AK","55.123897","-131.56883","-9","1"
"99927","Point Baker","AK","56.337957","-133.60689","-9","1"
"99928","Ward Cove","AK","55.395359","-131.67537","-9","1"
"99929","Wrangell","AK","56.409507","-132.33822","-9","1"
"99950","Ketchikan","AK","55.875767","-131.46633","-9","1"
According to the docs, I'd expect csvgrep -c 2 -m "Hyder" zipcode.csv to turn up a match, but instead I get:
zip,city,state,latitude,longitude,timezone,dst
list index out of range
I can use csvgrep on other CSV files just fine. Why does it choke on this one?
Answer 0: (score: 1)
Your problem is that "zipcode.csv" is malformed: it contains blank lines. For example, line 17 is blank:
"00607","Aguas Buenas","PR","18.256995","-66.104657","-4","0"

"00609","Aibonito","PR","18.142002","-66.273278","-4","0"
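The author of the file may have done this to show that ZIP code 00608 doesn't exist, which could be useful in some contexts, but it prevents you from using the csvkit utilities.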
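If you want to check where the blank lines are before changing anything, grep can list their line numbers. This is only a sketch: sample.csv is a hypothetical four-line stand-in for the real file, built here so the snippet runs on its own. Point the grep at zipcode.csv to inspect the actual data.

```shell
# Build a tiny stand-in for zipcode.csv with one blank line (hypothetical file).
printf '%s\n' \
  '"zip","city"' \
  '"00607","Aguas Buenas"' \
  '' \
  '"00609","Aibonito"' > sample.csv

# grep -n prefixes each match with "N:"; cut keeps only the line number.
blank_lines=$(grep -n '^$' sample.csv | cut -d: -f1)
echo "$blank_lines"
```

Note that '^$' only matches lines that are completely empty, so a line holding spaces or a stray carriage return would not be reported.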
If you're on a *nix-based OS, you already have sed installed, and you can use it to strip the blank lines automatically, like so:
$ sed '/^$/d' zipcode.csv > zipcode2.csv
This stores the result as "zipcode2.csv". We can now use our new "fixed" zip code file:
$ csvgrep -c 2 -m "Hyder" zipcode2.csv
zip,city,state,latitude,longitude,timezone,dst
99923,Hyder,AK,55.941442,-130.0545,-9,1
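One caveat with the '^$' pattern: it only removes lines that are truly empty. If a file has Windows line endings or "blank" lines that contain spaces, those lines survive. Whether zipcode.csv has any such lines is an assumption on my part, not something I checked. A slightly more tolerant pattern uses the POSIX [[:space:]] class. The file names below are hypothetical and the sample is built inline so the sketch is self-contained.

```shell
# Hypothetical sample with CRLF line endings, one "\r"-only line and one
# line of spaces followed by "\r".
printf '"zip","city"\r\n"00607","Aguas Buenas"\r\n\r\n"00609","Aibonito"\r\n   \r\n' > sample_crlf.csv

# '^$' does not match either near-blank line, so all five lines remain.
strict=$(sed '/^$/d' sample_crlf.csv | wc -l | tr -d ' ')

# [[:space:]] matches spaces, tabs and carriage returns, so both are dropped.
sed '/^[[:space:]]*$/d' sample_crlf.csv > sample_clean.csv
tolerant=$(wc -l < sample_clean.csv | tr -d ' ')

echo "strict pattern kept $strict lines, tolerant pattern kept $tolerant"
```

The data rows keep their trailing carriage returns; only the lines with nothing but whitespace are deleted.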
Answer 1: (score: 1)
To prevent most errors like the one above, I use csvclean (also part of csvkit) to find and fix bad data in the source CSV. Also check out this blog post for a full how-to.