So I have a raw file like this, with about 20k columns, something like:
number|colour|(a|1)|animal
1|green|x|dog
2|blue|y|cat
3|red|z|owl
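When I use read_csv('raw.csv', sep='|'), this creates a DataFrame with an extra column, because the (a|1) column header gets split.
I tried using the quotechar parameter, but it only accepts a single character. Any help would be appreciated.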
Answer 0 (score: 3)
Based on the sample data you provided, the extra delimiter only appears in the header row. So you can supply your own column names with the names keyword and tell Pandas to skip the header row, like this:
import pandas as pd

# Skip the malformed header row and supply the column names explicitly
df = pd.read_csv('raw.csv', sep='|', skiprows=1, names=["number", "colour", "(a|1)", "animal"])
print(df)
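
If you want to try this without the real file, here is a minimal self-contained sketch of the same idea. It assumes the data looks exactly like the sample in the question; the io.StringIO buffer and the column_names variable are stand-ins introduced for illustration, not part of the original answer.

```python
import io

import pandas as pd

# Sample rows copied from the question; io.StringIO stands in for raw.csv.
raw = io.StringIO(
    "number|colour|(a|1)|animal\n"
    "1|green|x|dog\n"
    "2|blue|y|cat\n"
    "3|red|z|owl\n"
)

# Hypothetical helper list: the intended column names, typed out by hand.
column_names = ["number", "colour", "(a|1)", "animal"]

# skiprows=1 drops the header line that contains the stray delimiter,
# and names= labels the four data columns instead.
df = pd.read_csv(raw, sep="|", skiprows=1, names=column_names)
print(df)
```

Note that this only works because the data rows themselves contain no extra pipes. If a stray delimiter also showed up inside the values, skipping the header would not be enough.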