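I am working with the GDELT dataset and creating a pandas DataFrame with pd.DataFrame.from_csv(path_to_data, sep=",").
The data appears to load, except that the header of the first column is shifted down to row 1, like so:
The arrow indicates where Source should be. Here is a snippet of the raw data in CSV format: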
Source,Actor1Type1Code,Actor1Type2Code,Actor1Geo_CountryCode,Target,Actor2Type1Code,Actor2Type2Code,Actor2Geo_CountryCode,EventCode,f0_
PRINCE,GOV,,CA,CITIZEN,CVL,,CA,051,61
MEDIA,MED,,CA,MINIST,GOV,,CA,090,39
SUPREME COURT,JUD,,CA,DOCTOR,HLH,,CA,060,31
POLICE,COP,,CA,TORONTO,,,CA,173,31
PUBLISHER,MED,,CA,BUSINESS,BUS,,CA,010,29
HOSPITAL,HLH,,CA,POLICE,COP,,CA,043,28
HOSPITAL,HLH,,CA,TORONTO,,,CA,043,26
POLICE,COP,,CA,HOSPITAL,HLH,,CA,042,26
PRIME MINISTER,GOV,,CA,GERMANY,,,FR,042,22
Thanks!
Calvin
Answer 0 (score: 3)
Do not use from_csv; it is no longer maintained. Use read_csv instead:
In [244]:
import io
import pandas as pd

t="""Source,Actor1Type1Code,Actor1Type2Code,Actor1Geo_CountryCode,Target,Actor2Type1Code,Actor2Type2Code,Actor2Geo_CountryCode,EventCode,f0_
PRINCE,GOV,,CA,CITIZEN,CVL,,CA,051,61
MEDIA,MED,,CA,MINIST,GOV,,CA,090,39
SUPREME COURT,JUD,,CA,DOCTOR,HLH,,CA,060,31
POLICE,COP,,CA,TORONTO,,,CA,173,31
PUBLISHER,MED,,CA,BUSINESS,BUS,,CA,010,29
HOSPITAL,HLH,,CA,POLICE,COP,,CA,043,28
HOSPITAL,HLH,,CA,TORONTO,,,CA,043,26
POLICE,COP,,CA,HOSPITAL,HLH,,CA,042,26
PRIME MINISTER,GOV,,CA,GERMANY,,,FR,042,22"""
df = pd.read_csv(io.StringIO(t))
df
Out[244]:
           Source Actor1Type1Code Actor1Type2Code Actor1Geo_CountryCode  \
0          PRINCE             GOV             NaN                    CA
1           MEDIA             MED             NaN                    CA
2   SUPREME COURT             JUD             NaN                    CA
3          POLICE             COP             NaN                    CA
4       PUBLISHER             MED             NaN                    CA
5        HOSPITAL             HLH             NaN                    CA
6        HOSPITAL             HLH             NaN                    CA
7          POLICE             COP             NaN                    CA
8  PRIME MINISTER             GOV             NaN                    CA

     Target Actor2Type1Code Actor2Type2Code Actor2Geo_CountryCode EventCode  \
0   CITIZEN             CVL             NaN                    CA        51
1    MINIST             GOV             NaN                    CA        90
2    DOCTOR             HLH             NaN                    CA        60
3   TORONTO             NaN             NaN                    CA       173
4  BUSINESS             BUS             NaN                    CA        10
5    POLICE             COP             NaN                    CA        43
6   TORONTO             NaN             NaN                    CA        43
7  HOSPITAL             HLH             NaN                    CA        42
8   GERMANY             NaN             NaN                    FR        42

   f0_
0   61
1   39
2   31
3   31
4   29
5   28
6   26
7   26
8   22
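
A side note on the output above: EventCode is parsed as an integer, so 051 is displayed as 51. If the leading zeros matter for your analysis (an assumption on my part, since the question does not say), a minimal sketch is to ask read_csv to keep that column as text through its dtype parameter. The variable name csv_text is just for illustration, and it holds only the first two data rows from the question.

```python
import io
import pandas as pd

# Header plus the first two data rows from the question's CSV snippet.
csv_text = (
    "Source,Actor1Type1Code,Actor1Type2Code,Actor1Geo_CountryCode,Target,"
    "Actor2Type1Code,Actor2Type2Code,Actor2Geo_CountryCode,EventCode,f0_\n"
    "PRINCE,GOV,,CA,CITIZEN,CVL,,CA,051,61\n"
    "MEDIA,MED,,CA,MINIST,GOV,,CA,090,39\n"
)

# Read EventCode as a string so "051" is not collapsed to the integer 51.
df = pd.read_csv(io.StringIO(csv_text), dtype={"EventCode": str})

print(df["EventCode"].tolist())  # ['051', '090']
print(df["f0_"].tolist())        # [61, 39], still parsed as integers
```

Only the listed column is forced to text; the other columns are still inferred as usual.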
Alternatively, pass the parameter index_col=None to from_csv:
df = pd.DataFrame.from_csv(io.StringIO(t), index_col=None)
That way it does not interpret the first column as the index column.
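
To see why the header looked shifted, here is a minimal sketch that reproduces the effect with read_csv. from_csv used index_col=0 by default, so the first column became the index and its name, Source, was printed on its own row below the other column headers. The names csv_text, shifted, flat and repaired are illustrative only, and the sample is the first two data rows from the question.

```python
import io
import pandas as pd

# Header plus the first two data rows from the question's CSV snippet.
csv_text = (
    "Source,Actor1Type1Code,Actor1Type2Code,Actor1Geo_CountryCode,Target,"
    "Actor2Type1Code,Actor2Type2Code,Actor2Geo_CountryCode,EventCode,f0_\n"
    "PRINCE,GOV,,CA,CITIZEN,CVL,,CA,051,61\n"
    "MEDIA,MED,,CA,MINIST,GOV,,CA,090,39\n"
)

# index_col=0 mimics the old from_csv default: Source becomes the index.
shifted = pd.read_csv(io.StringIO(csv_text), index_col=0)
print(shifted.index.name)            # Source
print("Source" in shifted.columns)   # False

# The read_csv default keeps Source as an ordinary first column.
flat = pd.read_csv(io.StringIO(csv_text))
print(flat.columns[0])               # Source

# A frame that was already loaded with Source as the index can be
# repaired in place of re-reading the file.
repaired = shifted.reset_index()
print(list(repaired.columns) == list(flat.columns))  # True
```

If the DataFrame has already been loaded the old way, reset_index() is a quick way to move Source back into the columns.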