How to read a file delimited by both spaces and colons (:)

Time: 2018-04-08 06:10:17

Tags: python pandas csv

My data looks like this:

1 440:0.033906222568727 730:0.0424739279722748 1523:0.0773048148348295 1893:0.0433930684646909

1 271:0.0646290650479301 405:0.0653366028581683 584:0.0744087075001463 770:0.0717824200677465

1 577:0.0679078686536282 761:0.0506946081073312

-1 440:0.0437614564467411 798:0.0370070258333617 831:0.0549176430011721 1681:0.0715035548706038 1963:0.102891965918849 2667:0.0461603813033019 2899:0.0672807783934756

I would like the output in tabular form:

1 440 0.033906222568727 ......
1 271 0.0646290650479301 ...... 
1 271 0.0646290650479301 ......
1 577 0.0679078686536282 .........

I tried using

 x = pd.read_csv('rcv1_train.binary', sep=r"\s+|:", engine='python')

and got the error:

  

pandas.errors.ParserError: Expected 413 fields in line 134, saw 419. Error could possibly be due to quotes being ignored when a multi-char delimiter is used.

1 Answer:

Answer 0: (score: 1)

You may have bad data on line 134.

Try using

error_bad_lines=False
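
Note that skipping bad lines drops those rows entirely, and in newer pandas releases this option was replaced by on_bad_lines='skip'. In the sample data every row carries a different number of index:value pairs, so an alternative is to parse each line by hand into one (label, index, value) row per pair. The following is only a minimal sketch using the standard library; the function name parse_sparse_lines and the long-form layout are assumptions made for illustration, not something the question or the answer specifies.

```python
import io


def parse_sparse_lines(lines):
    """Yield one (label, index, value) tuple per index:value pair."""
    for line in lines:
        parts = line.split()          # split on any run of whitespace
        if not parts:
            continue                  # ignore blank lines
        label = int(parts[0])         # first field is the label (1 or -1)
        for pair in parts[1:]:
            index, value = pair.split(":", 1)
            yield label, int(index), float(value)


# Two rows copied from the sample data in the question; a real run would
# pass an open file object (for example open('rcv1_train.binary')) instead.
sample = io.StringIO(
    "1 440:0.033906222568727 730:0.0424739279722748 "
    "1523:0.0773048148348295 1893:0.0433930684646909\n"
    "1 577:0.0679078686536282 761:0.0506946081073312\n"
)

rows = list(parse_sparse_lines(sample))
for row in rows:
    print(*row)
```

If a DataFrame is needed, the resulting list can be passed to pd.DataFrame(rows, columns=['label', 'index', 'value']), where the column names are again just illustrative.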