pandas / dask: reading CSV fields that span multiple lines

Time: 2020-08-09 05:31:08

Tags: python pandas csv dask

I have a CSV like this:

name,sku,description
Bryce Jones,lay-raise-best-end,"Art community floor adult your single type. Per back community former stock thing."
John Robinson,cup-return-guess,Produce successful hot tree past action young song. Himself then tax eye little last state vote. Country down list that speech economy leave.
Theresa Taylor,step-onto,"**Choice should lead budget task. Author best mention.
Often stuff professional today allow after door instead. Model seat fear evidence. Now sing opportunity feeling no season show.**"

The whole multi-line text is the value of the description column in the third row.

But when I read it with:

import csv
import dask.dataframe as ddf  # assumed alias; the original snippet omits its imports

df = ddf.read_csv(
    file_path,blocksize=2000,engine="python",encoding='utf-8-sig',quotechar='"',delimiter='[,]',quoting=csv.QUOTE_MINIMAL
)

using the code above, the rows come out like this:

['Bryce Jones', 'lay-raise-best-end', '"Art community floor adult your single type. Per back community former stock thing."']
['John Robinson', 'cup-return-guess', 'Produce successful hot tree past action young song. Himself then tax eye little last state vote. Country down list that speech economy leave.']
['Theresa Taylor', 'step-onto', '"Choice should lead budget task. Author best mention.']
['Often stuff professional today allow after door instead. Model seat fear evidence. Now sing opportunity feeling no season show."', None, None]

What should I do?

1 answer:

Answer 0 (score: 0)

1

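You can separate the records with a blank line (a double newline) and use single newlines inside the quoted text; pandas understands this, because it skips blank lines by default and keeps a newline inside a quoted field as part of the value. So the CSV would be: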

name,sku,description

Bryce Jones,lay-raise-best-end,"Art community floor adult your single type. Per back community former stock thing."

John Robinson,cup-return-guess,Produce successful hot tree past action young song. Himself then tax eye little last state vote. Country down list that speech economy leave.

Theresa Taylor,step-onto,"Choice should lead budget task. Author best mention.
Often stuff professional today allow after door instead. Model seat fear evidence. Now sing opportunity feeling no season show."

This is how you read it:

import pandas as pd
df = pd.read_csv(filepath)  # you can keep other parameters if you want

The output is:

             name                 sku  \
0     Bryce Jones  lay-raise-best-end   
1   John Robinson    cup-return-guess   
2  Theresa Taylor           step-onto   

                                         description  
0  Art community floor adult your single type. Pe...  
1  Produce successful hot tree past action young ...  
2  Choice should lead budget task. Author best me...  
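The behaviour described in option 1 can be checked without a file on disk. The following is a minimal sketch, assuming a shortened inline copy of the sample data (the name `CSV_TEXT` and the trimmed descriptions are illustrative, not part of the original post). It also suggests that the blank lines between records are optional for plain pandas, since the quoted newline is handled either way.

```python
import io

import pandas as pd

# Shortened, hypothetical copy of the sample data. Records are separated by
# blank lines, and the last description has a real newline inside the quotes.
CSV_TEXT = (
    'name,sku,description\n'
    '\n'
    'Bryce Jones,lay-raise-best-end,"Art community floor."\n'
    '\n'
    'Theresa Taylor,step-onto,"Choice should lead.\n'
    'Often stuff professional."\n'
)

# Default settings: blank lines are skipped and quoted newlines are kept.
df = pd.read_csv(io.StringIO(CSV_TEXT))
print(df.shape)
print(repr(df.loc[1, "description"]))

# Same data without the blank separator lines, for comparison.
df_compact = pd.read_csv(io.StringIO(CSV_TEXT.replace("\n\n", "\n")))
print(df.equals(df_compact))
```

Note that this uses plain `pd.read_csv` with its default parser, not the `engine="python"` and regex `delimiter` settings from the question.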

2

Write a literal \n (a backslash followed by n) wherever a line break is needed:

name,sku,description
Bryce Jones,lay-raise-best-end,"Art community floor adult your single type. Per back community former stock thing."
John Robinson,cup-return-guess,Produce successful hot tree past action young song. Himself then tax eye little last state vote. Country down list that speech economy leave.
Theresa Taylor,step-onto,"Choice should lead budget task. Author best mention.\nOften stuff professional today allow after door instead. Model seat fear evidence. Now sing opportunity feeling no season show."

When reading, use Python's codecs library:

import codecs
import pandas as pd
df = pd.read_csv('../../data/stack.csv')
print(codecs.decode(df.iloc[2,2], 'unicode_escape'))

Output:

Choice should lead budget task. Author best mention.
Often stuff professional today allow after door instead. Model seat fear evidence. Now sing opportunity feeling no season show.

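We have to use codecs.decode() because pandas reads the \n in the file as two literal characters, a backslash followed by n, which shows up as \\n in the string's repr. Decoding with unicode_escape turns that pair into a real newline. Without print() you will see the escape sequence rather than an actual line break.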
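To decode every row rather than a single cell, the same call can be mapped over the column. This is a minimal sketch on a shortened inline copy of the data (the name `CSV_TEXT` and the trimmed text are illustrative assumptions, not from the original answer).

```python
import codecs
import io

import pandas as pd

# Hypothetical one-row sample; '\\n' in this Python source puts a literal
# backslash followed by n into the CSV text, as in option 2.
CSV_TEXT = (
    'name,sku,description\n'
    'Theresa Taylor,step-onto,"Choice should lead.\\nOften stuff professional."\n'
)

df = pd.read_csv(io.StringIO(CSV_TEXT))

# The raw cell holds backslash + n, not a newline character.
raw = df.loc[0, "description"]
print(repr(raw))

# Apply the decoding from the answer to the whole column.
decoded = df["description"].map(lambda s: codecs.decode(s, "unicode_escape"))
print(decoded[0])
```

One caveat: `unicode_escape` interprets every backslash sequence in the text and treats non-ASCII characters as Latin-1, so for descriptions that may contain other backslashes or non-ASCII text, a plain `str.replace("\\n", "\n")` on each value may be the safer choice.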