Converting pd.DataFrame: Lists to strings

Time: 2018-02-03 09:26:09

Tags: python list pandas type-conversion newline

I currently have a CSV file that is structured as follows:

url,text
http://example1.com,"['Animal' 'Giraffe' 'Drawing' 'Font'
 'Graphics' 'Map' 'Paper'
 'Text' 'Writing']"
http://example2.com,"['Table' 'Chair' 'Long hair'
  'Selfie' 'Smile']"
...

Import:

import pandas as pd

df = pd.read_csv('data.csv', delimiter=',', lineterminator='\n')

The problem: when importing the CSV, pandas keeps the \n line breaks inside the text values, too:

df['text'][0]
"['Animal' 'Giraffe' 'Drawing' 'Font'\n 'Graphics' 'Map' 'Paper'\n 'Text' 'Writing']"

df['text'][1]
"['Table' 'Chair' 'Long hair'\n  'Selfie' 'Smile']"

For example, this is what I need in the end:

animal, giraffe, drawing, font, graphics, individual sport, laptop, map, paper, text, writing

In other words, I need a script that either removes the newlines or imports the data correctly, and then converts each list into a clean string. This didn't work:

df['text'].apply(lambda x: ' '.join(x))
df['text']

2 answers:

Answer 0 (score: 1)

I think you need strip, then split, and finally join:

# Raw string so that \s reaches the regex engine unchanged
df['new'] = df['text'].str.strip("[]'").str.split(r"'\s+'").str.join(', ')
print(df)

                   url                                               text  \
0  http://example1.com  ['Animal' 'Giraffe' 'Drawing' 'Font'\n 'Graphi...   
1  http://example2.com  ['Table' 'Chair' 'Long hair'\n  'Selfie' 'Smile']   

                                                 new  
0  Animal, Giraffe, Drawing, Font, Graphics, Map,...  
1             Table, Chair, Long hair, Selfie, Smile  
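
To reproduce the strip/split/join chain end to end, here is a minimal sketch. It assumes the sample rows from the question are held in an in-memory string (the `csv_text` name and the use of `io.StringIO` are only for illustration), and it appends `.str.lower()` because the desired output in the question is lowercase.

```python
import io
import pandas as pd

# Sample rows copied from the question; each quoted field spans several lines.
csv_text = """url,text
http://example1.com,"['Animal' 'Giraffe' 'Drawing' 'Font'
 'Graphics' 'Map' 'Paper'
 'Text' 'Writing']"
http://example2.com,"['Table' 'Chair' 'Long hair'
  'Selfie' 'Smile']"
"""

# The default parser keeps newlines that sit inside a quoted field.
df = pd.read_csv(io.StringIO(csv_text))

df['new'] = (
    df['text']
    .str.strip("[]'")        # drop the outer brackets and the outermost quotes
    .str.split(r"'\s+'")     # split on quote, whitespace (incl. \n), quote
    .str.join(', ')          # glue the items back together
    .str.lower()             # the question asks for lowercase output
)

print(df['new'][1])  # table, chair, long hair, selfie, smile
```

Because the split pattern requires a quote on both sides of the whitespace, multi-word items such as 'Long hair' stay intact.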

If you want a column of lists:

# Raw string so that \s reaches the regex engine unchanged
df['new'] = df['text'].str.strip("[]'").str.split(r"'\s+'")
print(df)

                   url                                               text  \
0  http://example1.com  ['Animal' 'Giraffe' 'Drawing' 'Font'\n 'Graphi...   
1  http://example2.com  ['Table' 'Chair' 'Long hair'\n  'Selfie' 'Smile']   

                                                 new  
0  [Animal, Giraffe, Drawing, Font, Graphics, Map...  
1           [Table, Chair, Long hair, Selfie, Smile]  
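
Once the column holds real Python lists, per-row list operations become available. The sketch below rebuilds the two sample rows by hand (an assumption, so it runs without a file); the `n_items` and `lower` column names are made up for illustration.

```python
import pandas as pd

# The two sample rows from the question, including the embedded newlines.
df = pd.DataFrame({
    'url': ['http://example1.com', 'http://example2.com'],
    'text': [
        "['Animal' 'Giraffe' 'Drawing' 'Font'\n 'Graphics' 'Map' 'Paper'\n 'Text' 'Writing']",
        "['Table' 'Chair' 'Long hair'\n  'Selfie' 'Smile']",
    ],
})

# Same strip + split as above, without the final join: each cell is a list.
df['new'] = df['text'].str.strip("[]'").str.split(r"'\s+'")

# .str.len() also works on list cells and returns the item count per row.
df['n_items'] = df['new'].str.len()

# Lowercase every item while keeping the list structure.
df['lower'] = df['new'].apply(lambda items: [s.lower() for s in items])

print(df[['n_items', 'lower']])
```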

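Answer 1 (score: 1)

You can simply use the converters keyword of the read_csv function, first stripping the brackets from the text column and then splitting on whitespace (note that this also splits multi-word items such as 'Long hair' and leaves the quotes in place, as the output shows):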

In [25]: df = pd.read_csv('test.csv', converters={'text': lambda x: x.strip('[]').split()})

In [26]: df
Out[26]: 
                   url                                               text
0  http://example1.com  ['Animal', 'Giraffe', 'Drawing', 'Font', 'Grap...
1  http://example2.com  ['Table', 'Chair', 'Long, hair', 'Selfie', 'Sm...
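
If multi-word items need to stay together, one possible variation (not the answer's original code) is a converter that extracts the quoted items with a regular expression. In this sketch `parse_items` is a hypothetical helper name, and the CSV is read from an in-memory string rather than test.csv so the example is self-contained.

```python
import io
import re
import pandas as pd

# Sample rows copied from the question.
csv_text = """url,text
http://example1.com,"['Animal' 'Giraffe' 'Drawing' 'Font'
 'Graphics' 'Map' 'Paper'
 'Text' 'Writing']"
http://example2.com,"['Table' 'Chair' 'Long hair'
  'Selfie' 'Smile']"
"""

def parse_items(cell):
    """Hypothetical helper: return every single-quoted item in the cell."""
    # The capture group keeps the text between the quotes, spaces included.
    return re.findall(r"'([^']*)'", cell)

# read_csv calls the converter on each raw string of the 'text' column.
df = pd.read_csv(io.StringIO(csv_text), converters={'text': parse_items})

print(df['text'][1])  # ['Table', 'Chair', 'Long hair', 'Selfie', 'Smile']
```

This assumes that no item contains a single quote itself; an apostrophe inside an item would break the pattern.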

Note that if your array items were separated by commas, you could also use ast.literal_eval() directly on your column (after removing the \n, though).
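
As a rough sketch of that last remark: the comma-separated strings below are hypothetical (the data in the question has no commas), and the newline is replaced with a space before parsing, as the note suggests.

```python
import ast
import pandas as pd

# Hypothetical comma-separated variant of the data; NOT the format in the question.
text = pd.Series([
    "['Animal', 'Giraffe',\n 'Drawing']",
    "['Table', 'Long hair',\n  'Smile']",
])

parsed = (
    text
    .str.replace('\n', ' ', regex=False)  # remove the embedded newlines first
    .apply(ast.literal_eval)              # parse each string as a Python list literal
)

print(parsed[1])  # ['Table', 'Long hair', 'Smile']
```

Without the commas this would not give the intended result, because Python concatenates adjacent string literals into a single string.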