I currently have a CSV file that is structured as follows:
url,text
http://example1.com,"['Animal' 'Giraffe' 'Drawing' 'Font'
'Graphics' 'Map' 'Paper'
'Text' 'Writing']"
http://example2.com,"['Table' 'Chair' 'Long hair'
'Selfie' 'Smile']"
...
Import:
import pandas as pd
df = pd.read_csv('data.csv', delimiter=',', lineterminator='\n')
The problem: when importing the CSV, pandas also keeps the \n newline characters inside the text values:
df['text'][0]
"['Animal' 'Giraffe' 'Drawing' 'Font'\n 'Graphics' 'Map' 'Paper'\n 'Text' 'Writing']"
df['text'][1]
"['Table' 'Chair' 'Long hair'\n 'Selfie' 'Smile']"
For example, this is what I need in the end:
animal, giraffe, drawing, font, graphics, map, paper, text, writing
That is, I need a script that either removes the newlines or imports the file correctly, and then converts the list into a clean comma-separated string. This didn't work:
df['text'].apply(lambda x: ' '.join(x))
df['text']
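Both symptoms can be reproduced without a file. In CSV, a quoted field may span several lines, so the \n characters are genuinely part of each text value rather than an import error. And since every value is a single string, not a list, ' '.join(x) iterates over its characters; the result of apply is also never assigned back, so df['text'] stays unchanged. A minimal sketch, assuming io.StringIO stands in for data.csv:

```python
import io

import pandas as pd

# Sample rows from the question; each quoted "text" field spans several lines.
raw = '''url,text
http://example1.com,"['Animal' 'Giraffe' 'Drawing' 'Font'
 'Graphics' 'Map' 'Paper'
 'Text' 'Writing']"
http://example2.com,"['Table' 'Chair' 'Long hair'
 'Selfie' 'Smile']"
'''

df = pd.read_csv(io.StringIO(raw))
print(len(df))              # 2: the line breaks stay inside the quoted fields
print(repr(df['text'][1]))  # the value still contains '\n'

# x is a str, so ' '.join(x) puts a space between every character.
joined = df['text'].apply(lambda x: ' '.join(x))
print(joined[1][:9])        # [ ' T a b

# apply() returns a new Series; df['text'] itself has not changed.
print(df['text'][1] == joined[1])  # False
```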
Answer 0 (score: 1)
I think you need str.strip, then str.split, and finally str.join:
df['new'] = df['text'].str.strip("[]'").str.split(r"'\s+'").str.join(', ')
print(df)
url text \
0 http://example1.com ['Animal' 'Giraffe' 'Drawing' 'Font'\n 'Graphi...
1 http://example2.com ['Table' 'Chair' 'Long hair'\n 'Selfie' 'Smile']
new
0 Animal, Giraffe, Drawing, Font, Graphics, Map,...
1 Table, Chair, Long hair, Selfie, Smile
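Here is the same chain applied step by step to a single plain Python string, with re.split standing in for the pandas .str.split (a sketch; it assumes no tag contains a single quote). The final .lower() is an addition to get the lowercase output asked for in the question:

```python
import re

value = "['Animal' 'Giraffe' 'Drawing' 'Font'\n 'Graphics' 'Map' 'Paper'\n 'Text' 'Writing']"

# 1. Strip the brackets and the outermost quotes from both ends.
inner = value.strip("[]'")

# 2. Split on closing quote + any whitespace (spaces or newlines) + opening quote,
#    so spaces inside a tag such as 'Long hair' are preserved.
items = re.split(r"'\s+'", inner)

# 3. Join into one comma-separated string and lowercase it.
result = ", ".join(items).lower()
print(result)  # animal, giraffe, drawing, font, graphics, map, paper, text, writing
```

On the DataFrame, the equivalent would be appending .str.lower() to the chain: df['text'].str.strip("[]'").str.split(r"'\s+'").str.join(', ').str.lower().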
If you want a column of lists instead:
df['new'] = df['text'].str.strip("[]'").str.split(r"'\s+'")
print(df)
url text \
0 http://example1.com ['Animal' 'Giraffe' 'Drawing' 'Font'\n 'Graphi...
1 http://example2.com ['Table' 'Chair' 'Long hair'\n 'Selfie' 'Smile']
new
0 [Animal, Giraffe, Drawing, Font, Graphics, Map...
1 [Table, Chair, Long hair, Selfie, Smile]
Answer 1 (score: 1)
You can simply use the converters keyword of the read_csv function: first strip the brackets from the text column, then split on whitespace:
In [25]: df = pd.read_csv('test.csv', converters={'text': lambda x: x.strip('[]').split()})
In [26]: df
Out[26]:
url text
0 http://example1.com ['Animal', 'Giraffe', 'Drawing', 'Font', 'Grap...
1 http://example2.com ['Table', 'Chair', 'Long, hair', 'Selfie', 'Sm...
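One caveat: split() with no argument splits on every whitespace character, including the space inside 'Long hair', and it leaves the quote characters attached to each item; that is why the output above shows 'Long, hair' as two separate items. A sketch of a converter that splits only between quoted items, reusing the regex from the first answer (it assumes no tag contains a single quote; parse_tags is a hypothetical helper name and io.StringIO stands in for test.csv):

```python
import io
import re

import pandas as pd

raw = '''url,text
http://example1.com,"['Animal' 'Giraffe' 'Drawing' 'Font'
 'Graphics' 'Map' 'Paper'
 'Text' 'Writing']"
http://example2.com,"['Table' 'Chair' 'Long hair'
 'Selfie' 'Smile']"
'''


def parse_tags(cell):
    # Hypothetical helper: drop the brackets and outer quotes, then split
    # only on quote-whitespace-quote so spaces inside a tag survive.
    inner = cell.strip("[]'")
    return re.split(r"'\s+'", inner)


df = pd.read_csv(io.StringIO(raw), converters={'text': parse_tags})
print(df['text'][1])  # ['Table', 'Chair', 'Long hair', 'Selfie', 'Smile']
```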
Note that if you had commas between your array items, you could also use ast.literal_eval() directly on your column; the \n characters inside the brackets would not even need to be removed first.
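To illustrate why the commas matter: without them, Python treats neighbouring string literals as one concatenated string, so ast.literal_eval returns a single merged item. A sketch that inserts the missing commas with a regex before evaluating (it assumes no tag contains a quote character; to_list is a hypothetical helper name):

```python
import ast
import re

cell = "['Table' 'Chair' 'Long hair'\n 'Selfie' 'Smile']"

# Adjacent string literals are concatenated, giving one merged item.
merged = ast.literal_eval(cell)
print(merged)  # ['TableChairLong hairSelfieSmile']


def to_list(value):
    # Put a comma between each closing quote and the next opening quote;
    # the whitespace between them may include a newline.
    with_commas = re.sub(r"'\s+'", "', '", value)
    return ast.literal_eval(with_commas)


print(to_list(cell))  # ['Table', 'Chair', 'Long hair', 'Selfie', 'Smile']
```

On the DataFrame this would be applied as df['text'].apply(to_list).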