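I have a pandas DataFrame column, which is a Series. Its elements are lists of strings. However, the column essentially comes from a PostgreSQL array_agg, so each element is a list, but it shows up like this: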
<type 'list'>
Here is what the first two elements of this column (Series) look like:
0 [UMIN Clinical Trial Registry [Website Last up...
1 [Disposition of Patients \n\nSTARTED; Tetracai...
Name: notes, dtype: object
When I do column[0], I get this:
['UMIN Clinical Trial Registry [Website Last updated date: May 26, 2011] \n\nRecruitment status: Not yet recruiting \n\nDate of protocol fixation: 02/01/2011 \n\nAnticipated trial start date: 07/01/2011 \n\nName of primary sponsor: The Second Department of Internal Medicine Tokyo Medical University \n\nSource of funding: OMRON Health Care corporation \n\nhttps://upload.umin.ac.jp/cgi-open-bin/ctr/ctr.cgi?function=brows&action=brows&type=summary&recptno=R000006682&language=E', 'The projected start date 07/01/2011 was removed because that date passed without evidence of trial start.\n\nhttps://upload.umin.ac.jp/cgi-open-bin/ctr/ctr.cgi?function=brows&action=brows&type=summary&recptno=R000006682&language=E']
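As you can see, each element of this column is a list of strings. I want a final column in which, instead of a list of strings, each element is a single string that combines all the strings in the list.
The problem is that the list element itself is a string, because it was created with array_agg, so it is not an iterable I can use with " ".join(column[0]). That gives an error saying column[0] is not a list but type 'list'.
How can I get around this?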
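Before choosing a fix, it can help to check which Python types the column actually holds, since `" ".join(...)` only works on iterables. A quick sketch, assuming the Series is called `column` as in the question; the sample data below is hypothetical:

```python
import pandas as pd

# Hypothetical stand-in for the array_agg column: lists of strings,
# plus one missing value of the kind a NULL aggregate could produce.
column = pd.Series(
    [["first note", "second note"], ["another note"], None],
    name="notes",
)

# Count the Python type of every element; anything other than `list`
# (e.g. NoneType, or float for NaN) will make " ".join(...) fail.
type_counts = column.map(type).value_counts()
print(type_counts)

# Show only the rows whose element is not a list.
non_lists = column[~column.map(lambda x: isinstance(x, list))]
print(non_lists)
```

If every row reports `list`, the data is already in the right shape and the problem lies elsewhere.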
Edit:
If I do this:
for x in column:
    s = " ".join(x)
    docs.append(s)
    break
That works. But if I try to do it for all rows without the break statement, it throws an error:
for x in column:
    s = " ".join(x)
    docs.append(s)
Error:
<ipython-input-154-556942a06d81> in <module>()
      1 for x in trials_notes.notes:
----> 2     s=" ".join(x)
      3     docs.append(s)
      4
TypeError: can only join an iterable
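`str.join` raises `TypeError: can only join an iterable` when its argument is not iterable, for example `None` or a float `NaN`. Because the loop with `break` only ever touches the first row, one likely (but unconfirmed) explanation is that some later row holds a missing value instead of a list. A minimal sketch under that assumption, using hypothetical data, reproduces the error and then guards the loop:

```python
# Hypothetical rows: the third is a missing value, as a NULL from
# array_agg or a NaN introduced by a merge/reindex might be.
column = [["first note", "second note"], ["another note"], float("nan")]

# Reproduce the reported error on the non-list element.
error_message = None
try:
    " ".join(column[2])
except TypeError as exc:
    error_message = str(exc)
print(error_message)  # can only join an iterable

# Guarded loop: join real lists, use an empty string for anything else.
docs = []
for x in column:
    if isinstance(x, list):
        docs.append(" ".join(x))
    else:
        docs.append("")
print(docs)
```

Appending `""` rather than skipping the row keeps `docs` aligned one-to-one with the DataFrame's rows; skipping is fine if that alignment is not needed.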
Answer (score: 2)
You can use Series.str.join(), passing the separator to join with as its argument. Example:
newcol = column.str.join('')
Demo:
In [3]: import pandas as pd
In [4]: column = pd.Series([['blah1'],['blah2'],['blah123']],name='blah')
In [5]: column.str.join(' ')
Out[5]:
0 blah1
1 blah2
2 blah123
Name: blah, dtype: object
In [7]: type(column[0])
Out[7]: list
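`Series.str.join` also avoids the loop's failure mode: elements that are not lists (such as `None`) come back as missing values instead of raising `TypeError`. A short sketch with hypothetical data, assuming an empty string is an acceptable stand-in for rows without notes:

```python
import pandas as pd

# Hypothetical column: two lists of strings and one missing value.
column = pd.Series(
    [["first note", "second note"], ["another note"], None],
    name="notes",
)

# Non-list elements yield a missing value rather than an exception.
joined = column.str.join(" ")
print(joined)

# Fill the missing results if a column of plain strings is required.
newcol = joined.fillna("")
print(newcol.tolist())
```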