Question

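I have a pandas DataFrame column, which is a Series. The column contains elements that are lists of strings. However, this column is basically the result of a PostgreSQL array_agg, so each element is a list, but of this kind: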

<type 'list'>

Here is what the first two elements of this column (Series) look like:

0    [UMIN Clinical Trial Registry [Website Last up...
1    [Disposition of Patients \n\nSTARTED; Tetracai...
Name: notes, dtype: object

When I do column[0], I get this:

['UMIN Clinical Trial Registry [Website Last updated date: May 26, 2011] \n\nRecruitment status: Not yet recruiting \n\nDate of protocol fixation: 02/01/2011 \n\nAnticipated trial start date: 07/01/2011 \n\nName of primary sponsor: The Second Department of Internal Medicine Tokyo Medical University \n\nSource of funding: OMRON Health Care corporation \n\nhttps://upload.umin.ac.jp/cgi-open-bin/ctr/ctr.cgi?function=brows&action=brows&type=summary&recptno=R000006682&language=E', 'The projected start date 07/01/2011 was removed because that date passed without evidence of trial start.\n\nhttps://upload.umin.ac.jp/cgi-open-bin/ctr/ctr.cgi?function=brows&action=brows&type=summary&recptno=R000006682&language=E']

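As you can see, each element of this column is a list of strings. I want to end up with a column where, instead of each element being a list of strings, all the strings in the list are combined and returned as a single string.

The problem is that the list element itself is a string, because it was created using array_agg. So it is not an iterable on which I can use " ".join(column[0]). That gives an error saying column[0] is not a list but of type 'list'.

How can I overcome this?

EDIT: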

 If I do this: 

for x in column: 
   s=" ".join(x) 
   docs.append(s) 
   break

It works. But if I want to do this for all elements, without the break statement, it throws an error:

for x in column:
   s=" ".join(x) 
   docs.append(s)

Error:

<ipython-input-154-556942a06d81> in <module>()
      1 for x in trials_notes.notes:
----> 2    s=" ".join(x)
      3    docs.append(s)
      4

TypeError: can only join an iterable

Answer 1

You can use Series.str.join() and pass the delimiter to join on as the argument. Example -

newcol = column.str.join(' ')

Demo -

In [3]: import pandas as pd

In [4]: column = pd.Series([['blah1'],['blah2'],['blah123']],name='blah')

In [5]: column.str.join(' ')
Out[5]:
0      blah1
1      blah2
2    blah123
Name: blah, dtype: object

In [7]: type(column[0])
Out[7]: list
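The "can only join an iterable" error in the question suggests that at least one row is not a list at all, for example a None coming from a group for which array_agg had nothing to aggregate. That is an assumption, since the failing row is not shown. A minimal sketch of handling that case follows; the sample data and the join_notes helper are hypothetical and only for illustration:

```python
import pandas as pd

# Hypothetical sample: the middle row is None, standing in for a missing array_agg result.
column = pd.Series(
    [["first note", "second note"], None, ["only note"]],
    name="notes",
)

def join_notes(value, sep=" "):
    """Join a list/tuple of items into one string; anything else becomes ''."""
    if isinstance(value, (list, tuple)):
        # str() guards against non-string items inside the list.
        return sep.join(str(item) for item in value)
    return ""

# Option 1: explicit helper, missing rows become empty strings.
joined = column.apply(join_notes)

# Option 2: Series.str.join leaves missing rows as NaN instead of raising.
via_str = column.str.join(" ")
```

Which option fits depends on whether you want missing rows to end up as empty strings or to stay marked as missing.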

Join elements of lists in a Series
