I have a pandas DataFrame column that looks something like this:
Out[67]:
0 ["cheese", "milk...
1 ["yogurt", "cheese...
2 ["cheese", "cream"...
3 ["milk", "cheese"...
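Ultimately I want this as one flat list, but while trying to flatten it I noticed that pandas treats ["cheese", "milk", "cream"] as a str rather than a list.
How can I fix this so that I end up with: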
["cheese", "milk", "yogurt", "cheese", "cheese"...]
[Edit] So the answer given below boils down to:
s = pd.Series(["['cheese', 'milk']", "['yogurt', 'cheese']", "['cheese', 'cream']"])
s = s.str.strip("[]")
df = s.str.split(',', expand=True)
df = df.applymap(lambda x: x.replace("'", '').strip())
l = df.values.flatten()
print (l.tolist())
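This works, the question is answered and the answer is accepted, but it still strikes me as a rather inelegant solution.
Answer 0 (score: 2)
You can use numpy.ndarray.flatten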
and then flatten the nested lists
- see:
print(df)
a
0 [cheese, milk]
1 [yogurt, cheese]
2 [cheese, cream]
print(df.a.values)
[[['cheese', 'milk']]
[['yogurt', 'cheese']]
[['cheese', 'cream']]]
l = df.a.values.flatten()
print(l)
[['cheese', 'milk'] ['yogurt', 'cheese'] ['cheese', 'cream']]
print([item for sublist in l for item in sublist])
['cheese', 'milk', 'yogurt', 'cheese', 'cheese', 'cream']
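The same flattening step can also be written with itertools.chain.from_iterable from the standard library. This is only a sketch: the sample frame below is made up for illustration, and it assumes each cell of column a already holds a real Python list rather than a string.

```python
from itertools import chain

import pandas as pd

# Hypothetical sample data: every cell is a genuine Python list, not a str
df = pd.DataFrame({"a": [["cheese", "milk"], ["yogurt", "cheese"], ["cheese", "cream"]]})

# chain.from_iterable walks each sublist in order and yields its items one by one
flat = list(chain.from_iterable(df["a"]))
print(flat)
```

If the cells are strings that only look like lists, this will iterate over characters instead, so the strings need to be parsed first.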
Edit:
You can try:
import pandas as pd
s = pd.Series(["['cheese', 'milk']", "['yogurt', 'cheese']", "['cheese', 'cream']"])
# remove the surrounding []
s = s.str.strip('[]')
print(s)
0 'cheese', 'milk'
1 'yogurt', 'cheese'
2 'cheese', 'cream'
dtype: object
df = s.str.split(',', expand=True)
# remove ' and strip surrounding whitespace
df = df.applymap(lambda x: x.replace("'", '').strip())
print(df)
0 1
0 cheese milk
1 yogurt cheese
2 cheese cream
l = df.values.flatten()
print(l.tolist())
['cheese', 'milk', 'yogurt', 'cheese', 'cheese', 'cream']
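A possibly tidier alternative to stripping and splitting by hand is to let Python parse each string with ast.literal_eval. This is a sketch rather than part of the answer above, and it assumes every cell is a valid Python list literal such as "['cheese', 'milk']".

```python
import ast

import pandas as pd

s = pd.Series(["['cheese', 'milk']", "['yogurt', 'cheese']", "['cheese', 'cream']"])

# literal_eval parses each string as a Python literal (here a list) without executing code
parsed = s.apply(ast.literal_eval)

# flatten the resulting lists into one list
flat = [item for sublist in parsed for item in sublist]
print(flat)
```

literal_eval raises an exception on a cell that is not a valid literal, so malformed rows would have to be handled separately.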
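Answer 1 (score: 0)
To get the column's values as a Python list you can use df.columnName.tolist()
, and to flatten them you can use df.columnName.values.flatten()
(note that neither call parses string elements into lists)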
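To see what these two calls actually return on a column of list-like strings, here is a small check. The frame and its contents are made up for illustration, and dtype=object is set explicitly on the assumption that .values should be a plain NumPy array.

```python
import pandas as pd

# Hypothetical frame; dtype=object keeps .values a plain NumPy object array
col = pd.Series(["['cheese', 'milk']", "['yogurt', 'cheese']"], dtype=object)
df = pd.DataFrame({"columnName": col})

as_list = df.columnName.tolist()            # Python list of the cell values
flattened = df.columnName.values.flatten()  # 1-D copy of the underlying array

print(type(as_list[0]))   # each element keeps its original type
print(flattened.shape)
```

Both results still contain one string per row, so the strings have to be parsed before any real flattening can happen.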
Answer 2 (score: 0)
You can convert the Series
to a DataFrame
, and then call stack
:
s.apply(pd.Series).stack().tolist()
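A self-contained version of this one-liner might look like the following. It assumes the Series already holds real lists of equal length; the sample values are made up. Series.explode (available since pandas 0.25) is shown as an alternative that is not part of the answer above.

```python
import pandas as pd

# Hypothetical sample data: real lists, all the same length
s = pd.Series([["cheese", "milk"], ["yogurt", "cheese"], ["cheese", "cream"]])

# apply(pd.Series) spreads each list across columns; stack() reads them back row by row
flat = s.apply(pd.Series).stack().tolist()
print(flat)

# Alternative: explode() gives each list element its own row
flat_explode = s.explode().tolist()
print(flat_explode)
```

apply(pd.Series) builds one Series per row, so it can be slow on large columns.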