Flatten a column of comma-separated lists into a single list

Time: 2018-11-27 18:07:52

Tags: python pandas flatten

I am using Python 3 to read a column from an Excel spreadsheet:

import pandas as pd
from pandas import ExcelFile
df = pd.read_excel('MWE.xlsx', sheet_name='Sheet1')
print(df)

                   col1                        col2
0         starts normal                  egg, bacon
1  still none the wiser         egg, sausage, bacon
2      maybe odd tastes                   egg, spam
3     or maybe post-war            egg, bacon, spam
4  maybe for the hungry   egg, bacon, sausage, spam
5                 bingo  spam, bacon, sausage, spam

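I would like to flatten col2 into a list of the individual words in col2 (e.g. egg, bacon, ...).

df.col2.ravel() seems to reduce col2 to a list of strings rather than individual words.

df.col2.flatten() produces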

AttributeError: 'Series' object has no attribute 'flatten' 

3 answers:

Answer 0 (score: 2)

If you want col2 to become a Series of lists, you can do it like this:

df = pd.DataFrame({'col1': ['starts normal','still none the wiser'], 'col2': ['egg, bacon','egg, sausage, bacon']})

df['col2'] = df['col2'].map(lambda x: [i.strip() for i in x.split(',')])
print(df)
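
If the goal is one flat list of words rather than a column of lists, the same split can be followed by a flattening step. The sketch below is one possible way to do that, not part of the original answer; it assumes pandas 0.25 or later (for `Series.explode`) and reuses the two sample rows from above.

```python
import pandas as pd

# Sample rows copied from the answer above.
df = pd.DataFrame({
    'col1': ['starts normal', 'still none the wiser'],
    'col2': ['egg, bacon', 'egg, sausage, bacon'],
})

# Split each string on commas, give every item its own row,
# strip the leftover spaces, and collect the result as a plain list.
# Series.explode is assumed to be available (pandas >= 0.25).
flat = df['col2'].str.split(',').explode().str.strip().tolist()
print(flat)
```

Duplicates are kept here, so `egg` and `bacon` appear once per row that contains them.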

Answer 1 (score: 1)

Try a simple approach.
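
One simple possibility, sketched here as an illustration and not necessarily what this answer originally showed, is to join all rows into one comma-separated string and split it once. The `values` list is sample data copied from the question; with the real DataFrame it could presumably be replaced by `df['col2'].tolist()`.

```python
# Sample values copied from the question's col2 (first three rows).
# With the real data, values = df['col2'].tolist() is the assumed equivalent.
values = ['egg, bacon', 'egg, sausage, bacon', 'egg, spam']

# Join every row into one long comma-separated string, split it once,
# and strip the spaces that follow each comma.
words = [word.strip() for word in ','.join(values).split(',')]
print(words)
```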

Answer 2 (score: 1)

Maybe this is what you need:

  1. Convert the Series of comma-separated strings into a list of lists

    arrs = df.col2.map(lambda x: [i.strip() for i in x.split(',')]).tolist()
    # [['egg', 'bacon'], ['egg', 'sausage', 'bacon'], ...]
    
  2. Get a list containing the unique items

    unique = list({elem for arr in arrs for elem in arr})
    # e.g. ['spam', 'sausage', 'egg', 'bacon'] (set order is not guaranteed)
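
Because a set does not keep insertion order, the unique list above can come out in any order. If first-seen order matters, a small variation using `dict.fromkeys` may help; this is a sketch that assumes Python 3.7 or later (where dicts preserve insertion order), and `arrs` below is sample data in the shape produced by step 1.

```python
# Sample list of lists in the shape produced by step 1.
arrs = [['egg', 'bacon'], ['egg', 'sausage', 'bacon'], ['egg', 'spam']]

# dict.fromkeys keeps only the first occurrence of each key,
# and dicts preserve insertion order in Python 3.7+.
unique_ordered = list(dict.fromkeys(elem for arr in arrs for elem in arr))
print(unique_ordered)
```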