Filter a nested list

Time: 2018-11-19 19:43:20

Tags: python python-3.x pandas dataframe lambda

My nested list looks like this:

 list = [['A:1','B:(null)','C:3','D:4'],
        ['A:1','B:abc','C:6','D:7'],
        ['A:1','B:def','C:2','G:44','E: 600','F: 6600'],
        ['A:1','B:ghi','C:33','D:44']]

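I want to convert it to a DataFrame so that the part before the : becomes the column name and the part after the : becomes the value.

Here I have two kinds of data. One is: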

[['A:1','B:(null)','C:3','D:4'],
  ['A:1','B:abc','C:6','D:7'],
  ['A:1','B:ghi','C:33','D:44']]

and the other, a single item with different keys, is:

['A:1','B:def','C:2','G:44','E: 600','F: 6600']

Expected output:

df1 = [image]

and df2 = [image]

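Q.1) So far I only have two kinds of data, so I need two DataFrames. Q.2) Can we make this dynamic, so that multiple DataFrames are created depending on the items in the list?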

2 answers:

Answer 0: (score: 2)

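IIUC: first, convert your list into a list of dict (also, do not name your list list; that shadows the Python built-in). Second, use isnull together with dot to create the group key, then build the dict. I do not recommend creating DataFrames dynamically; you can put them in a dict instead. If you really need that, look at locals() (not recommended).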
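To illustrate the naming warning in this answer, here is a minimal sketch of what goes wrong when a variable is called list. The function names parse_with_shadowing and parse_without_shadowing are hypothetical and exist only for this demonstration.

```python
def parse_with_shadowing():
    # Binding the name `list` hides the built-in type inside this function.
    list = [['A:1', 'B:abc']]
    try:
        # This now tries to call the nested list, not the built-in type.
        return list("abc")
    except TypeError as exc:
        return f"TypeError: {exc}"


def parse_without_shadowing():
    # A different name (here `rows`) leaves the built-in `list` usable.
    rows = [['A:1', 'B:abc']]
    return list(rows[0])


print(parse_with_shadowing())
print(parse_without_shadowing())
```

Renaming the variable (the answer uses l) is enough to avoid the problem.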

# l is the question's nested list (renamed from list); pandas is imported as pd
[dict(tuple(y.split(":")) for y in x) for x in l]  # turn your list into a list of dicts
Out[11]: 
[{'A': '1', 'B': '(null)', 'C': '3', 'D': '4'},
 {'A': '1', 'B': 'abc', 'C': '6', 'D': '7'},
 {'A': '1', 'B': 'def', 'C': '2', 'E': ' 600', 'F': ' 6600', 'G': '44'},
 {'A': '1', 'B': 'ghi', 'C': '33', 'D': '44'}]
newl=[dict(tuple(y.split(":")) for y in x )for x in l]
pd.DataFrame(newl)
Out[13]: 
   A       B   C    D     E      F    G
0  1  (null)   3    4   NaN    NaN  NaN
1  1     abc   6    7   NaN    NaN  NaN
2  1     def   2  NaN   600   6600   44
3  1     ghi  33   44   NaN    NaN  NaN
newdf=pd.DataFrame(newl)
s = newdf.isnull().dot(newdf.columns)  # use dot to build the groupby key
s
Out[16]: 
0    EFG
1    EFG
2      D
3    EFG
dtype: object
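The text of this answer mentions putting the resulting DataFrames in a dict, but that step is not shown in the session above. The following is a minimal sketch of one way to do it, assuming the group key s is built as above; the name frames is hypothetical, and dropping all-NaN columns with dropna is an assumption about the desired result rather than something the answer states. The key strings depend on column order, which can differ between pandas versions, so the sketch does not rely on them.

```python
import pandas as pd

# Nested list from the question, named `l` rather than `list`.
l = [['A:1', 'B:(null)', 'C:3', 'D:4'],
     ['A:1', 'B:abc', 'C:6', 'D:7'],
     ['A:1', 'B:def', 'C:2', 'G:44', 'E: 600', 'F: 6600'],
     ['A:1', 'B:ghi', 'C:33', 'D:44']]

newdf = pd.DataFrame([dict(tuple(y.split(":")) for y in x) for x in l])

# Group key: the concatenated names of the columns missing in each row.
s = newdf.isnull().dot(newdf.columns)

# One DataFrame per key, with the columns that are entirely NaN dropped.
frames = {key: grp.dropna(axis=1, how="all") for key, grp in newdf.groupby(s)}

for key, frame in frames.items():
    print(key)
    print(frame)
```

Because the result is an ordinary dict, the number of DataFrames follows the data, which addresses Q.2 without creating variables dynamically.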

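Answer 1: (score: 1)

You can:

  1. Create dictionaries from your list (I chose to replace "(null)" with None)
  2. Group the dicts by their sorted keys using collections.defaultdict
  3. Create and yield DataFrames from the groups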

from collections import defaultdict
import pandas as pd

# convert to dictionaries        
def makeDict(inner): 
    return {k: (v if v!= "(null)" else None) for k,v in (p.split(":") for p in inner)}

# group and yield dfs
def makeIt(l):
    # collect data as dicts
    dicts = []
    for inner in l:
        dicts.append( makeDict(inner))

    # group by sorted keys
    t = defaultdict(list)
    for d in dicts:
        t[tuple(sorted(d.keys()))].append(d)

    # create dataframes from groups and yield them
    for k in t:
        df = pd.DataFrame(t[k])
        yield df

Usage:

l = [['A:1','B:(null)','C:3','D:4'],
     ['A:1','B:abc','C:6','D:7'],
     ['A:1','B:def','C:2','G:44','E: 600','F: 6600'],
     ['A:1','B:ghi','C:33','D:44']]

dfs = list(makeIt(l))

for df in dfs:
    print("-"*20)
    print(df)

Output:

--------------------
   A     B   C   D
0  1  None   3   4
1  1   abc   6   7
2  1   ghi  33  44

--------------------
   A    B  C     E      F   G
0  1  def  2   600   6600  44
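
The second DataFrame above keeps the leading space from items such as 'E: 600', and unpacking p.split(":") into k, v would raise a ValueError for a value that itself contains a colon. Below is a minimal sketch of a stricter parser, assuming you want whitespace trimmed and only the first colon treated as the separator. The names parse_pair and parse_row and the item 'T:12:30' are hypothetical and are not part of the original answer.

```python
def parse_pair(item):
    """Split 'key:value' on the first colon only and trim whitespace."""
    key, value = item.split(":", 1)
    value = value.strip()
    # Same choice as makeDict: treat the literal "(null)" as a missing value.
    return key.strip(), (None if value == "(null)" else value)


def parse_row(inner):
    """Turn one inner list of 'key:value' strings into a dict."""
    return dict(parse_pair(item) for item in inner)


# 'T:12:30' is a made-up item used to show a value containing a colon.
row = parse_row(['A:1', 'B:(null)', 'E: 600', 'T:12:30'])
print(row)
```

parse_row could stand in for makeDict inside makeIt without changing the grouping logic.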