My nested list looks like this:
list = [['A:1','B:(null)','C:3','D:4'],
['A:1','B:abc','C:6','D:7'],
['A:1','B:def','C:2','G:44','E: 600','F: 6600'],
['A:1','B:ghi','C:33','D:44']]
I want to convert it to a DataFrame so that the part before the ":" becomes the column name and the part after the ":" becomes the value.
Here I have two types of data. One is:
[['A:1','B:(null)','C:3','D:4'],
['A:1','B:abc','C:6','D:7'],
['A:1','B:ghi','C:33','D:44']]
and the other is a single item with different keys:
['A:1','B:def','C:2','G:44','E: 600','F: 6600']
Expected output: df1 = the three rows with columns A, B, C, D, and df2 = the single row with columns A, B, C, G, E, F.
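Q.1) So far I have only two kinds of data, so I need two DataFrames.
Q.2) Can we make this dynamic, so that multiple DataFrames are created depending on the items in the list?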
Answer 0 (score: 2)
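IIUC: first, convert your list into a list of dict (also, do not name your list list; it shadows the Python built-in). Second, use isnull together with dot to create the group key, then build a dictionary from it. I do not recommend creating DataFrames dynamically as separate variables; you can put them into a dict instead. If you really need separate variables, look at locals(), which is not recommended.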
[dict(tuple(y.split(":")) for y in x) for x in l]  # turn your list into a list of dicts (l is the nested list, renamed from list)
Out[11]:
[{'A': '1', 'B': '(null)', 'C': '3', 'D': '4'},
{'A': '1', 'B': 'abc', 'C': '6', 'D': '7'},
{'A': '1', 'B': 'def', 'C': '2', 'E': ' 600', 'F': ' 6600', 'G': '44'},
{'A': '1', 'B': 'ghi', 'C': '33', 'D': '44'}]
import pandas as pd
newl = [dict(tuple(y.split(":")) for y in x) for x in l]
pd.DataFrame(newl)
Out[13]:
   A       B   C    D     E      F    G
0  1  (null)   3    4   NaN    NaN  NaN
1  1     abc   6    7   NaN    NaN  NaN
2  1     def   2  NaN   600   6600   44
3  1     ghi  33   44   NaN    NaN  NaN
newdf = pd.DataFrame(newl)
s = newdf.isnull().dot(newdf.columns)  # use dot to build the groupby key: the names of the columns missing in each row
s
Out[16]:
0 EFG
1 EFG
2 D
3 EFG
dtype: object
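The text above says to build a dictionary from this key, but the snippet stops at s. Here is a minimal sketch of that last step. It assumes that dropping the all-NaN columns in each group is what you want; dfs_by_key is a hypothetical name, not from the answer.

```python
import pandas as pd

# Nested list from the question, renamed so it does not shadow the built-in list.
l = [['A:1', 'B:(null)', 'C:3', 'D:4'],
     ['A:1', 'B:abc', 'C:6', 'D:7'],
     ['A:1', 'B:def', 'C:2', 'G:44', 'E: 600', 'F: 6600'],
     ['A:1', 'B:ghi', 'C:33', 'D:44']]

# Each "key:value" string becomes one dict entry; each inner list becomes one dict.
newl = [dict(y.split(":", 1) for y in x) for x in l]
newdf = pd.DataFrame(newl)

# Group key: concatenated names of the columns that are missing in each row.
s = newdf.isnull().dot(newdf.columns)

# One DataFrame per key, with the columns that are entirely NaN removed.
dfs_by_key = {
    key: grp.dropna(axis=1, how="all").reset_index(drop=True)
    for key, grp in newdf.groupby(s)
}

for key, df in dfs_by_key.items():
    print(key)
    print(df)
```

The key strings depend on the column order of newdf, so it is safer to iterate over dfs_by_key.items() than to hard-code a key such as "EFG".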
Answer 1 (score: 1)
You can convert each inner list to a dict (replacing "(null)" with None), then group the dicts by their sorted keys using a collections.defaultdict:
from collections import defaultdict
import pandas as pd
# convert to dictionaries
def makeDict(inner):
    return {k: (v if v != "(null)" else None) for k, v in (p.split(":") for p in inner)}

# group and yield dfs
def makeIt(l):
    # collect data as dicts
    dicts = []
    for inner in l:
        dicts.append(makeDict(inner))

    # group by sorted keys
    t = defaultdict(list)
    for d in dicts:
        t[tuple(sorted(d.keys()))].append(d)

    # create dataframes from groups and yield them
    for k in t:
        df = pd.DataFrame(t[k])
        yield df
Usage:
l = [['A:1','B:(null)','C:3','D:4'],
['A:1','B:abc','C:6','D:7'],
['A:1','B:def','C:2','G:44','E: 600','F: 6600'],
['A:1','B:ghi','C:33','D:44']]
dfs = list(makeIt(l))
for df in dfs:
    print("-"*20)
    print(df)
Output:
--------------------
   A     B   C   D
0  1  None   3   4
1  1   abc   6   7
2  1   ghi  33  44
--------------------
   A    B  C     E      F   G
0  1  def  2   600   6600  44