Question

I am trying to convert multiple columns into multiple rows. Can anyone offer some advice?

I have a DataFrame:

id .        values
1,2,3,4     [('a','b'), ('as','bd'),'|',('ss','dd'), ('ws','ee'),'|',('rr','rt'), ('tt','yy'),'|',('yu','uu'), ('ii','oo')]

I need it to look like this:

ID       Values
1         ('a','b'), ('as','bd')
2         ('ss','dd'), ('ws','ee')
3         ('rr','rt'), ('tt','yy')
4         ('yu','uu'), ('ii','oo')

I have already tried groupby, split, and izip. Maybe I am going about it the wrong way?

Answer 1

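Here is a quick and dirty example of how you could parse this data. Note that df below is a plain Python list standing in for a single row, not a pandas DataFrame.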

# example dataframe
df = [
    "1,2,3,4",
    [('a','b'), ('as','bd'), '|', ('ss','dd'), ('ws','ee'), '|', ('rr','rt'), ('tt','yy'), '|', ('yu','uu'), ('ii','oo')]
]

# split ids by comma
ids = df[0].split(",")

# init Id and Items as int and dict()
Id = 0
Items = dict()

# prepare array for data insert
for i in ids:
    Items[i] = []

# insert data: tuples go to the current id, and each '|' separator moves on to the next id
for i in df[1]:
    if isinstance(i, tuple):
        Items[ids[Id]].append(i)
    elif isinstance(i, str):
        Id += 1

# print data as written in stackoverflow question
print("id .\tvalues")
for item in Items:
    print("{}\t{}".format(item, Items[item]))
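The same separator-based split can also be written without a manual counter. The following is only a sketch using itertools.groupby from the standard library; split_on_separator is a hypothetical helper name, and the data is copied from the question.

```python
from itertools import groupby

# Sample data copied from the question.
ids = "1,2,3,4".split(",")
values = [('a', 'b'), ('as', 'bd'), '|', ('ss', 'dd'), ('ws', 'ee'), '|',
          ('rr', 'rt'), ('tt', 'yy'), '|', ('yu', 'uu'), ('ii', 'oo')]


def split_on_separator(items, sep='|'):
    """Return the runs of items found between occurrences of sep (hypothetical helper)."""
    # groupby yields alternating runs: separators (key True) and everything else (key False).
    return [list(run)
            for is_sep, run in groupby(items, key=lambda x: x == sep)
            if not is_sep]


chunks = split_on_separator(values)
rows = dict(zip(ids, chunks))  # pair each id with its chunk, in order

for key, chunk in rows.items():
    print("{}\t{}".format(key, chunk))
```

One design consequence: consecutive separators collapse into a single run, so an empty chunk between two '|' elements would be dropped rather than mapped to an id.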

Answer 2

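I came up with a fairly concise solution based on multi-level grouping, which I think is to a large extent pandasonic.

Start by defining the following function, which "splits" a Series built from the elements of a single values cell into a sequence of list representations, without the surrounding [ and ]. The split occurs at each '|' element: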

def fn(grp1):
    grp2 = (grp1 == '|').cumsum()
    return grp1[grp1 != '|'].groupby(grp2).apply(lambda x: repr(list(x))[1:-1])

(it will be used a bit later).
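To see what fn does in isolation, it can be run on a plain Series. This is a sketch under the assumption that the question's frame holds one row, with id as a comma-separated string and values as a Python list; the answer never shows how df was built, so the construction below is a guess at its shape.

```python
import pandas as pd

# Assumed shape of the question's data: one row, a string of ids and a list of values.
df = pd.DataFrame({
    'id': ['1,2,3,4'],
    'values': [[('a', 'b'), ('as', 'bd'), '|', ('ss', 'dd'), ('ws', 'ee'), '|',
                ('rr', 'rt'), ('tt', 'yy'), '|', ('yu', 'uu'), ('ii', 'oo')]],
})


def fn(grp1):
    # Same function as in the answer: the running count of '|' marks labels each chunk.
    grp2 = (grp1 == '|').cumsum()
    return grp1[grp1 != '|'].groupby(grp2).apply(lambda x: repr(list(x))[1:-1])


# A Series holding the list elements of the first row.
s = pd.Series(df.loc[0, 'values'])

# Chunk label for every element: it increases by one at each '|'.
labels = (s == '|').cumsum().tolist()
print(labels)

chunks = fn(s)
print(chunks)
```

The cumulative sum is what turns the separators into group keys: every element after the n-th '|' gets the label n, and the separators themselves are filtered out before grouping.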

The first processing step is to convert the id column into a Series:

sId = df.id.apply(lambda x: pd.Series(x.split(','))).stack().rename('ID')

For your data, the result is:

0  0    1
   1    2
   2    3
   3    4
Name: ID, dtype: object

The first level of the MultiIndex is the index of the source row, and the second level consists of consecutive numbers (within the current row).
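As a side note, a Series with the same two-level index can also be obtained with the vectorized string accessor instead of apply. This is an alternative sketch, not the answer's method; the name sId_alt and the one-row frame are assumptions made for illustration.

```python
import pandas as pd

# Assumed one-row frame with the id column from the question.
df = pd.DataFrame({'id': ['1,2,3,4']})

# expand=True spreads the pieces over columns 0..3; stack() turns those columns
# into the second index level, giving (source row, position within the row).
sId_alt = df['id'].str.split(',', expand=True).stack().rename('ID')
print(sId_alt)
```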

Now it is time to perform a similar conversion on the values column:

sVal = pd.DataFrame(df['values'].values.tolist(), index= df.index)\
    .stack().groupby(level=0).apply(fn).rename('Values')

The result is:

0  0      ('a', 'b'), ('as', 'bd')
   1    ('ss', 'dd'), ('ws', 'ee')
   2    ('rr', 'rt'), ('tt', 'yy')
   3    ('yu', 'uu'), ('ii', 'oo')
Name: Values, dtype: object

Note that the MultiIndex above has the same structure as that of sId.

The last step is to concat these two partial results:

result = pd.concat([sId, sVal], axis=1).reset_index(drop=True)

The result is:

  ID                      Values
0  1    ('a', 'b'), ('as', 'bd')
1  2  ('ss', 'dd'), ('ws', 'ee')
2  3  ('rr', 'rt'), ('tt', 'yy')
3  4  ('yu', 'uu'), ('ii', 'oo')
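For comparison, on pandas 1.3 or newer the same reshaping can be sketched with DataFrame.explode over both columns at once. This is an alternative under stated assumptions rather than the answer's approach: the frame construction and the helper split_chunks are hypothetical, and the Values column ends up holding lists of tuples instead of the string representations shown above.

```python
from itertools import groupby

import pandas as pd

# Assumed shape of the question's data: one row, a string of ids and a list of values.
df = pd.DataFrame({
    'id': ['1,2,3,4'],
    'values': [[('a', 'b'), ('as', 'bd'), '|', ('ss', 'dd'), ('ws', 'ee'), '|',
                ('rr', 'rt'), ('tt', 'yy'), '|', ('yu', 'uu'), ('ii', 'oo')]],
})


def split_chunks(items, sep='|'):
    """Split a list into sub-lists at each sep element (hypothetical helper)."""
    return [list(run)
            for is_sep, run in groupby(items, key=lambda x: x == sep)
            if not is_sep]


# Build two list-valued columns of equal length per row, then explode them together.
prepared = pd.DataFrame({
    'ID': df['id'].str.split(','),
    'Values': df['values'].apply(split_chunks),
})
out = prepared.explode(['ID', 'Values'], ignore_index=True)
print(out)
```

Exploding several columns together requires each row to hold the same number of elements in every listed column, so a row with four ids must also yield exactly four chunks.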

How to split a DataFrame column into multiple rows?