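I have a dataframe like the one below and want to duplicate each row according to the number of values it has in B1-B4. In addition, the letter of each value should be filled into the new corresponding columns, in sequence.
Original dataframe: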
   B1   B2   B3   B4
0  1C
1  3A   1A
2  41A  28A  3A
3  42A  41A  28A  3A
Expected output:
   B1  B2  B3  B4  B1_u  B2_u  B3_u  B4_u
0  1C              C
Explanation: row 0 has only 1 value, in B1 (1C), so the letter (the C of 1C) is filled into the corresponding column B1_u.
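   B1  B2  B3  B4  B1_u  B2_u  B3_u  B4_u
1  3A  1A          A
2  3A  1A                A
Explanation: row 1 has 2 values (3A, 1A), so it is expanded into 2 rows, and the letters (the A of 3A and the A of 1A) are filled into the corresponding columns B1_u and B2_u in sequence. The remaining rows follow the same pattern.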
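The expansion described above can be sketched in pandas as follows. This is only one possible reading of the requirement, not code from the question or the answer. It assumes blank cells are stored as empty strings, that the letter is always the last character of a value, and that the output index is simply renumbered. The helper name `expand_rows` is made up for illustration.

```python
import pandas as pd

# Sample data from the question; empty strings stand for blank cells (assumption).
df = pd.DataFrame(
    [
        ["1C", "", "", ""],
        ["3A", "1A", "", ""],
        ["41A", "28A", "3A", ""],
        ["42A", "41A", "28A", "3A"],
    ],
    columns=["B1", "B2", "B3", "B4"],
)


def expand_rows(frame):
    """Repeat each row once per non-empty cell; in the k-th copy, put the
    last character of the k-th non-empty cell into the matching *_u column."""
    u_cols = [f"{c}_u" for c in frame.columns]
    records = []
    for _, row in frame.iterrows():
        filled = [c for c in frame.columns if row[c] != ""]
        for col in filled:
            rec = row.to_dict()
            rec.update({u: "" for u in u_cols})  # start with blank *_u columns
            rec[f"{col}_u"] = row[col][-1]       # letter = last character
            records.append(rec)
    return pd.DataFrame(records, columns=list(frame.columns) + u_cols)


result = expand_rows(df)
print(result)
```

With the sample data this yields one row for the original row 0, two for row 1, three for row 2 and four for row 3. A plain loop is used here for readability; for large frames a `melt`-based approach would likely be faster.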
Answer 0 (score: 0)
IIUC, here is a solution.
First, let's create the dataset needed for this problem:
import pandas as pd
import numpy as np
import string
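# Code to generate the data set - not explained
# Lower-triangular matrix of random integers, columns named B1..B10
df = pd.DataFrame(np.tril(np.random.randint(1, 100, (10, 10))), columns=[f'B{x}' for x in range(1, 11)])
# Convert to strings (astype(str) avoids the deprecated applymap) and blank out the zeros
df = df.astype(str)
df = df.replace('0', '')
# Pick one random uppercase letter per row
pp = np.random.dirichlet(np.ones(26) * 1000., size=1)[0]
cl = np.random.choice(list(string.ascii_uppercase), size=10, p=pp)
# Append the row's letter to every non-empty cell of that row
for x in range(len(df)):
    for y in range(x + 1):
        df.iloc[x, y] = f'{df.iloc[x, y]}{cl[x]}'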
# Solution code
# Create a dataframe for the output, with one *_u column per input column and the same index
dfo = pd.DataFrame(columns=[f'{x}_u' for x in df.columns], index=df.index)
# Count the non-empty values in each row
vc = (df != '').sum(axis=1)
# NOTE: if a row can have more than one attribute (letter), you need to revisit your problem
# Irrespective of how the data set was generated, the following code should work,
# provided the non-empty values of each row sit in its leftmost columns
# Populate the output dataframe with nested for loops
for i, v in enumerate(vc):
    for j in range(v):
        # The letter is the last character of the cell value
        dfo.iloc[i, j] = df.iloc[i, j][-1]
result = df.join(dfo)
The output looks like this: