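Consider the DataFrame below, which has a header and a single input row. The input row needs to be split into two rows wherever a cell holds two space-separated values, so that the second value moves to the second row. The output should therefore have two rows.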
Input1:
Age    Gender  Coverage-Status  Total-Paid  Benefit-Date  Outstanding-Reserve  Waiver-Reserve  Coverage-Code
31 26  M F     AC CC            10,000      2/15/2011     NaN                  4,743           081 010
Desired output:
Age  Gender  Coverage-Status  Total-Paid  Benefit-Date  Outstanding-Reserve  Waiver-Reserve  Coverage-Code
31   M       AC               10,000      2/15/2011     NaN                  4,743           081
26   F       CC                                                                              010
I'm stuck at this point. Is this possible?
I'm trying something like this:
ad['Age'] = ad.Age.str.split(expand = True).stack()
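But this doesn't seem to work...
Answer 0 (score: 0)
Sorry, I don't have enough reputation to comment. You can first split the DataFrame by columns, then split the rows in the columns that hold repeated values, and then rejoin the two saved DataFrames, i.e.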
input1_a = input1[['Total-Paid', 'Benefit-Date', 'Outstanding-Reserve', 'Waiver-Reserve']].copy()
input1_b = input1[['Age', 'Gender' ,'Coverage-Status','Coverage-Code']].copy()
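Yatu dropped the previous answer, but here is a function that can split multiple columns (Zou Weilin commented on 6 September 2018). It is not as elegant as Yatu's solution, but...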
https://gist.github.com/jlln/338b4b0b55bd6984f883
Then
input1_new = pd.concat([input1_a,input1_b], axis = 1)
Then reorder the columns
input1_new = input1_new[['Age', 'Gender' ,'Coverage-Status','Total-Paid', 'Benefit-Date', 'Outstanding-Reserve', 'Waiver-Reserve', 'Coverage-Code']]
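The steps above can be sketched end to end as follows. This is a minimal sketch, not the gist's function: it swaps in `DataFrame.explode` with a list of columns, which assumes pandas 1.3 or later. The sample frame is a reconstruction of the question's input with every value stored as a string, and the names `single_cols` and `multi_cols` are hypothetical.

```python
import pandas as pd

# Assumed reconstruction of the question's one-row input (all values as strings).
input1 = pd.DataFrame({
    "Age": ["31 26"],
    "Gender": ["M F"],
    "Coverage-Status": ["AC CC"],
    "Total-Paid": ["10,000"],
    "Benefit-Date": ["2/15/2011"],
    "Outstanding-Reserve": [None],
    "Waiver-Reserve": ["4,743"],
    "Coverage-Code": ["081 010"],
})

# Hypothetical names: columns holding one value vs. two space-separated values.
single_cols = ["Total-Paid", "Benefit-Date", "Outstanding-Reserve", "Waiver-Reserve"]
multi_cols = ["Age", "Gender", "Coverage-Status", "Coverage-Code"]

input1_a = input1[single_cols].copy()
input1_b = input1[multi_cols].copy()

# Turn each cell into a list of tokens, then explode all columns in lockstep.
# Passing a list of columns to explode() needs pandas >= 1.3 and equal list lengths per row.
input1_b = input1_b.apply(lambda s: s.str.split()).explode(multi_cols, ignore_index=True)

# Rejoin on the index; row 1 has no match in input1_a, so those cells become NaN.
input1_new = pd.concat([input1_a, input1_b], axis=1)

# Restore the original column order.
input1_new = input1_new[list(input1.columns)]
print(input1_new)
```

Because `input1_a` has only one row, the second row of the result is filled with NaN in the single-valued columns, which matches the blank cells in the desired output.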
Answer 1 (score: 0)
Try:
# iterate ONLY over columns subjected to split - I assumed it's all columns
for col in df.columns:
    df[col] = df[col].str.split(" ")
res = df.stack().explode().reset_index(level=0, drop=True).to_frame()
res["id"] = res.groupby(level=0).cumcount()
res = res.set_index("id", append=True).unstack(level=0)
res.columns = res.columns.droplevel()
Output:
   Age Benefit-Date Coverage-Code Coverage-Status Gender Outstanding-Reserve \
id
0   31    2/15/2011           081              AC      M                 NaN
1   26          NaN           010              CC      F                 NaN
   Total-Paid Waiver-Reserve
id
0      10,000          4,743
1         NaN            NaN
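For reference, here is a self-contained sketch of the same stack/explode/unstack steps wrapped in a function. The function name `split_cells_to_rows`, its `cols` and `sep` parameters, and the sample frame (every value a string, with "NaN" as literal text) are assumptions made for illustration. The final `reindex` is an addition: `unstack` sorts the column labels alphabetically, so it restores the original order. Like the loop above, it assumes a single input row.

```python
import pandas as pd

def split_cells_to_rows(df, cols=None, sep=" "):
    """Split separated cells in `cols` into one output row per value (single-row input)."""
    work = df.copy()  # leave the caller's frame untouched
    for col in (df.columns if cols is None else cols):
        work[col] = work[col].str.split(sep)
    # One entry per (column, value) pair, indexed by column name only.
    res = work.stack().explode().reset_index(level=0, drop=True).to_frame()
    # Position of each value within its column: 0 for the first, 1 for the second.
    res["id"] = res.groupby(level=0).cumcount()
    res = res.set_index("id", append=True).unstack(level=0)
    res.columns = res.columns.droplevel()
    # unstack sorts columns alphabetically; put them back in the input order.
    return res.reindex(columns=df.columns)

# Assumed reconstruction of the question's input, all values as strings.
df = pd.DataFrame({
    "Age": ["31 26"],
    "Gender": ["M F"],
    "Coverage-Status": ["AC CC"],
    "Total-Paid": ["10,000"],
    "Benefit-Date": ["2/15/2011"],
    "Outstanding-Reserve": ["NaN"],
    "Waiver-Reserve": ["4,743"],
    "Coverage-Code": ["081 010"],
})

out = split_cells_to_rows(df)
print(out)
```

Cells with a single value end up only in row 0, and the corresponding cells in row 1 are left as NaN.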