Split cells from 2 columns into rows

Time: 2018-01-16 04:36:47

Tags: python pandas numpy

I have a dataset like the following:

Cus_ID    Event                              Day
  1       Event1~Event2~Event3~Event4        1~1~1~1
  2       Event3~Event4~Event5~Event6        1~2~3~4

I'm trying to parse this data and split it into new rows wherever a specific delimiter appears. The delimiter is '~'. In other words, I want to split the Event and Day columns into new columns named EventSplit and DaySplit.

I've written the code below, but I don't know how to handle both columns in one go. Can anybody help?

The output I'm trying to get would be:

Cus_ID | Event                       | Day     | EventSplit | DaySplit
-------+-----------------------------+---------+------------+---------
1      | Event1~Event2~Event3~Event4 | 1~1~1~1 | Event1     | 1
1      | Event1~Event2~Event3~Event4 | 1~1~1~1 | Event2     | 1
1      | Event1~Event2~Event3~Event4 | 1~1~1~1 | Event3     | 1
1      | Event1~Event2~Event3~Event4 | 1~1~1~1 | Event4     | 1
2      | Event3~Event4~Event5~Event6 | 1~2~3~4 | Event3     | 1
2      | Event3~Event4~Event5~Event6 | 1~2~3~4 | Event4     | 2
2      | Event3~Event4~Event5~Event6 | 1~2~3~4 | Event5     | 3
2      | Event3~Event4~Event5~Event6 | 1~2~3~4 | Event6     | 4
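For reference, here is a minimal sketch that produces the desired output in a single step. It assumes pandas 1.3 or later, where `DataFrame.explode` accepts a list of columns and explodes them in lockstep. The DataFrame is rebuilt from the sample data in the question rather than read from `SeqData.csv`.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    'Cus_ID': [1, 2],
    'Event': ['Event1~Event2~Event3~Event4', 'Event3~Event4~Event5~Event6'],
    'Day': ['1~1~1~1', '1~2~3~4'],
})

out = (
    # Add list-valued split columns; the original Event/Day strings are kept
    df.assign(EventSplit=df['Event'].str.split('~'),
              DaySplit=df['Day'].str.split('~'))
      # Explode both list columns together (requires pandas >= 1.3)
      .explode(['EventSplit', 'DaySplit'])
      .reset_index(drop=True)
)
print(out)
```

`explode` raises a `ValueError` if the two lists in a row have different lengths. After the split, `DaySplit` holds strings, so use `.astype(int)` if you need numeric days.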

Method No. 2 that I tried:

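import pandas as pd
import numpy as np

data = pd.read_csv("SeqData.csv")

def pre(data, c):
    # Split each '~'-delimited string in column c into a list
    event_col = data[c].str.split('~')
    clst = event_col.values.tolist()   # was `event.values`, an undefined name
    # Number of pieces per row, used to repeat the original row labels
    lens = [len(l) for l in clst]

    # One row per piece, indexed by the label of the row it came from
    EventSplit = pd.DataFrame({c: np.concatenate(clst)}, index=data.index.repeat(lens))
    # Replace column c with its exploded version (handles one column only)
    return data.drop(columns=c).join(EventSplit).reset_index(drop=True)

Data_df = pre(data, 'Event')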
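The same repeat-the-index-and-concatenate idea from `pre` extends to several columns at once. Below is a minimal sketch; `pre_multi` is a hypothetical helper name, not part of pandas. It assumes every listed column splits into the same number of pieces on each row, and it raises an error if they do not. It keeps the original columns and adds `<column>Split` columns, matching the desired output.

```python
import numpy as np
import pandas as pd

def pre_multi(data, cols, sep='~'):
    """Hypothetical helper: split several delimited columns into rows together."""
    split = {c: data[c].str.split(sep) for c in cols}
    lens = split[cols[0]].str.len()
    # Every column must produce the same number of pieces per row
    for c in cols[1:]:
        if not (split[c].str.len() == lens).all():
            raise ValueError(f"column {c!r} does not line up with {cols[0]!r}")
    # One row per piece, indexed by the label of the row it came from
    pieces = pd.DataFrame(
        {c + 'Split': np.concatenate(split[c].tolist()) for c in cols},
        index=data.index.repeat(lens.to_numpy()),
    )
    # A left join repeats each original row once per matching piece
    return data.join(pieces).reset_index(drop=True)

# Usage with the sample data from the question
df = pd.DataFrame({
    'Cus_ID': [1, 2],
    'Event': ['Event1~Event2~Event3~Event4', 'Event3~Event4~Event5~Event6'],
    'Day': ['1~1~1~1', '1~2~3~4'],
})
out = pre_multi(df, ['Event', 'Day'])
print(out)
```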

1 Answer:

Answer 0 (score: 0)

This is an unnesting problem, but a little different, because you need to unnest two columns at the same time.

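import numpy as np
import pandas as pd
df1 = df.copy()  # keep the original, unsplit Event/Day columns for the merge
df.Event = df.Event.str.split('~')
df.Day = df.Day.str.split('~')
Tdf = pd.DataFrame({'Cus_ID': df['Cus_ID'].repeat(df.Event.str.len()),  # one row per event
                    'EventSplit': np.concatenate(df.Event.tolist()),
                    'DaySplit': np.concatenate(df.Day.tolist())}).merge(df1, on='Cus_ID')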

Tdf
Out[661]: 
   Cus_ID DaySplit EventSplit                        Event      Day
0       1        1     Event1  Event1~Event2~Event3~Event4  1~1~1~1
1       1        1     Event2  Event1~Event2~Event3~Event4  1~1~1~1
2       1        1     Event3  Event1~Event2~Event3~Event4  1~1~1~1
3       1        1     Event4  Event1~Event2~Event3~Event4  1~1~1~1
4       2        1     Event3  Event3~Event4~Event5~Event6  1~2~3~4
5       2        2     Event4  Event3~Event4~Event5~Event6  1~2~3~4
6       2        3     Event5  Event3~Event4~Event5~Event6  1~2~3~4
7       2        4     Event6  Event3~Event4~Event5~Event6  1~2~3~4
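The answer pairs `EventSplit` and `DaySplit` purely by position after `np.concatenate`. This relies on each Event string and its Day string splitting into the same number of pieces. If the totals differ, the `DataFrame` constructor raises a `ValueError`. If only the per-row counts differ, the pairs shift silently. The final `merge(df1, on='Cus_ID')` also assumes `Cus_ID` is unique. Below is a minimal pre-check sketch; the third customer is hypothetical data added only to show a mismatch.

```python
import pandas as pd

# Question's sample data plus a hypothetical row whose Day has one piece too few
df = pd.DataFrame({
    'Cus_ID': [1, 2, 3],
    'Event': ['Event1~Event2~Event3~Event4', 'Event3~Event4~Event5~Event6', 'Event1~Event2'],
    'Day': ['1~1~1~1', '1~2~3~4', '1'],
})

# Number of pieces each cell splits into on '~'
n_event = df['Event'].str.split('~').str.len()
n_day = df['Day'].str.split('~').str.len()

# Rows whose Event and Day would not line up after unnesting
bad = df[n_event != n_day]
print(bad)

# Merging on Cus_ID is only safe if each customer appears once
print(df['Cus_ID'].is_unique)
```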