Split the values in each row of one column into multiple rows while duplicating the other columns' data

Time: 2018-04-09 19:36:40

Tags: python pandas dataframe

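I'm trying to split the values in each row of a column into multiple rows while duplicating the corresponding values of the other columns. I'm new to Python and am looking for a way to apply this to a larger dataset.

Here is the input file: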

Name    Year    Subject                                        State
Jack    2003    Math, Sci, Music                               MA
Sam     2004    Math, PE, Language, Social                     CA
Nicole  2005    Math, Life Sci, Geography, Music, Computer Sci NY

This is the output I want:

Name    Year    Subject            State
Jack    2003    Math               MA
Jack    2003    Sci                MA
Jack    2003    Music              MA
Sam     2004    Math               CA
Sam     2004    PE                 CA
Sam     2004    Language           CA
Sam     2004    Social             CA
Nicole  2005    Math               NY
Nicole  2005    Life Sci           NY
Nicole  2005    Geography          NY
Nicole  2005    Music              NY
Nicole  2005    Computer Sci       NY

I tried this code:

import pandas as pd 

df= pd.read_csv('C:/Users/3216140/Desktop/test.csv', delimiter=',', skiprows = 1, names = ["Name","Year","Subject","State"] ) 
print(df) 
sub = df['Subject'].str.split(',').apply(pd.Series, 1).stack() 
sub.index = sub.index.droplevel(-1) 
sub.name = 'Subject' 
print (sub) 
del df['Subject'] 
df.join(sub) 
print(df) 

But the join doesn't seem to work. The output is just the input file without the 'Subject' column.

1 Answer:

Answer 0 (score: 1)

You can use np.repeat and itertools.chain here:

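from itertools import chain

# Remove 'Subject' from df and split each string on commas
# (swallowing surrounding whitespace) into a list of subjects.
v = df.pop('Subject').str.split(r'\s*,\s*')

# Repeat each remaining row once per subject in its list.
df_new = pd.DataFrame(
    df.values.repeat(v.str.len(), axis=0),
    columns=df.columns
)

# Flatten the lists into one column that lines up with the repeated rows.
df_new['Subject'] = list(chain.from_iterable(v))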

df_new

      Name  Year  State       Subject
0     Jack  2003     MA          Math
1     Jack  2003     MA           Sci
2     Jack  2003     MA         Music
3      Sam  2004     CA          Math
4      Sam  2004     CA            PE
5      Sam  2004     CA      Language
6      Sam  2004     CA        Social
7   Nicole  2005     NY          Math
8   Nicole  2005     NY      Life Sci
9   Nicole  2005     NY     Geography
10  Nicole  2005     NY         Music
11  Nicole  2005     NY  Computer Sci
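
As for why the original attempt printed the frame without 'Subject': df.join returns a new DataFrame and does not modify df in place, so the result has to be assigned to a name. The sketch below shows that fix, plus an alternative that is not part of the answer above: if your pandas is 0.25 or newer, explode can do the row duplication for you. The small frame is a hypothetical stand-in for the CSV in the question, and the names joined and exploded are only illustrative.

```python
import pandas as pd

# Hypothetical stand-in for the CSV from the question (two rows only).
df = pd.DataFrame({
    "Name": ["Jack", "Sam"],
    "Year": [2003, 2004],
    "Subject": ["Math, Sci, Music", "Math, PE"],
    "State": ["MA", "CA"],
})

# Fix for the original approach: build the long 'Subject' Series, then keep
# the frame that join returns instead of discarding it.
# Series.explode (assumes pandas >= 0.25) repeats the index once per list item.
sub = df["Subject"].str.split(r"\s*,\s*").explode()
joined = df.drop(columns="Subject").join(sub)

# Alternative: replace 'Subject' with lists and explode the whole frame.
# This keeps 'Subject' in its original column position.
exploded = (
    df.assign(Subject=df["Subject"].str.split(r"\s*,\s*"))
      .explode("Subject")
      .reset_index(drop=True)
)

print(joined)
print(exploded)
```

Note that joined keeps the repeated original index (0, 0, 0, 1, 1) and puts 'Subject' last, while exploded gets a fresh 0..n-1 index and leaves 'Subject' between 'Year' and 'State', as in the desired output.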