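I'm trying to split the values in each row of a column into multiple rows, while duplicating the corresponding values of the other columns. I'm very new to Python and am looking for a way to apply this to a larger dataset.
This is the input file: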
Name Year Subject State
Jack 2003 Math, Sci, Music MA
Sam 2004 Math, PE, Language, Social CA
Nicole 2005 Math, Life Sci, Geography, Music, Computer Sci NY
This is the output I want:
Name Year Subject State
Jack 2003 Math MA
Jack 2003 Sci MA
Jack 2003 Music MA
Sam 2004 Math CA
Sam 2004 PE CA
Sam 2004 Language CA
Sam 2004 Social CA
Nicole 2005 Math NY
Nicole 2005 Life Sci NY
Nicole 2005 Geography NY
Nicole 2005 Music NY
Nicole 2005 Computer Sci NY
I tried this code:
import pandas as pd
df= pd.read_csv('C:/Users/3216140/Desktop/test.csv', delimiter=',', skiprows = 1, names = ["Name","Year","Subject","State"] )
print(df)
sub = df['Subject'].str.split(',').apply(pd.Series, 1).stack()
sub.index = sub.index.droplevel(-1)
sub.name = 'Subject'
print (sub)
del df['Subject']
df.join(sub)
print(df)
But the join doesn't seem to work. The output is just the input file without the 'Subject' column.
Answer 0 (score: 1)
You can use np.repeat and itertools.chain here.
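from itertools import chain

# Remove 'Subject' from df and split each entry on commas, ignoring surrounding whitespace
v = df.pop('Subject').str.split(r'\s*,\s*')
# Repeat each remaining row once per subject found in that row
df_new = pd.DataFrame(
    df.values.repeat(v.str.len(), axis=0),
    columns=df.columns
)
# Flatten the per-row subject lists into one column
df_new['Subject'] = list(chain.from_iterable(v))
df_new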
Name Year State Subject
0 Jack 2003 MA Math
1 Jack 2003 MA Sci
2 Jack 2003 MA Music
3 Sam 2004 CA Math
4 Sam 2004 CA PE
5 Sam 2004 CA Language
6 Sam 2004 CA Social
7 Nicole 2005 NY Math
8 Nicole 2005 NY Life Sci
9 Nicole 2005 NY Geography
10 Nicole 2005 NY Music
11 Nicole 2005 NY Computer Sci
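As for why the attempt in the question printed the frame without 'Subject': df.join returns a new DataFrame and does not modify df in place, so the result has to be assigned to a name. The sketch below shows that fix, along with an alternative based on explode. It assumes pandas 0.25 or later, where Series.explode and DataFrame.explode are available. The sample data is built inline in place of test.csv, and the names out and joined are only illustrative.

```python
import pandas as pd

# Inline copy of the sample data from the question (stands in for test.csv).
df = pd.DataFrame({
    "Name": ["Jack", "Sam", "Nicole"],
    "Year": [2003, 2004, 2005],
    "Subject": [
        "Math, Sci, Music",
        "Math, PE, Language, Social",
        "Math, Life Sci, Geography, Music, Computer Sci",
    ],
    "State": ["MA", "CA", "NY"],
})

# Variant 1: split into lists, then give each list element its own row.
# The other columns are duplicated automatically and column order is kept.
out = df.assign(Subject=df["Subject"].str.split(",")).explode("Subject")
out["Subject"] = out["Subject"].str.strip()  # drop the space left after each comma
out = out.reset_index(drop=True)
print(out)

# Variant 2: the join-based idea from the question, with the result assigned.
# join() returns a new DataFrame, so it must be kept in a variable.
sub = df["Subject"].str.split(",").explode().str.strip()
joined = df.drop(columns="Subject").join(sub)
print(joined)
```

Variant 1 keeps the original column order (Name, Year, Subject, State). Variant 2 puts Subject last and keeps the repeated original index values unless reset_index is called on it.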