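I have fields in a pandas DataFrame like the sample data below. The values in one of the fields are fractions whose denominator has the form count(something). I would like to split the values as in the example output below and create new records, basically one for the numerator and one for the denominator. Some values even contain multiple /, like count(something)/count(thing)/count(dog), so I would like to split such a value into 3 records. Any tips on how to do this would be greatly appreciated.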
Sample Data:
import pandas as pd
SampleDf=pd.DataFrame([['tom','sum(stuff)/count(things)'],['bob','count(things)/count(stuff)']],columns=['ReportField','OtherField'])
Example Output:
OutputDf=pd.DataFrame([['tom1','sum(stuff)'],['tom2','count(things)'],['bob1','count(things)'],['bob2','count(stuff)']],columns=['ReportField','OtherField'])
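Answer 0 (score: 0)
There may be a better way, but try this:
# start from the question's SampleDf and use ReportField as the index
df = SampleDf.set_index('ReportField')
# split on '/', stack the parts into rows, and drop the helper index level
df = pd.DataFrame(df.OtherField.str.split('/', expand = True).stack().reset_index(-1, drop = True)).reset_index()
print(df)
You get:
  ReportField              0
0         tom     sum(stuff)
1         tom  count(things)
2         bob  count(things)
3         bob   count(stuff)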
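The result above still lacks the tom1/tom2 numbering and the OtherField column name from the example output. A minimal sketch of one way to add them, assuming every ReportField value is unique in SampleDf (otherwise the per-name counter would run across rows); the names parts, result and position are introduced here for illustration only:

```python
import pandas as pd

SampleDf = pd.DataFrame(
    [['tom', 'sum(stuff)/count(things)'], ['bob', 'count(things)/count(stuff)']],
    columns=['ReportField', 'OtherField'],
)

# Same split-and-stack step as above, starting from the question's SampleDf.
parts = SampleDf.set_index('ReportField')['OtherField'].str.split('/', expand=True).stack()

# Drop the helper index level and name the value column OtherField.
result = parts.reset_index(-1, drop=True).rename('OtherField').reset_index()

# cumcount() numbers the rows within each ReportField group from 0, so add 1.
position = result.groupby('ReportField').cumcount() + 1
result['ReportField'] = result['ReportField'] + position.astype(str)
print(result)
```

With the sample data this should give the names tom1, tom2, bob1, bob2 in that order.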
Answer 1 (score: 0)
One possible way is as follows:
# split and stack
new_df = pd.DataFrame(SampleDf.OtherField.str.split('/').tolist(), index=SampleDf.ReportField).stack().reset_index()
print(new_df)
Output:
  ReportField  level_1              0
0         tom        0     sum(stuff)
1         tom        1  count(things)
2         bob        0  count(things)
3         bob        1   count(stuff)
Now combine ReportField with level_1:
# combine strings for tom1, tom2 ,.....
new_df['ReportField'] = new_df.ReportField.str.cat((new_df.level_1+1).astype(str))
# remove level column
del new_df['level_1']
# rename columns
new_df.columns = ['ReportField', 'OtherField']
print (new_df)
Output:
  ReportField     OtherField
0        tom1     sum(stuff)
1        tom2  count(things)
2        bob1  count(things)
3        bob2   count(stuff)
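The question also mentions values with more than one /. Below is a hedged sketch of the same split-and-stack steps wrapped in a helper and tried on such a value. The function name split_fractions, its parameters, and the 'amy' row are hypothetical and not part of the original thread. The explicit dropna() is an assumption-driven safeguard: rows with fewer parts are padded with missing values, and whether stack() discards them on its own depends on the pandas version.

```python
import pandas as pd

def split_fractions(df, key='ReportField', value='OtherField', sep='/'):
    """Return one row per part of df[value], with key suffixed 1, 2, 3, ..."""
    # One list of parts per row; rows may contain different numbers of parts.
    parts = df[value].str.split(sep).tolist()
    # Shorter rows are padded with missing values, so drop those explicitly.
    out = pd.DataFrame(parts, index=df[key]).stack().dropna().reset_index()
    out.columns = [key, 'level_1', value]
    # level_1 is the 0-based position of the part within its original row.
    out[key] = out[key].str.cat((out['level_1'] + 1).astype(str))
    return out.drop(columns='level_1')

# Hypothetical data: one two-part value and one three-part value.
three_part_df = pd.DataFrame(
    [['tom', 'sum(stuff)/count(things)'],
     ['amy', 'count(something)/count(thing)/count(dog)']],
    columns=['ReportField', 'OtherField'],
)
three_part_result = split_fractions(three_part_df)
print(three_part_result)
```

Under these assumptions the three-part value becomes three records (amy1, amy2, amy3) next to tom1 and tom2.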
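Answer 2 (score: 0)
You can use:
- split with expand=True to get a new DataFrame
- stack and reset_index
- add level_1 (plus 1) to the ReportField column, cast with astype to str
- drop to remove the helper column level_1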
# parentheses let the method chain continue on the next line
OutputDf = (SampleDf.set_index('ReportField')['OtherField'].str.split('/', expand=True)
            .stack().reset_index(name='OtherField'))
OutputDf['ReportField'] = OutputDf['ReportField'] + OutputDf['level_1'].add(1).astype(str)
OutputDf = OutputDf.drop('level_1', axis=1)
print (OutputDf)
  ReportField     OtherField
0        tom1     sum(stuff)
1        tom2  count(things)
2        bob1  count(things)
3        bob2   count(stuff)
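As an alternative to stack, newer pandas versions (0.25 and later) provide DataFrame.explode, which turns each element of a list-valued cell into its own row. The following is only a sketch of that route, not something proposed in the answers above; the names exploded, position and ExplodedDf are introduced here for illustration.

```python
import pandas as pd

SampleDf = pd.DataFrame(
    [['tom', 'sum(stuff)/count(things)'], ['bob', 'count(things)/count(stuff)']],
    columns=['ReportField', 'OtherField'],
)

# Split each value into a list, then give every list element its own row.
exploded = SampleDf.assign(OtherField=SampleDf['OtherField'].str.split('/')).explode('OtherField')

# explode() repeats the original row label, so counting within each label
# gives the 0-based position of a part inside its source row.
position = exploded.groupby(level=0).cumcount() + 1
exploded['ReportField'] = exploded['ReportField'] + position.astype(str)

# Replace the repeated labels with a fresh 0..n-1 index.
ExplodedDf = exploded.reset_index(drop=True)
print(ExplodedDf)
```

Because the counter is taken per original row label rather than per name, this variant does not rely on ReportField values being unique.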