How do I separate columns in a Pandas DataFrame into unique bins/columns?

Time: 2019-10-24 20:46:09

Tags: python pandas dataframe

I currently have a data frame with the following structure:

I would like to parse each column and place every value into a unique column associated with that value's name, i.e.:

customer    item 1  item 2   item 3
John        Apples  Oranges  Bananas
Blake                        Bananas
Steph               Oranges  Bananas
What is the best way to do this in Pandas / NumPy?

2 answers:

Answer 0: (score: 1)

Here is a working solution that gives you the desired result.

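import numpy as np
import pandas as pd

df = pd.DataFrame({'customer': ['John', 'Blake', 'Steph'],
                   'item1': ['Apples', 'Bananas', 'Oranges'],
                   'item2': ['Oranges', np.nan, 'Bananas'],
                   'item3': ['Bananas', np.nan, np.nan]})
# Reshape to long form: one row per (customer, item slot)
long_df = pd.melt(df, id_vars=['customer'])
# Count each item per customer; the unique items become the columns (sorted alphabetically)
df2 = pd.pivot_table(long_df, values='variable', index='customer',
                     columns='value', aggfunc='count').reset_index()
df2.columns = ['customer', 'item1', 'item2', 'item3']
# Replace each count of 1 with the item name; missing items stay NaN
df2['item1'] = df2['item1'].map({1: 'Apples'})
df2['item2'] = df2['item2'].map({1: 'Bananas'})
df2['item3'] = df2['item3'].map({1: 'Oranges'})
df2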
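
The item names above are hard-coded, so the mapping has to be edited whenever the set of items changes. A minimal sketch of the same melt-and-pivot idea without hard-coded names follows; the helper column `cell` and the variable names `long_df` and `wide` are only illustrative, and it assumes each customer appears once in the input.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'customer': ['John', 'Blake', 'Steph'],
    'item1': ['Apples', 'Bananas', 'Oranges'],
    'item2': ['Oranges', np.nan, 'Bananas'],
    'item3': ['Bananas', np.nan, np.nan],
})

# Long form: one row per (customer, item); drop the empty slots
long_df = df.melt(id_vars='customer', value_name='item').dropna(subset=['item'])

# Duplicate the item name so it can act as both the column key and the cell value
long_df['cell'] = long_df['item']
wide = long_df.pivot_table(index='customer', columns='item',
                           values='cell', aggfunc='first')

# Restore the original customer order and blank out the missing cells
wide = wide.reindex(df['customer'].tolist()).fillna('').reset_index()
wide.columns.name = None
print(wide)
```

Each unique item ends up in its own column (in alphabetical order), and a customer's cell holds the item name only if that customer has the item.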

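Answer 1: (score: 1)

Rather than trying to reshape the data into the columns shown in the original post, I think it is better to reshape it into so-called tidy form, where each row is one observation, and then apply a group-by. This is especially true if the end result you want is a count or sum of the items per customer.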

import pandas as pd
import numpy as np
data = pd.DataFrame(np.array([['john', 'apples', 'oranges', 'bananas'], ['blake', 'bananas', '', ''],
                              ['steph', '', 'bananas', 'bananas']]),
                    columns=['customer', 'item_1', 'item_2', 'item_3'])

# make tidy: one row per (customer, item) observation
tidy_data = pd.melt(data, ['customer'], var_name='cols', value_name='item')
tidy_data = tidy_data[['customer', 'item']]
# count each type of item the customer has
grouped_data = tidy_data.groupby(['customer', 'item'])['item'].count()
grouped_data = grouped_data.reset_index(name='counts')
# drop the empty-string placeholders
grouped_data = grouped_data[grouped_data['item'] != '']
grouped_data

This gives the following output:

  customer     item  counts
1    blake  bananas       1
2     john   apples       1
3     john  bananas       1
4     john  oranges       1
6    steph  bananas       2

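If you only need the count of each item regardless of customer, group by the item alone:

grouped_data = tidy_data.groupby('item')['item'].count()
grouped_data = grouped_data.reset_index(name='counts')
grouped_data = grouped_data[grouped_data['item'] != '']

This gives the following output: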

      item  counts
1   apples       1
2  bananas       4
3  oranges       1
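
If a customer-by-item matrix of counts is more convenient than the long table, `pd.crosstab` on the tidy data is one option. The following is a minimal sketch that reuses the sample data from this answer; the names `tidy` and `counts` are only illustrative.

```python
import pandas as pd

data = pd.DataFrame(
    [['john', 'apples', 'oranges', 'bananas'],
     ['blake', 'bananas', '', ''],
     ['steph', '', 'bananas', 'bananas']],
    columns=['customer', 'item_1', 'item_2', 'item_3'])

# Tidy form: one row per (customer, item), without the empty-string placeholders
tidy = data.melt(id_vars='customer', value_name='item')
tidy = tidy[tidy['item'] != '']

# Rows are customers, columns are the unique items, cells are counts
counts = pd.crosstab(tidy['customer'], tidy['item'])
print(counts)
```

Combinations that never occur appear as 0 rather than being omitted, which makes the result easy to sum by row or by column.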