Question

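I am creating a new column by splitting the values in an existing column.

category_list is an existing column holding the one or more categories a company belongs to; multiple categories are separated by a pipe (|).

I am trying to determine each company's primary category by assuming that the first one listed (when there are several) is the primary category.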

 master_frame_ven['primary_sector'] = master_frame_ven['category_list'].str.split(pat = '|')

After running the code above, the split values are stored as lists in the primary_sector column.

To pull the values out of those lists and keep only the first one as a standalone entry, I tried this:

master_frame_ven['primary_sector'] = master_frame_ven['primary_sector'].apply(lambda x: x[0])

Running this raises an error:

> TypeError                                 Traceback (most recent call last)
> <ipython-input-66-5dffe0256421> in <module>
>       1 master_frame_ven['primary_sector'] = master_frame_ven['category_list'].str.split(pat = '|')
>       2 
> ----> 3 master_frame_ven['primary_sector'] = master_frame_ven['primary_sector'].apply(lambda x: x[0])
> 
> TypeError: 'float' object is not subscriptable

Here is the data frame I am currently working with.

> {'company_permalink': {0: '/organization/-fame',   1:
> '/organization/-qounter',   3: '/organization/-the-one-of-them-inc-', 
> 4: '/organization/0-6-com',   5: '/organization/004-technologies'}, 
> 'funding_round_permalink': {0:
> '/funding-round/9a01d05418af9f794eebff7ace91f638',   1:
> '/funding-round/22dacff496eb7acb2b901dec1dfe5633',   3:
> '/funding-round/650b8f704416801069bb178a1418776b',   4:
> '/funding-round/5727accaeaa57461bd22a9bdd945382d',   5:
> '/funding-round/1278dd4e6a37fa4b7d7e06c21b3c1830'}, 
> 'funding_round_type': {0: 'venture',   1: 'venture',   3: 'venture',  
> 4: 'venture',   5: 'venture'},  'funding_round_code': {0: 'B', 1: 'A',
> 3: 'B', 4: 'A', 5: nan},  'funded_at': {0: '05-01-2015',   1:
> '14-10-2014',   3: '30-01-2014',   4: '19-03-2008',   5:
> '24-07-2014'},  'raised_amount_usd': {0: 10000000.0,   1: nan,   3:
> 3406878.0,   4: 2000000.0,   5: nan},  'permalink': {0: '/organization/-fame',   1: '/organization/-qounter',   3:
> '/organization/-the-one-of-them-inc-',   4: '/organization/0-6-com',  
> 5: '/organization/004-technologies'},  'name': {0: '#fame',   1:
> ':Qounter',   3: '(THE) ONE of THEM,Inc.',   4: '0-6.com',   5: '004
> Technologies'},  'homepage_url': {0: 'http://livfame.com',   1:
> 'http://www.qounter.com',   3: 'http://oneofthem.jp',   4:
> 'http://www.0-6.com',   5: 'http://004gmbh.de/en/004-interact'}, 
> 'category_list': {0: 'Media',   1: 'Application Platforms|Real
> Time|Social Network Media',   3: 'Apps|Games|Mobile',   4: 'Curated
> Web',   5: 'Software'},  'status': {0: 'operating',   1: 'operating', 
> 3: 'operating',   4: 'operating',   5: 'operating'},  'country_code':
> {0: 'IND', 1: 'USA', 3: nan, 4: 'CHN', 5: 'USA'},  'state_code': {0:
> '16', 1: 'DE', 3: nan, 4: '22', 5: 'IL'},  'region': {0: 'Mumbai',  
> 1: 'DE - Other',   3: nan,   4: 'Beijing',   5: 'Springfield,
> Illinois'},  'city': {0: 'Mumbai',   1: 'Delaware City',   3: nan,  
> 4: 'Beijing',   5: 'Champaign'},  'founded_at': {0: nan,   1:
> '04-09-2014',   3: nan,   4: '01-01-2007',   5: '01-01-2010'}, 
> 'primary_sector': {0: 'Media',   1: 'Application Platforms',   3:
> 'Apps',   4: 'Curated Web',   5: 'Software'}}

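I don't know which part of it is a float, or why exactly I get this error. What should I do to extract only the primary category from the list and store it as a string rather than leaving it as a list?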

Answer 1

You chose not to provide a reproducible example, which would have helped people check proposed solutions.

You used:

lambda x: x[0]

You would be happier with:

lambda x: str(x[0])
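
The 'float' in the traceback is most likely a missing value. pandas represents a missing category_list entry as NaN, which is a float, and .str.split passes it through unchanged instead of producing a list, so any expression that evaluates x[0] on that row fails. Below is a minimal sketch under that assumption; the four-row frame is made up for illustration (the NaN row is not in the sample shown in the question), and it swaps the apply/lambda step for the vectorised .str[0] accessor, which leaves missing values as NaN.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature of the question's frame. The NaN row is an assumption
# standing in for a company that has no category_list value.
master_frame_ven = pd.DataFrame({
    "category_list": [
        "Media",
        "Application Platforms|Real Time|Social Network Media",
        np.nan,
        "Apps|Games|Mobile",
    ]
})

# Split on the pipe, then take element 0 of each resulting list with .str[0].
# Rows that were NaN before the split remain NaN instead of raising TypeError.
master_frame_ven["primary_sector"] = (
    master_frame_ven["category_list"].str.split("|").str[0]
)

print(master_frame_ven)
```

If the missing rows should carry a label rather than NaN, a .fillna(...) call with a placeholder of your choosing could be chained onto the end; whether that is appropriate depends on how the column is used later.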
