Question

I'm trying to use groupby to combine the string values of a DataFrame column, joining the values within each group with commas. I get the following error:

TypeError: sequence item 0: expected str instance, float found

The error occurs on the following line (the full code is in the block further down):

toronto_df['Neighbourhood'] = toronto_df.groupby(['Postcode','Borough'])['Neighbourhood'].agg(lambda x: ','.join(x))

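It looks as if, inside groupby, the index of each row of the ungrouped DataFrame is automatically added to the string before the join, and that this causes the TypeError. However, I don't know how to fix it. I've looked through many threads without finding a solution. Any help would be greatly appreciated!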

# Import Necessary Libraries

import numpy as np
import pandas as pd
from bs4 import BeautifulSoup
import requests

# Use BeautifulSoup to scrape the table from the Wikipedia page and build a DataFrame containing all of its information

wiki_html = requests.get('https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M').text
soup = BeautifulSoup(wiki_html, 'lxml')
# print(soup.prettify())
table = soup.find('table', class_='wikitable sortable')
table_columns = [th_txt.text.rstrip('\n') for th_txt in table.tbody.find_all('th')]

# DataFrame.append was removed in pandas 2.0, so collect the rows in a list and build the DataFrame once
rows = []
for row in table.tbody.find_all('tr')[1:]:
    row_data = [td_txt.text.rstrip('\n') for td_txt in row.find_all('td')]
    rows.append({table_columns[0]: row_data[0],
                 table_columns[1]: row_data[1],
                 table_columns[2]: row_data[2]})
toronto_df = pd.DataFrame(rows, columns=table_columns)
toronto_df.head()

# Remove cells with a borough that is Not assigned
toronto_df.replace('Not assigned',np.nan, inplace=True)
toronto_df = toronto_df[toronto_df['Borough'].notnull()]
toronto_df.reset_index(drop=True, inplace=True)
toronto_df.head()

# If a cell has a borough but a Not assigned neighborhood, then the neighborhood will be the same as the borough
toronto_df['Neighbourhood'] = toronto_df.groupby(['Postcode','Borough'])['Neighbourhood'].agg(lambda x: ','.join(x))
toronto_df.drop_duplicates(inplace=True)
toronto_df.head()

The expected Neighbourhood column should contain each group's values separated by commas, looking something like this (I can't post images yet, so here is a link):

https://d3c33hcgiwev3.cloudfront.net/imageAssetProxy.v1/7JXaz3NNEeiMwApe4i-fLg_40e690ae0e927abda2d4bde7d94ed133_Screen-Shot-2018-06-18-at-7.17.57-PM.png?expiry=1557273600000&hmac=936wN3okNJ1UTDA6rOpQqwELESvqgScu08_Spai0aQQ

Answer 1

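As mentioned in the comments, NaN is a float, so string operations on it fail, and that is what the error message is saying. Your replace('Not assigned', np.nan) step turns every "Not assigned" neighbourhood into NaN. str.join accepts only str items, so any group that contains a NaN fails; "sequence item 0" simply means the NaN was the first value in that group. The row index is not part of what gets joined.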
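A minimal reproduction may help confirm this. It uses a hypothetical three-row frame (the postcodes and names are illustrative, not scraped data) and shows where the float comes from and that the failure happens inside str.join:

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame shaped like toronto_df before the 'Not assigned' -> NaN step
df = pd.DataFrame({
    'Postcode': ['M5A', 'M5A', 'M7A'],
    'Borough': ['Downtown Toronto', 'Downtown Toronto', "Queen's Park"],
    'Neighbourhood': ['Harbourfront', 'Regent Park', 'Not assigned'],
})
df.replace('Not assigned', np.nan, inplace=True)

# The replaced cell now holds a float NaN, not a string
print(type(df.loc[2, 'Neighbourhood']))  # <class 'float'>

# str.join only accepts str items, so the (M7A, Queen's Park) group fails
try:
    df.groupby(['Postcode', 'Borough'])['Neighbourhood'].agg(lambda x: ','.join(x))
except TypeError as err:
    print(err)
```

The first group joins fine; only the group whose value became NaN raises.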

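Replace the last part of your code with the following, which fills the NaN values with a boolean mask (via np.where), following the logic stated in your code comment: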

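# If a cell has a borough but a Not assigned neighborhood, then the neighborhood will be the same as the borough
toronto_df['Neighbourhood'] = np.where(toronto_df['Neighbourhood'].isnull(), toronto_df['Borough'], toronto_df['Neighbourhood'])
# transform (rather than agg) returns one value per original row, so the result lines up with toronto_df's index
toronto_df['Neighbourhood'] = toronto_df.groupby(['Postcode','Borough'])['Neighbourhood'].transform(lambda x: ','.join(x))
toronto_df.drop_duplicates(inplace=True)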
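A self-contained sketch on hypothetical toy data (not the scraped table) shows the fill step followed by two ways to get one comma-joined string per (Postcode, Borough). groupby(...).agg returns a Series indexed by the group keys, not by the original rows, so it does not line up with toronto_df's row index when assigned back as a column. GroupBy.transform returns a value per original row, which can be assigned back and then deduplicated. Alternatively, agg with as_index=False builds the one-row-per-group table directly:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data in the shape of toronto_df after the 'Not assigned' -> NaN step
toronto_df = pd.DataFrame({
    'Postcode': ['M5A', 'M5A', 'M7A'],
    'Borough': ['Downtown Toronto', 'Downtown Toronto', "Queen's Park"],
    'Neighbourhood': ['Harbourfront', 'Regent Park', np.nan],
})

# Fill missing neighbourhoods with the borough name (same logic as above)
toronto_df['Neighbourhood'] = np.where(toronto_df['Neighbourhood'].isnull(),
                                       toronto_df['Borough'],
                                       toronto_df['Neighbourhood'])

# Option 1: transform keeps the original row index, so the result can be
# assigned back as a column; drop_duplicates then leaves one row per group
joined = toronto_df.copy()
joined['Neighbourhood'] = (joined.groupby(['Postcode', 'Borough'])['Neighbourhood']
                                 .transform(lambda x: ','.join(x)))
joined = joined.drop_duplicates().reset_index(drop=True)

# Option 2: aggregate directly into one row per (Postcode, Borough)
grouped = (toronto_df.groupby(['Postcode', 'Borough'], as_index=False)['Neighbourhood']
                     .agg(lambda x: ','.join(x)))

print(joined)
print(grouped)
```

Both options give the same table here. Option 2 avoids the intermediate duplicated rows. Option 1 stays closer to the original code's assign-then-drop_duplicates structure.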
