Question

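First of all, this is not a duplicate! I have searched several SO questions as well as the pandas documentation and found nothing conclusive about creating a new column from a row value, such as this and this!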

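Imagine I have the following table, which I open from an .xls file and load into a DataFrame. Since this is a small example derived from the real problem, I made this simple Excel sheet, which is easy to reproduce: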

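What I want now is to find the row that contains "Population Month Year" (I will be looking at different .xls files, and the structure is always the same: Population, month, and year).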

import pandas as pd

xls = 'population_example.xls'
sheet_name = 'Sheet1'
# Skip the two rows above the header row
df = pd.read_excel(xls, sheet_name=sheet_name, header=0, skiprows=2)
df

My idea is:

Use startswith.
Create a column, process that value with Python, and extract the month and year.

I tried a few things similar to this:

dff=df[s.str.startswith('Population')]
dff

But the errors keep coming. For the code above, the error is specifically:

IndexingError: Unalignable boolean Series provided as indexer (index of the boolean Series and of the indexed object do not match).
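As a rough illustration of where this error can come from: the snippet above does not show how `s` was built, so the sketch below assumes it is a Series whose index differs from that of `df`. The sample data and the index values are made up for the example.

```python
import pandas as pd

# Made-up frame standing in for the spreadsheet; only the 'Area' column name
# comes from the question.
df = pd.DataFrame({"Area": ["Whatever1", "Whatever2", "Population April 2017."]})

# Assumption: `s` is some other Series whose index labels do not match df's.
s = pd.Series(["Population x", "other", "other"], index=[10, 11, 12])

try:
    df[s.str.startswith("Population")]  # mask index 10..12 vs df index 0..2
    error_name = None
except Exception as exc:
    error_name = type(exc).__name__  # IndexingError on mismatched indexes

# Building the mask from a column of the same frame keeps the indexes aligned.
aligned = df[df["Area"].str.startswith("Population")]
```

The general point is that a boolean mask used inside `df[...]` should share the index of `df`, which is automatically true when the mask is computed from one of its own columns.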

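I have a few guesses:

Even after reading the documentation, I don't properly understand how Series work in pandas. I hadn't even considered using them, but startswith looks like exactly what I want.
If I handle this correctly there may be a NaN error, but I can't use df.dropna(), because I would lose the value in that row (Population April 2017)!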

Edit:

Using this question:

The thing about df[df['Area'].str.startswith('Population')] is that it also checks the NA values.

This:

df['Area'].str.startswith('Population')

gives me a Series of True/False/NaN values that I'm not sure how to use.

Answer 1

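Thanks to @Erfan, I found the solution:

By using the line of code from the comments correctly, rather than the way I had been trying, I managed to do this: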

dff = df[df['Area'].str.startswith('Population', na=False)]
dff

This outputs: Population and household forecasts, 2016 to 20... NaN NaN NaN NaN NaN NaN
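For readers without the spreadsheet, here is a minimal sketch of the same filter on a made-up frame. Only the column name `Area` and the `na=False` argument come from the post; the data is invented for illustration.

```python
import numpy as np
import pandas as pd

# Invented sample data with a missing value in 'Area'.
df = pd.DataFrame({
    "Area": ["Whatever1", "Whatever2", np.nan, "Population April 2017."],
    "Population": [3867, 1675, np.nan, np.nan],
})

# Without na=False the mask holds True, False, and NaN for the missing cell.
raw_mask = df["Area"].str.startswith("Population")

# na=False treats missing cells as "no match", giving a purely boolean mask.
mask = df["Area"].str.startswith("Population", na=False)
dff = df[mask]
```

With `na=False` there is no need to call `df.dropna()` first, so the row holding the "Population ..." text is not lost.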

Now I can access this value like this:

value = dff.iloc[0, 0]
value

This gets the string I was looking for: 'Population and household forecasts, 2016 to 2041, prepared by .id , the population experts, April 2019.' I can now use Python to create the column I need. Thanks!
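One possible way to go from that string to the new columns is sketched below. It assumes the month and year always appear at the end of the text, optionally followed by a period; the column names `Month` and `Year` and the small data frame are hypothetical.

```python
import re

import pandas as pd

value = ("Population and household forecasts, 2016 to 2041, "
         "prepared by .id , the population experts, April 2019.")

# Assumption: the string ends with "<Month> <4-digit year>" and maybe a period.
match = re.search(r"([A-Za-z]+)\s+(\d{4})\.?\s*$", value)
month, year = match.group(1), int(match.group(2))

# Hypothetical data rows; a scalar assignment fills every row of the new column.
data = pd.DataFrame({"Area": ["Whatever1", "Whatever2"],
                     "Population": [3867, 1675]})
data["Month"] = month
data["Year"] = year
```

If a file does not follow that layout, `re.search` returns `None`, so real code would want to check `match` before calling `group`.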

Answer 2

You can try:

import pandas as pd
import numpy as np

pd.DataFrame({'Area': [f'Whatever{i+1}' for i in range(3)] + [np.nan, 'Population April 2017.'],
              'Population': [3867, 1675, 1904, np.nan, np.nan]}).to_excel('population_example.xls', index=False)

df = pd.read_excel('population_example.xls').fillna('')

population_date = df[df.Area.str.startswith('Population')].Area.values[0].lstrip('Population ').rstrip('.').split()

Result:

['April', '2017']

Or (if the "Population Month Year" row is always the last row):

df.iloc[-1, 0].lstrip('Population ').rstrip('.').split()
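One caveat: `str.lstrip('Population ')` removes any leading characters found in that set rather than the literal prefix. It works here because no month name starts with one of those characters, but a pattern-based variant may be more robust. The sketch below uses `Series.str.extract` with a regular expression; the group names `month` and `year` and the sample data are made up for the example.

```python
import pandas as pd

# Invented sample data, including a missing value.
df = pd.DataFrame({"Area": ["Whatever1", None, "Population April 2017."]})

# Named groups become the column names of the extracted frame;
# rows that do not match the pattern come back as NaN.
parts = df["Area"].str.extract(
    r"^Population\s+(?P<month>[A-Za-z]+)\s+(?P<year>\d{4})"
)

# Keep the first row that matched.
found = parts.dropna().iloc[0]
population_date = [found["month"], found["year"]]
```

This variant also tolerates missing cells, so no `fillna('')` is needed beforehand.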

Create a new column in pandas using a row value
