Question

Sample DataFrame:

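import pandas as pd

data = [["2011-01-01", 23], ["2011-01-02", 33], ["2011-01-03", 43], ["2011-01-04", 53]]
df = pd.DataFrame(data, columns=["A", "B"])
df["A"] = pd.to_datetime(df["A"])
# Use the datetime column as the index, then drop the now-redundant column
df.index = df["A"]
del df["A"]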

Output:

            B
A   
2011-01-01  23
2011-01-02  33
2011-01-03  43
2011-01-04  53

I am trying to split this DataFrame into two parts using the following code:

part1 = df.loc[:"2011-01-02"]

Output:

            B
A   
2011-01-01  23
2011-01-02  33

Part 2:

part2 = df.loc["2011-01-02":]

Output:

            B
A   
2011-01-02  33
2011-01-03  43
2011-01-04  53

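But both parts (part 1 and part 2) contain the row with index "2011-01-02". Is there a pandas one-liner that puts this row in only one of the two parts instead of both?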

Answer 1

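This is the expected behavior (I didn't know it until today either).

"This type of slicing will work on a DataFrame with a DatetimeIndex as well. Since the partial string selection is a form of label slicing, the endpoints will be included. This would include matching times on an included date." From http://pandas-docs.github.io/pandas-docs-travis/user_guide/timeseries.html#indexing.

On the behavior of label slicing:

"Note that contrary to usual python slices, both the start and the stop are included." https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.loc.html#pandas.DataFrame.loc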

In [16]: df[df.index < '2011-01-02']
Out[16]:
             B
A
2011-01-01  23

In [17]: df[df.index >= '2011-01-02']
Out[17]:
             B
A
2011-01-02  33
2011-01-03  43
2011-01-04  53

In [18]: df[df.index > '2011-01-02']
Out[18]:
             B
A
2011-01-03  43
2011-01-04  53
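
The quoted remark about "matching times on an included date" is easier to see with intraday timestamps. The following is a minimal sketch, not part of the original answer; the frame `intraday` and its values are made up for illustration. It contrasts an inclusive label slice with a strict comparison.

```python
import pandas as pd

# Hypothetical intraday data: two timestamps per day around the boundary date
idx = pd.to_datetime([
    "2011-01-01 00:00", "2011-01-01 12:00",
    "2011-01-02 00:00", "2011-01-02 12:00",
    "2011-01-03 00:00",
])
intraday = pd.DataFrame({"B": [1, 2, 3, 4, 5]}, index=idx)

# Label slice with a date string: every timestamp on 2011-01-02 is included
inclusive = intraday.loc[:"2011-01-02"]

# Comparison: the string is read as 2011-01-02 00:00, so that whole day is excluded
strictly_before = intraday[intraday.index < "2011-01-02"]

print(len(inclusive), len(strictly_before))
```

So the label slice and the comparison differ by the entire boundary day, not just by a single row.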

Answer 2

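mask = df.index > "2011-01-02"  # named mask to avoid shadowing the built-in slice
df[mask]   # rows after 2011-01-02
df[~mask]  # rows up to and including 2011-01-02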
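
To check that a boolean mask and its negation really split the frame without sharing a row, something like the sketch below could be used. It rebuilds the question's frame so that it runs on its own; the variable names `after`, `part1` and `part2` are chosen here for illustration.

```python
import pandas as pd

df = pd.DataFrame(
    {"B": [23, 33, 43, 53]},
    index=pd.to_datetime(["2011-01-01", "2011-01-02", "2011-01-03", "2011-01-04"]),
)

after = df.index > "2011-01-02"   # boolean array, True for rows after the boundary
part1 = df[~after]                # 2011-01-01 and 2011-01-02
part2 = df[after]                 # 2011-01-03 and 2011-01-04

# No shared labels, and the two parts together rebuild the original frame
assert part1.index.intersection(part2.index).empty
assert pd.concat([part1, part2]).equals(df)
```

Using `>=` instead of `>` would move the boundary row into the second part.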

Answer 3

Instead of part2 = df.loc["2011-01-02":], use

part2 = df.loc["2011-01-02":].iloc[1:]

             B
A             
2011-01-03  43
2011-01-04  53
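
One caveat worth keeping in mind: chaining `.iloc[1:]` assumes the boundary date is actually present in the index. The sketch below uses a hypothetical frame `gappy` in which 2011-01-02 is missing, to show what happens in that case compared with a strict comparison.

```python
import pandas as pd

# Same shape as the question's data, but with 2011-01-02 missing from the index
gappy = pd.DataFrame(
    {"B": [23, 43, 53]},
    index=pd.to_datetime(["2011-01-01", "2011-01-03", "2011-01-04"]),
)

# The label slice already starts at 2011-01-03, so .iloc[1:] drops a wanted row
chained = gappy.loc["2011-01-02":].iloc[1:]

# A strict comparison keeps every row after the boundary
compared = gappy[gappy.index > "2011-01-02"]

print(chained.index.tolist())
print(compared.index.tolist())
```

If the boundary date is guaranteed to exist exactly once, the two approaches give the same result.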

Answer 4

Use get_loc together with iloc

df.iloc[:df.index.get_loc('2011-01-02')]
                    A   B
A                        
2011-01-01 2011-01-01  23

df.iloc[df.index.get_loc('2011-01-02'):]
                    A   B
A                        
2011-01-02 2011-01-02  33
2011-01-03 2011-01-03  43
2011-01-04 2011-01-04  53
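
Applied to the question's frame (which has only column B), the same idea can be wrapped in a small helper. This is a sketch under assumptions: `split_at` is a hypothetical name, the boundary label is assumed to occur exactly once in the index so that `get_loc` returns a single integer position, and the label is converted with `pd.Timestamp` to ask for an exact match.

```python
import pandas as pd

df = pd.DataFrame(
    {"B": [23, 33, 43, 53]},
    index=pd.to_datetime(["2011-01-01", "2011-01-02", "2011-01-03", "2011-01-04"]),
)

def split_at(frame, label):
    """Hypothetical helper: split a frame positionally at a unique index label."""
    pos = frame.index.get_loc(pd.Timestamp(label))  # integer position of the label
    # Positional slices are half-open, so the boundary row lands only in the second part
    return frame.iloc[:pos], frame.iloc[pos:]

before, from_boundary = split_at(df, "2011-01-02")
print(before)
print(from_boundary)
```

Because `iloc` excludes the stop position, the two parts cannot overlap. `get_loc` raises a `KeyError` if the label is not in the index.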
