I have a pandas DataFrame (df) like this:
Close Close Close Close Close
Date
2000-01-03 00:00:00 NaN NaN NaN NaN -0.033944
2000-01-04 00:00:00 NaN NaN NaN NaN 0.0351366
2000-01-05 00:00:00 -0.033944 NaN NaN NaN -0.0172414
2000-01-06 00:00:00 0.0351366 -0.033944 NaN NaN -0.00438596
2000-01-07 00:00:00 -0.0172414 0.0351366 -0.033944 NaN 0.0396476
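In R, if I want to select the fifth column I write
five=df[,5]
and to get everything except the fifth column
rest=df[,-5]
How can I do something similar with a pandas DataFrame?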
I tried this in pandas:
five=df.ix[,5]
but it gives this error:
File "", line 1
df.ix[,5]
^
SyntaxError: invalid syntax
Answer 0 (score: 6)
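Use iloc. It is explicitly a position-based indexer. ix can be either label-based or position-based, which gets confusing when the index is integer-based. (ix was later deprecated and was removed in pandas 1.0.)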
df.iloc[:, [4]]
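The label-versus-position ambiguity shows up as soon as labels are integers that differ from positions. A minimal sketch with a made-up frame whose column labels are 2, 1, 0; `.loc` stands in here for the label-based half of what `ix` used to do:

```python
import pandas as pd

# Hypothetical frame: integer column labels that do NOT match positions
df = pd.DataFrame([[1, 2, 3], [4, 5, 6]], columns=[2, 1, 0])

by_label = df.loc[:, 0]      # column whose *label* is 0 (the last column)
by_position = df.iloc[:, 0]  # column at *position* 0 (the first column)

print(by_label.tolist())     # [3, 6]
print(by_position.tolist())  # [1, 4]
```

With `iloc` the meaning of `0` never depends on what the labels happen to be.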
Everything except the fifth column:
slc = list(range(df.shape[1]))
slc.remove(4)
df.iloc[:, slc]
or, equivalently:
df.iloc[:, [i for i in range(df.shape[1]) if i != 4]]
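A sketch that rebuilds the question's frame (values copied from the question; the dates are regenerated with `date_range`, which is an assumption about the index) and mirrors both R expressions. Note that R positions are 1-based and pandas positions are 0-based, and that all five columns share the label "Close", so a label-based drop would remove all of them:

```python
import numpy as np
import pandas as pd

nan = np.nan
# Values from the question; every column is labelled "Close"
df = pd.DataFrame(
    [
        [nan, nan, nan, nan, -0.033944],
        [nan, nan, nan, nan, 0.0351366],
        [-0.033944, nan, nan, nan, -0.0172414],
        [0.0351366, -0.033944, nan, nan, -0.00438596],
        [-0.0172414, 0.0351366, -0.033944, nan, 0.0396476],
    ],
    index=pd.date_range("2000-01-03", periods=5, name="Date"),
    columns=["Close"] * 5,
)

# R's df[,5] (1-based) is position 4 in pandas (0-based)
five = df.iloc[:, 4]

# R's df[,-5]: keep every position except 4 via a boolean mask
mask = np.arange(df.shape[1]) != 4
rest = df.iloc[:, mask]
print(rest.shape)  # (5, 4)

# By contrast, dropping by label removes every column named "Close"
dropped = df.drop(columns=df.columns[4])
print(dropped.shape)  # (5, 0)
```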
Answer 1 (score: 2)
If you want the fifth column:
df.ix[:,4]
The colon is there to take all rows of that column.
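In Python the row slot cannot be left empty as in R's `df[,5]`, which is why the question's `df.ix[,5]` is a syntax error; `:` is a slice meaning "all rows". A small sketch on a made-up frame, using `iloc` because `ix` no longer exists in current pandas:

```python
import pandas as pd

# Hypothetical frame for illustration
df = pd.DataFrame({"a": [0, 1, 2, 3], "b": [10, 11, 12, 13]})

all_rows = df.iloc[:, 1]                 # ":" is a full slice: every row of column 1
some_rows = df.iloc[1:3, 1]              # rows at positions 1 and 2 (end is exclusive)
same_as_colon = df.iloc[slice(None), 1]  # ":" spelled out explicitly

print(all_rows.tolist())   # [10, 11, 12, 13]
print(some_rows.tolist())  # [11, 12]
```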
To exclude the fifth column, you can try:
df.ix[:, [x for x in range(0, len(df.columns)) if x != 4]]
Answer 2 (score: 1)
Select a column by position:
In [19]: df
Out[19]:
Date Close Close.1 Close.2 Close.3 Close.4
0 2000-01-03 00:00:00 NaN NaN NaN NaN -0.033944
1 2000-01-04 00:00:00 NaN NaN NaN NaN 0.035137
2 2000-01-05 00:00:00 -0.033944 NaN NaN NaN -0.017241
3 2000-01-06 00:00:00 0.035137 -0.033944 NaN NaN -0.004386
4 2000-01-07 00:00:00 -0.017241 0.035137 -0.033944 NaN 0.039648
In [20]: df.ix[:, 5]
Out[20]:
0 -0.033944
1 0.035137
2 -0.017241
3 -0.004386
4 0.039648
Name: Close.4, dtype: float64
In [21]: df.icol(5)
/usr/bin/ipython:1: FutureWarning: icol(i) is deprecated. Please use .iloc[:,i]
#!/usr/bin/python2
Out[21]:
0 -0.033944
1 0.035137
2 -0.017241
3 -0.004386
4 0.039648
Name: Close.4, dtype: float64
In [22]: df.iloc[:, 5]
Out[22]:
0 -0.033944
1 0.035137
2 -0.017241
3 -0.004386
4 0.039648
Name: Close.4, dtype: float64
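In this answer the last column sits at position 5 rather than 4 because Date is an ordinary column here, while in the question it is the row index. A minimal sketch with a made-up two-row frame showing how `set_index` shifts the positions:

```python
import pandas as pd

# Hypothetical frame with Date stored as a regular column
df = pd.DataFrame({
    "Date": pd.to_datetime(["2000-01-03", "2000-01-04"]),
    "Close": [1.0, 2.0],
    "Close.1": [3.0, 4.0],
})

pos_before = df.columns.get_loc("Close.1")  # 2: Date occupies position 0

indexed = df.set_index("Date")
pos_after = indexed.columns.get_loc("Close.1")  # 1: Date is now the row index

print(pos_before, pos_after)  # 2 1
```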
Select all columns except the one at position 5:
In [29]: df[[df.columns[i] for i in range(len(df.columns)) if i != 5]]
Out[29]:
Date Close Close.1 Close.2 Close.3
0 2000-01-03 00:00:00 NaN NaN NaN NaN
1 2000-01-04 00:00:00 NaN NaN NaN NaN
2 2000-01-05 00:00:00 -0.033944 NaN NaN NaN
3 2000-01-06 00:00:00 0.035137 -0.033944 NaN NaN
4 2000-01-07 00:00:00 -0.017241 0.035137 -0.033944 NaN
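The list of labels above works because the columns here are unique (Close, Close.1, ...). In the question every column is named "Close", and a label lookup then matches all of them. A sketch on a made-up frame with duplicated names, contrasting label and positional selection:

```python
import pandas as pd

# Hypothetical frame mimicking the question: every column is labelled "Close"
df = pd.DataFrame([[1, 2, 3], [4, 5, 6]], columns=["Close"] * 3)

# Label lookup matches every column with that name
by_label = df[["Close"]]
print(by_label.shape)  # (2, 3)

# Positional selection is unaffected by duplicate names
keep = [i for i in range(df.shape[1]) if i != 2]
by_position = df.iloc[:, keep]
print(by_position.shape)  # (2, 2)
```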
Answer 3 (score: 0)
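If your DataFrame has no meaningful column/row labels (only the default integer ones) and you want to select specific columns, use iloc.
For example, to select the first column and all rows: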
df = dataset.iloc[:,0]
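Here the df variable will contain the values stored in the first column of the DataFrame.
Keep in mind that the result is a Series, not a DataFrame: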
type(df) -> pandas.core.series.Series
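Whether you get a Series or a DataFrame depends on passing a single position or a list of positions. A minimal sketch, where `dataset` is a hypothetical two-column frame:

```python
import pandas as pd

dataset = pd.DataFrame([[1, 2], [3, 4]])  # hypothetical frame with default integer labels

col_series = dataset.iloc[:, 0]   # single position -> Series
col_frame = dataset.iloc[:, [0]]  # list of positions -> one-column DataFrame

print(type(col_series).__name__)  # Series
print(type(col_frame).__name__)   # DataFrame
```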
Hope this helps.