Question

I have a large DataFrame and would like to access part of it by reference, so that whenever the original large DataFrame is updated, the smaller DataFrame is updated as well.

For obvious reasons, creating a copy of the smaller part does not work.

import pandas as pd

# Create a DataFrame
large_df = pd.DataFrame(dict(a=range(3)))
large_df

0:
   a
0  0
1  1
2  2

# Sample some of the DataFrame indices.
# In this example I keep accessing the even rows of a DataFrame
# while updating it, but `sample` is, in general,
# a random list of rows.
sample = [0, 2]

# Create a copy of the sampled part of the DataFrame
sub_df = large_df.loc[sample]
sub_df

1:
   a
0  0
2  2

# Modify the original DataFrame
large_df.loc[:, 'b'] = range(3, 6)
large_df

2:
   a  b
0  0  3
1  1  4
2  2  5

# The copy of the sampled part is kept unchanged 
sub_df

3:
   a
0  0
2  2

The only solution I have found is to go back to the loc call.

# Reusing loc, the sampled part includes the modification
large_df.loc[sample]

4:
   a  b
0  0  3
2  2  5
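
The repeated loc call above could be packaged so it does not have to be retyped. The following is only a minimal sketch, not a pandas feature: the class name `LazyView` and its `df` property are hypothetical names made up for illustration. It assumes the original DataFrame is modified in place and that the row labels in `sample` stay valid.

```python
import pandas as pd


class LazyView:
    """Hypothetical helper: re-evaluates parent.loc[rows] on every access."""

    def __init__(self, parent, rows):
        self._parent = parent      # reference to the original DataFrame, not a copy
        self._rows = list(rows)    # row labels to select on each access

    @property
    def df(self):
        # A fresh selection each time, so it reflects the parent's current state
        return self._parent.loc[self._rows]


large_df = pd.DataFrame(dict(a=range(3)))
sample = [0, 2]
sub_view = LazyView(large_df, sample)

large_df['b'] = [3, 4, 5]   # modify the original DataFrame in place
print(sub_view.df)          # the selection now includes column 'b'
```

This still goes through loc on every access, so each `sub_view.df` is a new object rather than a true reference. It also stops tracking the data if `large_df` is rebound to a new object instead of being modified in place.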

Is there a simpler way?

Accessing part of a pandas DataFrame by reference in Python

0 answers: