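I have a large DataFrame and would like to access part of it by reference, i.e. whenever the original large DataFrame is updated, the smaller DataFrame is updated as well.
Creating a copy of the smaller part does not work, since a copy does not track later changes to the original.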
import pandas as pd
# Create a DataFrame
large_df = pd.DataFrame(dict(a=range(3)))
large_df
Out[0]:
   a
0  0
1  1
2  2
# Sample some of the DataFrame indices.
# In this example I keep accessing the even rows of a DataFrame
# while updating it, but `sample` is, in general,
# a random list of rows.
sample = [0, 2]
# Create a copy of the sampled part of the DataFrame
sub_df = large_df.loc[sample]
sub_df
Out[1]:
   a
0  0
2  2
# Modify the original DataFrame
large_df.loc[:, 'b'] = range(3, 6)
large_df
Out[2]:
   a  b
0  0  3
1  1  4
2  2  5
# The copy of the sampled part is kept unchanged
sub_df
Out[3]:
   a
0  0
2  2
The only solution I have found is to go back to the loc statement and run it again.
# Reusing loc, the sampled part includes the modification
large_df.loc[sample]
Out[4]:
   a  b
0  0  3
2  2  5
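For what it's worth, the repeated loc call can be wrapped so that it is re-run on every access. This is only a minimal sketch of the workaround, not a true reference into the large DataFrame. It assumes the large DataFrame is modified in place and never rebound to a new object. `LocView` and its `df` property are hypothetical names, not part of pandas.

```python
import pandas as pd


class LocView:
    """Hypothetical helper: re-evaluates `parent.loc[rows]` on every access."""

    def __init__(self, parent, rows):
        self.parent = parent      # keeps a reference to the original, not a copy
        self.rows = list(rows)    # row labels to select

    @property
    def df(self):
        # Build a fresh selection each time, so it reflects the
        # parent's current columns and values.
        return self.parent.loc[self.rows]


large_df = pd.DataFrame(dict(a=range(3)))
sub = LocView(large_df, [0, 2])

# Modify the original DataFrame in place
large_df['b'] = [3, 4, 5]

# The selection is rebuilt on access, so it includes the new column
current = sub.df
```

Each access to `sub.df` still creates a new DataFrame, so this only hides the loc call; it does not avoid the copy.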
Is there a simpler way?