很抱歉您提出一个幼稚的问题,但此刻让我发疯。我有一个数据框df1,并使用它来创建新的数据框df2,如下所示:
import pandas as pd
def NewDF(df):
df['sum']=df['a']+df['b']
return df
df1 =pd.DataFrame({'a':[1,2,3],'b':[4,5,6]})
print(df1)
df2 =NewDF(df1)
print(df1)
给出
a b
0 1 4
1 2 5
2 3 6
a b sum
0 1 4 5
1 2 5 7
2 3 6 9
为什么我失去df1形状并获得第三列?如何避免这种情况?
答案 0 :(得分:3)
DataFrames是mutable
,因此您应该将副本明确传递给函数,或者让函数的第一步复制输入。否则,就像列表一样,您对函数所做的任何修改也将应用于原始列表。
您的选择是:
def NewDF(df):
df = df.copy()
df['sum']=df['a']+df['b']
return df
df2 = NewDF(df1)
或
df2 = NewDF(df1.copy())
在这里我们可以看到原始实现中的所有内容都指向同一对象
import pandas as pd
def NewDF(df):
print(id(df))
df['sum']=df['a']+df['b']
print(id(df))
return df
df1 =pd.DataFrame({'a':[1,2,3],'b':[4,5,6]})
print(id(df1))
#2242099787480
df2 = NewDF(df1)
#2242099787480
#2242099787480
print(id(df2))
#2242099787480
答案 1 :(得分:0)
获得的第三列是“索引”列,每个pandas DataFrame都将始终保留一个索引,但是您可以选择是否在输出中不使用它。
C:\Users\esin>pip install beautifulsoup4
Collecting beautifulsoup4
Retrying (Retry(total=4, connect=None, read=None, redirect=None, status=None)) after connection broken by 'SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: self signed certificate in certificate chain (_ssl.c:1056)'))': /simple/beautifulsoup4/
Retrying (Retry(total=3, connect=None, read=None, redirect=None, status=None)) after connection broken by 'SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: self signed certificate in certificate chain (_ssl.c:1056)'))': /simple/beautifulsoup4/
Retrying (Retry(total=2, connect=None, read=None, redirect=None, status=None)) after connection broken by 'SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: self signed certificate in certificate chain (_ssl.c:1056)'))': /simple/beautifulsoup4/
Retrying (Retry(total=1, connect=None, read=None, redirect=None, status=None)) after connection broken by 'SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: self signed certificate in certificate chain (_ssl.c:1056)'))': /simple/beautifulsoup4/
Retrying (Retry(total=0, connect=None, read=None, redirect=None, status=None)) after connection broken by 'SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: self signed certificate in certificate chain (_ssl.c:1056)'))': /simple/beautifulsoup4/
Could not fetch URL https://pypi.org/simple/beautifulsoup4/: There was a problem confirming the ssl certificate: HTTPSConnectionPool(host='pypi.org', port=443): Max retries exceeded with url: /simple/beautifulsoup4/ (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: self signed certificate in certificate chain (_ssl.c:1056)'))) - skipping
Could not find a version that satisfies the requirement beautifulsoup4 (from versions: )
No matching distribution found for beautifulsoup4
Could not fetch URL https://pypi.org/simple/pip/: There was a problem confirming the ssl certificate: HTTPSConnectionPool(host='pypi.org', port=443): Max retries exceeded with url: /simple/pip/ (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: self signed certificate in certificate chain (_ssl.c:1056)'))) - skipping
将输出显示为
import pandas as pd
def NewDF(df):
df['sum']=df['a']+df['b']
return df
df1 =pd.DataFrame({'a':[1,2,3],'b':[4,5,6]})
print(df1.to_string(index=False))
df2 =NewDF(df1)
print(df1.to_string(index = False))
现在您可能有一个问题,为什么索引存在,索引实际上是一个支持后备的哈希表,它可以提高速度,并且在多个上下文中是非常可取的功能,如果这只是一个问题,那么就足够了,但是如果您想了解有关熊猫的更多信息,我建议您研究索引,您可以从这里https://stackoverflow.com/a/27238758/10953776
开始