我需要在现有数据的pandas数据框中添加三列。
df
>>
n a b
0 3 1.2 1.4
1 2 2.8 3.8
2 3 2.3 2.0
3 3 1.7 5.7
4 2 6.9 4.9
5 1 3.9 19.0
6 9 2.3 8.3
7 5 8.5 3.1
8 18 6.7 7.0
9 10 5.6 6.4
我做了以下
import pandas
import numpy
def add_tests(add_df):
new_tests = """
(a+b)/n
(a*b)/n
((a+b)/n)**-1
""".split()
rows = add_df.shape[0]
cols = len(new_tests)
U = pandas.DataFrame(numpy.empty([rows, cols]), columns=new_tests)
add_df = pandas.concat([df, U], axis=1)
for i, row in add_df.iterrows():
# 1) good calculation:
add_df['(a+b)/n'].loc[i] = (add_df['a'].loc[i] + add_df['b'].loc[i])/ add_df['n'].loc[i]
# 2) good calculation (Both ways):
add_df['(a*b)/n'].loc[i] = (row['a'] * row['b'])/ row['n']
# 3) bad calculation
add_df['((a+b)/n)**-1'].loc[i] = row['(a+b)/n'] ** -1
pass
return add_df
我收到下一条警告信息:
df = add_tests(df)
df
>>
C:... \ pandas \ core \ indexing.py:141:SettingWithCopyWarning: 尝试在DataFrame
的切片副本上设置值
n a b (a+b)/n (a*b)/n ((a+b)/n)**-1
0 3 1.2 1.4 0.866667 0.560000 0.833333
1 2 2.8 3.8 3.300000 5.320000 0.588235
2 3 2.3 2.0 1.433333 1.533333 0.434783
3 3 1.7 5.7 2.466667 3.230000 0.178571
4 2 6.9 4.9 5.900000 16.905000 0.500000
5 1 3.9 19.0 22.900000 74.100000 0.052632
6 9 2.3 8.3 1.177778 2.121111 0.142857
7 5 8.5 3.1 2.320000 5.270000 0.263158
8 18 6.7 7.0 0.761111 2.605556 0.111111
9 10 5.6 6.4 1.200000 3.584000 0.666667
显然第3步无法正常工作...... 如何正确地做到这一点?
答案 0 :(得分:1)
有趣eval
\n
分隔的公式字符串,以传递给eval
ftups = [('aa', '(a+b)/n'), ('bb', '(a*b)/n'), ('cc', '((a+b)/n)**-1')]
forms = '\n'.join([' = '.join(tup) for tup in ftups])
fdict = dict(ftups)
df.eval(forms, inplace=False).rename(columns=fdict)
n a b (a+b)/n (a*b)/n ((a+b)/n)**-1
0 3 1.2 1.4 0.866667 0.560000 1.153846
1 2 2.8 3.8 3.300000 5.320000 0.303030
2 3 2.3 2.0 1.433333 1.533333 0.697674
3 3 1.7 5.7 2.466667 3.230000 0.405405
4 2 6.9 4.9 5.900000 16.905000 0.169492
5 1 3.9 19.0 22.900000 74.100000 0.043668
6 9 2.3 8.3 1.177778 2.121111 0.849057
7 5 8.5 3.1 2.320000 5.270000 0.431034
8 18 6.7 7.0 0.761111 2.605556 1.313869
9 10 5.6 6.4 1.200000 3.584000 0.833333