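I'm trying to optimize my code and save time. My current solution works, but when I apply similar functions to multiple data frames it becomes redundant and hard to maintain.
How can I automatically create a new column based on conditions on other columns?
Sample data: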
import pandas as pd
df = {'Column1': [1,2,3,4,5],
'Column2': ["A","B","C","D","E"]}
df = pd.DataFrame(df, columns=['Column1','Column2'])
df
   Column1 Column2
0        1       A
1        2       B
2        3       C
3        4       D
4        5       E
# create band if column 2 contains A-C
df['Col_2_Band V1'] = "D-E"
df['Col_2_Band V1'][df['Column2'].isin(['A','B','C'])] = "A-C"
df
   Column1 Column2 Col_2_Band V1
0        1       A           A-C
1        2       B           A-C
2        3       C           A-C
3        4       D           D-E
4        5       E           D-E
def applyV2(row):
    row['Col_2_Band V2'] = "D-E"
    row['Col_2_Band V2'][df['Column2'].isin(['A','B','C'])] = "A-C"
    return row
df = df.apply(applyV2, axis=1)
**Error:**
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-8-cf5d31427d02> in <module>()
4 return row
5
----> 6 df = df.apply(applyV2, axis=1)
C:\Users\cfeld\AppData\Local\Continuum\Anaconda3\lib\site-packages\pandas\core\frame.py in apply(self, func, axis, broadcast, raw, reduce, args, **kwds)
4852 f, axis,
4853 reduce=reduce,
-> 4854 ignore_failures=ignore_failures)
4855 else:
4856 return self._apply_broadcast(f, axis)
C:\Users\cfeld\AppData\Local\Continuum\Anaconda3\lib\site-packages\pandas\core\frame.py in _apply_standard(self, func, axis, ignore_failures, reduce)
4948 try:
4949 for i, v in enumerate(series_gen):
-> 4950 results[i] = func(v)
4951 keys.append(v.name)
4952 except Exception as e:
<ipython-input-8-cf5d31427d02> in applyV2(row)
1 def applyV2(row):
2 row['Col_2_Band V2'] = "D-E"
----> 3 row['Col_2_Band V2'][df['Column2'].isin(['A','B','C'])] = "A-C"
4 return row
5
TypeError: ("'str' object does not support item assignment", 'occurred at index 0')
# for example
df_10 = df10.apply(applyV2, axis=1)
df_20 = df20.apply(applyV2, axis=1)
df_30 = df30.apply(applyV2, axis=1)
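Answer 0 (score: 1)
If possible, do not use pd.DataFrame.apply for functionality that is easily vectorised. df.apply is just a thinly veiled loop.
In this case, the following is more efficient and equally maintainable. pd.DataFrame.pipe simply passes the data frame into a function. We use the .loc accessor to assign values that depend on a given condition.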
def add_row(df):
    # default band for every row
    df['Col_2_Band V2'] = 'D-E'
    # overwrite the band only where Column2 is A, B or C
    df.loc[df['Column2'].isin({'A','B','C'}), 'Col_2_Band V2'] = 'A-C'
    return df
df = df.pipe(add_row)
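Since the question is about reusing this on several data frames, the same idea can be parameterized. What follows is only a sketch: the helper name add_band, its parameters, and the small example frames standing in for df10 and df20 are all made up for illustration, and numpy.where is swapped in for the .loc assignment above. It also returns a copy rather than modifying the frame it receives.

```python
import numpy as np
import pandas as pd


def add_band(df, source_col, band_col, members, in_label, out_label):
    """Return a copy of df with band_col derived from membership of source_col.

    Hypothetical helper: the name and parameters are illustrative only.
    """
    out = df.copy()  # leave the caller's frame untouched
    # np.where picks in_label where the mask is True and out_label elsewhere
    out[band_col] = np.where(out[source_col].isin(members), in_label, out_label)
    return out


# Hypothetical stand-ins for the question's df10 and df20
frames = {
    "df10": pd.DataFrame({"Column1": [1, 2, 3], "Column2": ["A", "D", "C"]}),
    "df20": pd.DataFrame({"Column1": [4, 5], "Column2": ["E", "B"]}),
}

# DataFrame.pipe passes the frame as the first argument, followed by the rest
banded = {
    name: frame.pipe(add_band, "Column2", "Col_2_Band V2",
                     ["A", "B", "C"], "A-C", "D-E")
    for name, frame in frames.items()
}

print(banded["df10"])
```

Keeping the frames in a dict avoids repeating one call per variable, and a different column or band only needs different arguments.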
Answer 1 (score: 0)
It's not the cleanest, but you could do it this way:
import pandas as pd
df = {'Column1': [1,2,3,4,5],
'Column2': ["A","B","C","D","E"]}
df = pd.DataFrame(df, columns=['Column1','Column2'])
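def applyV2(frame):
    # Map each Column2 value to its band. The frame is passed in as an
    # argument so the function can be reused on other data frames.
    frame['Col_2_Band v2'] = frame['Column2'].map(lambda x: "A-C" if "A" in x
                                                  else 'A-C' if 'B' in x
                                                  else 'A-C' if 'C' in x
                                                  else 'D-E')
    return frame
# Call the function directly; passing it to df.apply would run it once per column
df = applyV2(df)
df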
Output:
   Column1 Column2 Col_2_Band v2
0        1       A           A-C
1        2       B           A-C
2        3       C           A-C
3        4       D           D-E
4        5       E           D-E
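A possibly tidier variant of the same mapping idea is to look the band up in a dict rather than chaining conditional expressions. This is a sketch, not the code from this answer: the band_of dict is a hypothetical lookup table, and it matches Column2 values exactly, whereas the lambda above tests for substrings.

```python
import pandas as pd

df = pd.DataFrame({"Column1": [1, 2, 3, 4, 5],
                   "Column2": ["A", "B", "C", "D", "E"]})

# Hypothetical lookup table: each Column2 value and the band it belongs to
band_of = {"A": "A-C", "B": "A-C", "C": "A-C", "D": "D-E", "E": "D-E"}

# Series.map looks each value up in the dict; values missing from the dict
# become NaN, so fillna supplies a default band for them
df["Col_2_Band v2"] = df["Column2"].map(band_of).fillna("D-E")

print(df)
```

Adding another band then means adding entries to the dict instead of another else branch.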