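I can't get a lambda function to work on the code snippet below.
My goal is to split each of the two columns (btts_x and btts_y) into separate columns for further calculations.
The lambda succeeds on the first column, btts_x (see btts_x_1 and btts_x_2), but fails on the btts_y column, as the ValueError in the traceback shows. I think I need to pass a re.sub() into the lambda, but I'm stuck on that and would appreciate some help!
Note the special characters: \n\n in the Team_x data (btts_x) and \n in the Team_y data (btts_y); hence the re.sub() idea.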
def results(frame):
    frame[['btts_x_1', 'btts_x_2']] = frame['btts_x'].apply(lambda x: x.split('\n\n')).apply(pd.Series).astype(float)
    frame[['btts_y_1', 'btts_y_2']] = frame['btts_y'].apply(lambda x: x.split('\n\n')).apply(pd.Series).astype(float)
                               Teams_x        btts_x                              Teams_y      btts_y  btts_x_1  btts_x_2
0  Leicester City vs Manchester United  1.55\n\n2.40  Leicester City vs Manchester United  1.50\n2.40      1.55      2.40
1        Aston Villa vs Crystal Palace  1.68\n\n2.14        Aston Villa vs Crystal Palace  1.60\n2.20      1.68      2.14
2                Fulham vs Southampton  1.72\n\n2.08          Fulham FC vs Southampton FC  1.70\n2.00      1.72      2.08
3                   Arsenal vs Chelsea  1.79\n\n1.98             Arsenal FC vs Chelsea FC  1.70\n2.00      1.79      1.98
...
TraceBack....
4 frame[['btts_x_1', 'btts_x_2']] = frame['btts_x'].apply(lambda x: x.split('\n\n')).apply(pd.Series).astype(float)
----> 5 frame[['btts_y_1', 'btts_y_2']] = frame['btts_y'].apply(lambda x: x.split('\n\n')).apply(pd.Series).astype(float)
6
7 # frame[['btts_x_1', 'btts_x_2']] = frame['btts_x'].apply(lambda x: x.split('\n')).apply(pd.Series).astype(float)
D:\Anaconda\envs\web_scraping\lib\site-packages\pandas\core\generic.py in astype(self, dtype, copy, errors)
5546 else:
5547 # else, only a single dtype is given
-> 5548 new_data = self._mgr.astype(dtype=dtype, copy=copy, errors=errors,)
5549 return self._constructor(new_data).__finalize__(self, method="astype")
5550
D:\Anaconda\envs\web_scraping\lib\site-packages\pandas\core\internals\managers.py in astype(self, dtype, copy, errors)
602 self, dtype, copy: bool = False, errors: str = "raise"
603 ) -> "BlockManager":
--> 604 return self.apply("astype", dtype=dtype, copy=copy, errors=errors)
605
606 def convert(
D:\Anaconda\envs\web_scraping\lib\site-packages\pandas\core\internals\managers.py in apply(self, f, align_keys, **kwargs)
407 applied = b.apply(f, **kwargs)
408 else:
--> 409 applied = getattr(b, f)(**kwargs)
410 result_blocks = _extend_blocks(applied, result_blocks)
411
D:\Anaconda\envs\web_scraping\lib\site-packages\pandas\core\internals\blocks.py in astype(self, dtype, copy, errors)
593 vals1d = values.ravel()
594 try:
--> 595 values = astype_nansafe(vals1d, dtype, copy=True)
596 except (ValueError, TypeError):
597 # e.g. astype_nansafe can fail on object-dtype of strings
D:\Anaconda\envs\web_scraping\lib\site-packages\pandas\core\dtypes\cast.py in astype_nansafe(arr, dtype, copy, skipna)
995 if copy or is_object_dtype(arr) or is_object_dtype(dtype):
996 # Explicit copy, or required since NumPy can't view from / to object.
--> 997 return arr.astype(dtype, copy=True)
998
999 return arr.view(dtype)
ValueError: could not convert string to float: '1.50\n2.40'
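The final line of the traceback can be reproduced without pandas, which may help show why only btts_y fails. This is a minimal sketch, assuming the cells contain real newline characters (as the quoted string in the error message suggests); the variable names are illustrative only.

```python
# Illustrative values copied from row 0 of the table in the question.
x_value = "1.55\n\n2.40"   # btts_x style: two newlines between the odds
y_value = "1.50\n2.40"     # btts_y style: one newline between the odds

# Splitting on '\n\n' only works when that exact separator is present.
x_parts = x_value.split("\n\n")   # two pieces
y_parts = y_value.split("\n\n")   # one piece: the string comes back unsplit

# str.split() with no argument splits on any run of whitespace,
# so it handles both separators without needing re.sub().
x_any = x_value.split()
y_any = y_value.split()

# Converting the unsplit btts_y piece raises the same ValueError as astype(float).
try:
    [float(p) for p in y_parts]
except ValueError as exc:
    error_message = str(exc)
```

Because the btts_y string contains no '\n\n', the split leaves it as a single element, and it is that whole string that the float conversion rejects.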
Sorry for the long post.
Answer 0 (score: 1)
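Let's join the two columns with a space, split the result on a raw-string pattern that matches both '\n\n' and '\n', expand the pieces into columns, and rename them: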
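# Join both odds strings with a space so a single split yields all four values.
df['x'] = df.btts_x + " " + df.btts_y
# Split on '\n\n', '\n', or any other whitespace; the new columns still hold strings.
df1 = df.join(df['x'].str.split(r'\n\n|\n|\s', expand=True)).rename(columns={0: 'btts_x_1', 1: 'btts_x_2', 2: 'btts_y_1', 3: 'btts_y_2'}).drop(columns=['x'])
print(df1)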
Teams_x btts_x \
0 Leicester City vs Manchester United 1.55\n\n2.40
1 Aston Villa vs Crystal Palace 1.68\n\n2.14
2 Fulham vs Southampton 1.72\n\n2.08
3 Arsenal vs Chelsea 1.79\n\n1.98
Teams_y btts_y btts_x_1 btts_x_2 btts_y_1 \
0 Leicester City vs Manchester United 1.50\n2.40 1.55 2.40 1.50
1 Aston Villa vs Crystal Palace 1.60\n2.20 1.68 2.14 1.60
2 Fulham FC vs Southampton FC 1.70\n2.00 1.72 2.08 1.70
3 Arsenal FC vs Chelsea FC 1.70\n2.00 1.79 1.98 1.70
btts_y_2
0 2.40
1 2.20
2 2.00
3 2.00
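Since the goal is further calculation, the split values probably need to be floats rather than strings. A possible variation, sketched here under the assumption that every cell holds exactly two numbers separated only by whitespace, splits each column on its own and converts the result. The two-row sample frame is made up from the first rows of the question's data.

```python
import pandas as pd

# Hypothetical sample frame built from the first two rows in the question.
df = pd.DataFrame({
    "btts_x": ["1.55\n\n2.40", "1.68\n\n2.14"],
    "btts_y": ["1.50\n2.40", "1.60\n2.20"],
})

for col in ["btts_x", "btts_y"]:
    # With no pattern, Series.str.split splits on any run of whitespace,
    # which covers both '\n' and '\n\n'.
    parts = df[col].str.split(expand=True).astype(float)
    df[[f"{col}_1", f"{col}_2"]] = parts

print(df.dtypes)
```

This avoids the temporary joined column and leaves the four new columns with a float dtype, at the cost of failing loudly if a cell ever contains something other than two numbers.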