I have two input spreadsheets.
Sheet 1 has 7 columns and 3 rows:
/ FID / Total / A1 / B1 / A2 / B2
1 / 1 / 0.720168405 / 0.635589112 / XXX / 0.031112358 / YYY
1 / 2 / 0.760438562 / 0.328168557 / YYY / 0.311172576 / ZZZ
Sheet 2 has 2 columns and 4 rows:
/ 0
XXX / 0.55
YYY / 0.52
ZZZ / 0.35
Here is the code:
import pandas as pd
df = pd.read_excel("C:/Users/Sheet1.xls")
df2 = pd.read_excel("C:/Users/Sheet2.xlsx")
dictionary = df2.to_dict(orient='dict')
b = df.filter(like ='A').values
c = df.filter(like ='B').replace(dictionary[0]).astype(float).values
df['AA'] = ((c * b).sum(axis =1))
df['BB'] = df.AA / df.Total
def custom_round(x, base=5):
    return base * round(float(x)/base)
df['C'] = df['BB'].apply(lambda x: custom_round(x, base=.05))
df['C'] = "X = " + df['C'].apply(lambda s: '{:,.2f}'.format(s))
df.to_excel("C:/Users/Results.xlsx")
print(df)
I get this error message: ValueError: could not convert string to float: XXX
ValueError Traceback (most recent call last)
<ipython-input-1-f42c7cb99da5> in <module>()
8
9 b = df.filter(like ='A').values
---> 10 c = df.filter(like ='B').replace(dictionary[0]).astype(float).values
11
12 df['AA'] = ((c * b).sum(axis =1))
C:\ProgramData\Anaconda2\lib\site-packages\pandas\core\generic.pyc in astype(self, dtype, copy, errors, **kwargs)
5689 # else, only a single dtype is given
5690 new_data = self._data.astype(dtype=dtype, copy=copy, errors=errors,
-> 5691 **kwargs)
5692 return self._constructor(new_data).__finalize__(self)
5693
C:\ProgramData\Anaconda2\lib\site-packages\pandas\core\internals\managers.pyc in astype(self, dtype, **kwargs)
529
530 def astype(self, dtype, **kwargs):
--> 531 return self.apply('astype', dtype=dtype, **kwargs)
532
533 def convert(self, **kwargs):
C:\ProgramData\Anaconda2\lib\site-packages\pandas\core\internals\managers.pyc in apply(self, f, axes, filter, do_integrity_check, consolidate, **kwargs)
393 copy=align_copy)
394
--> 395 applied = getattr(b, f)(**kwargs)
396 result_blocks = _extend_blocks(applied, result_blocks)
397
C:\ProgramData\Anaconda2\lib\site-packages\pandas\core\internals\blocks.pyc in astype(self, dtype, copy, errors, values, **kwargs)
532 def astype(self, dtype, copy=False, errors='raise', values=None, **kwargs):
533 return self._astype(dtype, copy=copy, errors=errors, values=values,
--> 534 **kwargs)
535
536 def _astype(self, dtype, copy=False, errors='raise', values=None,
C:\ProgramData\Anaconda2\lib\site-packages\pandas\core\internals\blocks.pyc in _astype(self, dtype, copy, errors, values, **kwargs)
631
632 # _astype_nansafe works fine with 1-d only
--> 633 values = astype_nansafe(values.ravel(), dtype, copy=True)
634
635 # TODO(extension)
C:\ProgramData\Anaconda2\lib\site-packages\pandas\core\dtypes\cast.pyc in astype_nansafe(arr, dtype, copy, skipna)
700 if copy or is_object_dtype(arr) or is_object_dtype(dtype):
701 # Explicit copy, or required since NumPy can't view from / to object.
--> 702 return arr.astype(dtype, copy=True)
703
704 return arr.view(dtype)
ValueError: could not convert string to float: XXX
Answer (score: 0)
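I see that in the sixth line of your code you are trying to replace certain values in the dataframe (XXX, YYY, ... with 0.55, 0.52, ...). But the dictionary you actually pass is {0: 0.55, 1: 0.52, ...}, whose keys are row indices, not the codes. No cell matches 0, 1 or 2, so nothing is replaced and astype(float) then fails on "XXX".
I changed the Sheet 2 header to make indexing easier, e.g.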
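A minimal sketch of the mismatch, using in-memory DataFrames instead of the Excel files. Assumption: with the original blank header cell, pandas names the first column of Sheet 2 "Unnamed: 0" and the second 0 (the exact name can differ between pandas versions); `b_cols` is a hypothetical stand-in for the B1/B2 columns of Sheet 1.

```python
import pandas as pd

# Hypothetical stand-in for Sheet 2 as read with its original header:
# a blank first header cell ("Unnamed: 0") and a second column named 0.
df2 = pd.DataFrame({"Unnamed: 0": ["XXX", "YYY", "ZZZ"], 0: [0.55, 0.52, 0.35]})

# What the question's code passes to replace(): keys are row positions 0, 1, 2
dictionary = df2.to_dict(orient="dict")
print(dictionary[0])

# The B columns hold codes, so no cell equals 0, 1 or 2
# and replace() leaves every string in place
b_cols = pd.DataFrame({"B1": ["XXX", "YYY"], "B2": ["YYY", "ZZZ"]})
unchanged = b_cols.replace(dictionary[0])
print(unchanged)  # still "XXX", "YYY", ... so astype(float) fails

# What replace() actually needs: code -> value, built by column position
mapping = dict(zip(df2.iloc[:, 0], df2.iloc[:, 1]))
fixed = b_cols.replace(mapping).astype(float)
print(fixed)
```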
0 / 1
XXX / 0.55
YYY / 0.52
ZZZ / 0.35
and used the existing column 0 as the index by replacing line 4 with
dictionary = df2.set_index(0)[1].to_dict()
and your line 6 with
c = df.filter(like ='B').replace(dictionary).astype(float).values
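This gives a dictionary keyed by the codes (XXX, YYY, ZZZ), which is what replace() needs to substitute the values in the dataframe.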
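Putting the fix together, here is a hedged end-to-end sketch that runs without the Excel files. Assumptions: both sheets are built in memory from the sample rows above (the unnamed first column of Sheet 1 is dropped), the Sheet 2 columns get the hypothetical names `code` and `value`, and Python 3 is used (the question ran Python 2). The mapping is built by column position with `dict(zip(...))`, so it does not depend on renaming the Sheet 2 headers, and a hypothetical guard reports any code missing from Sheet 2 before `astype` fails.

```python
import pandas as pd

# In-memory stand-ins for the two sheets (in practice: pd.read_excel(...))
df = pd.DataFrame({
    "FID": [1, 2],
    "Total": [0.720168405, 0.760438562],
    "A1": [0.635589112, 0.328168557],
    "B1": ["XXX", "YYY"],
    "A2": [0.031112358, 0.311172576],
    "B2": ["YYY", "ZZZ"],
})
df2 = pd.DataFrame({"code": ["XXX", "YYY", "ZZZ"], "value": [0.55, 0.52, 0.35]})

# code -> value, taken by column position so header names do not matter
mapping = dict(zip(df2.iloc[:, 0], df2.iloc[:, 1]))

# Fail early with a readable message if a code has no entry in Sheet 2
codes = df.filter(like="B")
missing = set(codes.values.ravel()) - set(mapping)
if missing:
    raise KeyError("codes not found in Sheet 2: {}".format(sorted(missing)))

b = df.filter(like="A").values                     # A1, A2 as floats
c = codes.replace(mapping).astype(float).values    # B1, B2 mapped to numbers
df["AA"] = (c * b).sum(axis=1)
df["BB"] = df["AA"] / df["Total"]

def custom_round(x, base=5):
    # Round x to the nearest multiple of base
    return base * round(float(x) / base)

df["C"] = df["BB"].apply(lambda x: custom_round(x, base=0.05))
df["C"] = "X = " + df["C"].apply(lambda s: "{:,.2f}".format(s))
print(df)
```

With the sample rows, the C column comes out as "X = 0.50" and "X = 0.35".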