I am writing a Python program to replace some values in a data frame. I have a file named file.txt that looks like this:
A:::s:::te-sd:::0.3
B:::s:::te-sd:::0.2
C:::s:::fd-df:::0.1
W:::h:::fd-df:::0.1
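I want to use ':::' as the separator and replace the values in the fourth column with strings according to the following rules:
All values that belong to range1 are replaced with 'N':
range1=[-0.2,-0.1,0,0.1,0.2] -> 'N'
All values that belong to range2 are replaced with 'L':
range2=[-0.5,-0.4,-0.3] -> 'L'
All values that belong to range3 are replaced with 'H':
range3=[0.3,0.4,0.5] -> 'H'
To achieve this, I tried the following: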
import pandas as pd
df = pd.read_csv('file.txt', sep=':::', header=None)
labels = df[3]
range1 = [-0.2, -0.1, 0, 0.1, 0.2]
range2 = [-0.5, -0.4, -0.3]
range3 = [0.3, 0.4, 0.5]
for label in labels:
    if((label in range1) == True):
        label = 'N'
    if((label in range2) == True):
        label = 'L'
    if((label in range2) == True):
        label = 'H'
print(labels)
But when I print labels, nothing has changed. I would appreciate any suggestions on how to fix it.
Answer 0 (score: 1)
Build a lookup dictionary and use the isin function to do the replacement. Like this...
>>> lookup = {'N': range1, 'L': range2, 'H': range3}
>>> lookup
{'H': [0.3, 0.4, 0.5], 'L': [-0.5, -0.4, -0.3], 'N': [-0.2, -0.1, 0, 0.1, 0.2]}
>>> df
   0  1      2    3
0  A  s  te-sd  0.3
1  B  s  te-sd  0.2
2  C  s  fd-df  0.1
3  W  h  fd-df  0.1
>>> for k, v in lookup.items():
...     df.loc[df[3].isin(v), 3] = k
...
>>> df
   0  1      2  3
0  A  s  te-sd  H
1  B  s  te-sd  N
2  C  s  fd-df  N
3  W  h  fd-df  N
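For reference, here is a self-contained sketch of the same lookup idea that runs without file.txt. It is one possible variant, not the answer's exact method: it reads the sample rows from an in-memory string, inverts the lookup into a value-to-label dictionary (value_to_label is a name introduced here for illustration), and uses Series.map instead of the isin loop. Building a new column this way also avoids writing strings into a float column, which newer pandas versions may warn about.

```python
import io

import pandas as pd

# Sample rows from the question, read from memory instead of file.txt.
raw = (
    "A:::s:::te-sd:::0.3\n"
    "B:::s:::te-sd:::0.2\n"
    "C:::s:::fd-df:::0.1\n"
    "W:::h:::fd-df:::0.1\n"
)

# A separator longer than one character is treated as a regular expression,
# which requires the Python parsing engine.
df = pd.read_csv(io.StringIO(raw), sep=":::", header=None, engine="python")

lookup = {
    "N": [-0.2, -0.1, 0, 0.1, 0.2],
    "L": [-0.5, -0.4, -0.3],
    "H": [0.3, 0.4, 0.5],
}

# Invert the lookup so each numeric value points at its label.
# Rounding to one decimal guards against tiny floating-point differences.
value_to_label = {round(v, 1): k for k, vals in lookup.items() for v in vals}

# Map the whole column in one pass; values in no range become NaN.
df[3] = df[3].round(1).map(value_to_label)

result = df[3].tolist()
print(result)
```

Values that appear in none of the three lists come back as NaN, so it is easy to spot rows the rules did not cover.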
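Answer 1 (score: 0)
The loop for label in labels: creates a new variable, label, on each iteration and assigns it the current value. Changing that variable does not affect the element it came from. Try this: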
labels = df[3].astype(object)  # object dtype can hold strings next to floats
for i in range(len(labels)):
    if labels[i] in range1:
        labels[i] = 'N'
    elif labels[i] in range2:
        labels[i] = 'L'
    elif labels[i] in range3:  # You had a bug here -> it was range2
        labels[i] = 'H'
print(labels)
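The difference between rebinding the loop variable and assigning through an index can be seen in a minimal sketch. It uses a plain Python list rather than a pandas Series, which is an assumption made here to keep the example independent of pandas assignment behavior; the names unchanged and range3 are only for illustration.

```python
range3 = [0.3, 0.4, 0.5]
labels = [0.3, 0.2, 0.1, 0.1]

# Rebinding the loop variable only changes the local name "label".
# The list itself is never touched.
for label in labels:
    if label in range3:
        label = "H"
unchanged = list(labels)

# Assigning through the index writes into the list in place.
for i, value in enumerate(labels):
    if value in range3:
        labels[i] = "H"

print(unchanged)
print(labels)
```

The first loop leaves the list exactly as it was, while the second replaces the first element with 'H'.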
Answer 2 (score: 0)
The values returned are strings. You need to convert them to floats (or change the range values to strings).
label_vals = [float(x[-1]) for x in df[3].str.split(':::')]
Alternatively:
def mapping(val):
    if -.2 <= val <= .2:
        return 'N'
    if .3 <= val <= .5:
        return 'H'
    if -.5 <= val <= -.3:
        return 'L'
new_labels = [':::'.join(x[:-1] + [mapping(float(x[-1]))])
              for x in df[3].str.split(':::')]
>>> new_labels
['A:::s:::te-sd:::H',
'B:::s:::te-sd:::N',
'C:::s:::fd-df:::N',
'W:::h:::fd-df:::N']
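A hedged sketch of the same range-based mapping applied to a column that read_csv has already split, which is an assumption that differs from the snippet above (that one splits whole-line strings). The astype(float) call covers both cases: it leaves an already numeric column as it is and converts a column of numeric strings. The explicit return None for out-of-range values is added here for clarity.

```python
import io

import pandas as pd

# Sample rows from the question, read from memory instead of file.txt.
raw = (
    "A:::s:::te-sd:::0.3\n"
    "B:::s:::te-sd:::0.2\n"
    "C:::s:::fd-df:::0.1\n"
    "W:::h:::fd-df:::0.1\n"
)
df = pd.read_csv(io.StringIO(raw), sep=":::", header=None, engine="python")


def mapping(val):
    """Return the label for val, or None if it lies outside every range."""
    if -0.2 <= val <= 0.2:
        return "N"
    if 0.3 <= val <= 0.5:
        return "H"
    if -0.5 <= val <= -0.3:
        return "L"
    return None


# Convert to float first so the comparisons work even if the column holds strings.
df[3] = df[3].astype(float).map(mapping)

# Rebuild the original ':::'-separated lines from the four columns.
new_labels = df.astype(str).agg(":::".join, axis=1).tolist()
print(new_labels)
```

Note that the range checks leave gaps (for example between 0.2 and 0.3), so a value such as 0.25 would get no label under these rules.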