Orange data table with a specific domain

Time: 2017-05-16 04:48:44

Tags: python-3.x csv orange

I am trying to create an Orange data table from a csv file. To achieve this, I am currently using the following steps:

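  1. Create the target domain
  2. Read the file into a temporary data table
  3. Create a new data table from the data in the temporary table and the target domain

Converting the csv into a tab file with a three-row header (https://docs.orange.biolab.si/3/data-mining-library/reference/data.io.html) is not an option.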

Translating this procedure into code, I get the following:

    from Orange.data import Domain, DiscreteVariable, ContinuousVariable, Table

    # Creating specific domain. Two attributes and a Class variable used as target
    target_domain = Domain(
        [ContinuousVariable.make("Attribute 1"),
         ContinuousVariable.make("Attribute 2")],
        DiscreteVariable.make("Class"))
    print('Target domain:', target_domain)
    # Target domain: [Attribute 1, Attribute 2 | Class]

    # Reading in the file
    test_data = Table.from_file('../data/knn_trainingset_example.csv')
    print('Domain from file:', test_data.domain)
    # Domain from file: [Attribute 1, Attribute 2, Class]

    # Using specific domain with test_data
    final_data = Table.from_table(target_domain, test_data)

    print('Domain:', final_data.domain)
    print('Data:')
    print(final_data)
    # Domain: [Attribute 1, Attribute 2 | Class]
    # Data:
    # [[0.800, 6.300 | ?],
    #  [1.400, 8.100 | ?],
    #  [2.100, 7.400 | ?],
    #  [2.600, 14.300 | ?],
    #  [6.800, 12.600 | ?],
    #  [8.800, 9.800 | ?],
    # ...
    

As the final print statement shows, the class variable is unknown (?) instead of the expected class (+ or -).

Can someone explain or fix this behavior, or suggest a better/different way to create a data table with a specific domain?

1 Answer

Answer 0 (score: 0)

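Yes, thanks! As described in the reference (https://docs.orange.biolab.si/3/data-mining-library/reference/data.variable.html#discrete-variables), you have to provide the possible values of a discrete variable. Passing them as a tuple does the trick. For future reference, I've put the adjusted code below.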

    from Orange.data import Domain, DiscreteVariable, ContinuousVariable, Table

    # Creating specific domain. Two attributes and a Class variable used as target.
    # The discrete Class variable now lists its possible values explicitly.
    target_domain = Domain(
        [ContinuousVariable.make("Attribute 1"),
         ContinuousVariable.make("Attribute 2")],
        DiscreteVariable.make("Class", values=('+', '-')))

    print('Target domain:', target_domain)
    # Target domain: [Attribute 1, Attribute 2 | Class]

    # Reading in the file
    test_data = Table.from_file('../data/knn_trainingset_example.csv')

    print('Domain from file:', test_data.domain)
    # Domain from file: [Attribute 1, Attribute 2, Class]

    print('Data:')
    print(test_data)
    # [[0.800, 6.300 | -],
    #  [1.400, 8.100 | -],
    #  [2.100, 7.400 | -],
    #  [2.600, 14.300 | +],
    #  [6.800, 12.600 | -],
    #  [8.800, 9.800 | +],
    # ...

    # Using specific domain with test_data
    final_data = Table.from_table(target_domain, test_data)

    print('Domain:', final_data.domain)
    # Domain: [Attribute 1, Attribute 2 | Class]

    print('Data:')
    print(final_data)
    # Data:
    # [[0.800, 6.300 | -],
    #  [1.400, 8.100 | -],
    #  [2.100, 7.400 | -],
    #  [2.600, 14.300 | +],
    #  [6.800, 12.600 | -],
    #  [8.800, 9.800 | +],
    # ...
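If the class labels aren't known in advance, they can be collected from the CSV before building the domain. The sketch below uses only the standard library and makes a few assumptions: the file has a plain header row, the class column is literally named "Class", and the helper names (`collect_class_values`, `encode`, `render`) are hypothetical, not part of Orange. `encode`/`render` are a deliberately simplified model of discrete values stored as indices into a variable's value list. It is not Orange's conversion code, but it illustrates why a label missing from that list ends up shown as `?`.

```python
import csv
import io
import math
from typing import Iterable, List, TextIO, Tuple


def collect_class_values(fileobj: TextIO, column: str = "Class") -> Tuple[str, ...]:
    """Hypothetical helper: distinct, sorted labels in `column` of a CSV with a header row."""
    reader = csv.DictReader(fileobj)
    return tuple(sorted({row[column] for row in reader if row[column] != ""}))


def encode(labels: Iterable[str], values: Tuple[str, ...]) -> List[float]:
    """Simplified model: each label becomes its index in `values`; unknown labels become NaN."""
    index = {value: float(i) for i, value in enumerate(values)}
    return [index.get(label, math.nan) for label in labels]


def render(codes: Iterable[float], values: Tuple[str, ...]) -> List[str]:
    """Turn encoded values back into labels, printing missing ones as '?'."""
    return ["?" if math.isnan(code) else values[int(code)] for code in codes]


# Made-up sample mirroring the first rows printed above (ASCII '-' assumed).
sample = "Attribute 1,Attribute 2,Class\n0.8,6.3,-\n1.4,8.1,-\n2.6,14.3,+\n"
labels = ["-", "-", "+"]

class_values = collect_class_values(io.StringIO(sample))
print(class_values)                                            # ('+', '-')
print(render(encode(labels, ()), ()))                          # ['?', '?', '?']
print(render(encode(labels, class_values), class_values))      # ['-', '-', '+']
```

With a real file, the resulting tuple could then be passed as `values=` to `DiscreteVariable.make("Class", ...)`, as in the answer's code. Sorting keeps the order deterministic; for this data it happens to give `('+', '-')`, the same tuple the answer uses.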