Python: How to import the arules package from R using rpy2

Time: 2018-04-15 16:05:09

Tags: python-3.x rpy2

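I am trying to use some nice R functions from Python. In particular, I want to use the read.transactions function, which is found in the R package arules.

I took the following steps:

1- Open Anaconda and launch R Studio

In R Studio: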

2- install.packages('arules', dep = TRUE)

3- loadNamespace('arules')

4- .libPaths()

which returned

[1] "D:/Anaconda3/Lib/site-packages/rpy2/R/win-library/3.4"
[2] "C:/Program Files/R/R-3.4.4/library" 

Then I switched to a Jupyter notebook.

In the Jupyter notebook:

import rpy2
import rpy2.robjects as RObjects
from rpy2.robjects.packages import importr
utils = importr("utils")


d = {'print.me': 'print_dot_me', 'print_me': 'print_uscore_me'}
try:
    arules = importr('arules', robject_translations = d, lib_loc = "D:/Anaconda3/Lib/site-packages/rpy2/R/win-library/3.4")
except:
    arules = importr('arules', robject_translations = d, lib_loc = "C:/Program Files/R/R-3.4.4/library")

The result is

---------------------------------------------------------------------------
RRuntimeError                             Traceback (most recent call last)
<ipython-input-3-5df30d28440c> in <module>()
      3 try:
----> 4     arules = importr('arules', robject_translations = d, lib_loc = "D:/Anaconda3/Lib/site-packages/rpy2/R/win-library/3.4")
      5 except:

~\Anaconda3\lib\site-packages\rpy2\robjects\packages.py in importr(name, lib_loc, robject_translations, signature_translation, suppress_messages, on_conflict, symbol_r2python, symbol_check_after, data)
    452                               _system_file(package = rname)):
--> 453         env = _get_namespace(rname)
    454         version = _get_namespace_version(rname)[0]

RRuntimeError: Error in loadNamespace(name) : there is no package called 'arules'


During handling of the above exception, another exception occurred:

RRuntimeError                             Traceback (most recent call last)
<ipython-input-3-5df30d28440c> in <module>()
      4     arules = importr('arules', robject_translations = d, lib_loc = "D:/Anaconda3/Lib/site-packages/rpy2/R/win-library/3.4")
      5 except:
----> 6     arules = importr('arules', robject_translations = d, lib_loc = "C:/Program Files/R/R-3.4.4/library")
      7 

~\Anaconda3\lib\site-packages\rpy2\robjects\packages.py in importr(name, lib_loc, robject_translations, signature_translation, suppress_messages, on_conflict, symbol_r2python, symbol_check_after, data)
    451     if _package_has_namespace(rname, 
    452                               _system_file(package = rname)):
--> 453         env = _get_namespace(rname)
    454         version = _get_namespace_version(rname)[0]
    455         exported_names = set(_get_namespace_exports(rname))

RRuntimeError: Error in loadNamespace(name) : there is no package called 'arules'

The R package cannot be imported into Python.

I did the same thing for DirichletReg and it worked. I don't know why.

Can anyone help me?

2 Answers:

Answer 0: (score: 1)

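importr looks for installed R packages in the R_HOME directory. I think the arules package was not added to the library folder under R_HOME but was installed somewhere else, such as "C:\Users\User_name\Documents\R\win-library\3.xx", and that may be what is causing this problem.

In that case, copy the arules folder from that location into the library folder of the R_HOME directory. Try this and see whether it resolves the problem.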
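Before copying anything, it may help to confirm which library folder actually holds arules. The following is a minimal sketch in plain Python, not part of rpy2: the helper name find_r_library is hypothetical, and it relies only on the fact that an installed R package is a folder containing a DESCRIPTION file. The first two candidate paths come from the .libPaths() output in the question; the third is an assumed per-user library of the kind described above, with 3.4 guessed from the R version in the question.

```python
import os


def find_r_library(package, lib_paths):
    """Return the first library directory that contains `package`, or None.

    An installed R package is a folder named after the package that
    holds a DESCRIPTION file, so that file is what we look for.
    """
    for lib in lib_paths:
        if os.path.isfile(os.path.join(lib, package, "DESCRIPTION")):
            return lib
    return None


# Candidate libraries: two from .libPaths() in the question, plus an
# assumed per-user library (hypothetical path; adjust to your machine).
candidates = [
    "D:/Anaconda3/Lib/site-packages/rpy2/R/win-library/3.4",
    "C:/Program Files/R/R-3.4.4/library",
    os.path.expanduser("~/Documents/R/win-library/3.4"),
]

print(find_r_library("arules", candidates))
```

If this prints a directory, that directory is a reasonable value to try for the lib_loc argument of importr, as the question already does. If it prints None, arules is in none of the listed folders and was probably installed into a library that a different R installation uses.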

Answer 1: (score: 0)

Here is my final finding: there is nothing equivalent in Python, but there is a way to reproduce what read.transactions does. In R:

> groceries <- read.transactions("groceries.csv", sep = ",")
> summary(groceries)
transactions as itemMatrix in sparse format with
9835 rows (elements/itemsets/transactions) and
169 columns (items) and a density of 0.02609146

In a Python Jupyter notebook:

1) Download the data:

import requests

url = 'https://raw.githubusercontent.com/stedy/Machine-Learning-with-R-datasets/master/groceries.csv'
grocery_dataset = requests.get(url)
grocery_dataset.raise_for_status()  # stop early if the download failed
# Save the downloaded text to a file
with open('grocery_dataset.txt', 'w') as f:
    f.write(grocery_dataset.text)

2) Parse the data and reshape it as needed:

import csv

# First pass: collect every distinct item across all transactions
grocery_items = set()
with open("grocery_dataset.txt") as f:
    reader = csv.reader(f, delimiter=",")
    for line in reader:
        grocery_items.update(line)

# Second pass: one row per transaction, 1 if the item is present, else 0
output_list = list()
with open("grocery_dataset.txt") as f:
    reader = csv.reader(f, delimiter=",")
    for line in reader:
        row_val = {item: 0 for item in grocery_items}
        row_val.update({item: 1 for item in line})
        output_list.append(row_val)

3) Store it as a DataFrame in Python:

import pandas as pd
grocery_df = pd.DataFrame(output_list)

Then

grocery_df.shape

gives

(9835, 169)

which matches the number of rows and columns reported by summary(groceries) in R.
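The R summary also reports a density, which is the share of cells in the item matrix that are non-zero. As a rough cross-check, the same figure can be computed from the one-hot rows. This is only a sketch: matrix_density is a hypothetical helper name, and the sample rows are made up to show the arithmetic, not taken from the groceries data.

```python
def matrix_density(rows):
    """Fraction of cells equal to 1 in a list of one-hot row dicts."""
    n_rows = len(rows)
    n_cols = len(rows[0]) if rows else 0
    if n_rows == 0 or n_cols == 0:
        return 0.0
    # Each row maps item -> 0 or 1, so summing the values counts the ones
    ones = sum(sum(row.values()) for row in rows)
    return ones / (n_rows * n_cols)


# Tiny made-up sample with the same structure as output_list
sample = [
    {"milk": 1, "bread": 1, "eggs": 0, "tea": 0},
    {"milk": 0, "bread": 1, "eggs": 0, "tea": 0},
]

print(matrix_density(sample))  # 3 ones out of 8 cells -> 0.375
```

Calling matrix_density(output_list) on the real data should come out close to the 0.02609146 that R reports, assuming the file parses the same way in both languages.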