Question

I am using SWI-Prolog.

I have a CSV file in which the top row lists the probes and every following row is a sample:

    1007_s_at   1053_at 117_at ...
GSM102447.CEL   1   0   0 ...
GSM102449.CEL   1   0   0 ...
GSM102451.CEL   1   0   0 ...
GSM102455.CEL   1   0   0 ...
GSM102507.CEL   1   0   1 ...
...

The actual file has more than 20,000 columns ('probes') and no more than 150 rows ('samples').

I want to extract each relation and print them as facts in another file.

For example:

%probe_value_in_sample(Probe,Sample_Strip,ProbeValue).
probe_value_in_sample('1007_s_at','GSM102447',1).
etc

My code so far:

foreach(csv_read_file_row_list('GSE2109_BarCode.csv', List), assert(['samples'|List])).

probe_value_in_sample(Probe,Sample_Strip,ProbeValue):-
[samples|[samples,Empty|ProbeList]],Empty='', %the first value is empty
indexOf(ProbeList,Probe,IndexOfProbe),
[samples|[samples,Sample|SampleValues]],Sample\='',
nth0(IndexOfProbe,SampleValues,ProbeValue),
name(Sample, CharSample),
append(Char_Sample_Strip,".CEL",CharSample),
name(Sample_Strip,Char_Sample_Strip).

%IndexOf(MyList, MyElement, MyIndex).
indexOf([Element|_], Element, 0).
indexOf([_|Tail], Element, Index):-
indexOf(Tail, Element, Index1),
Index is Index1+1.

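This seems to work well for a single query, but it does not work with findall, or at least I cannot get it to work with findall.

Any idea what the problem might be?

Thanks for your help.

Update

Thanks for your reply.

I have defined: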

csv_read_file_row_list(File, List,Functor):-
csv_read_file_row(File,Row,[functor(Functor)]),Row=..List.

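So I am passing a file name rather than an open stream, and the Functor variable is currently redundant.

I am confused about how you use maplist. I cannot get it to work.

I tried: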

:- dynamic samples/3.

csv_read_file_row_list(File, List,Functor):-
csv_read_file_row(File,Row,[functor(Functor)]),Row=..List.

prepare_db(File) :-
   ( nonvar(File) ; File = 'GSE2109_BarCode.csv' ),
   %open(File, read, S),
   csv_read_file_row_list(File,     ['thing',_Empty|ColKeys],'thing'),
 forall(csv_read_file_row_list(File,    ['thing',RowKeyDirty|Samples],'thing'),
    (   clean_rowkey(RowKeyDirty, RowKey),
        maplist(store_sample(RowKey), ColKeys, Samples)
    )).
%close(S).

store_sample(RowKey, ColKey, Sample) :-
  assertz(samples(RowKey, ColKey, Sample)).

clean_rowkey(RowKeyDirty, RowKey) :- append(RowKey, ".CEL", RowKeyDirty).

As well as:

:- dynamic samples/3.

csv_read_file_row_list(File, List,Functor):-
csv_read_file_row(File,Row,[functor(Functor)]),Row=..List.

prepare_db(File) :-
( nonvar(File) ; File = 'GSE2109_BarCode.csv' ),
%open(File, read, S),
csv_read_file_row_list(File, ['thing',_Empty|ColKeys],'thing'),
forall(csv_read_file_row_list(File, ['thing',RowKeyDirty|Samples],'thing'),
    (   clean_rowkey(RowKeyDirty, RowKey),
        maplist(store_sample,[RowKey], ColKeys, Samples)
    )).
%close(S).

store_sample(RowKey, ColKey, Sample) :-
assertz(samples(RowKey, ColKey, Sample)).

clean_rowkey(RowKeyDirty, RowKey) :- append(RowKey, ".CEL", RowKeyDirty).

But both fail.

Answer 1

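You are not using assert/1 in the right way. Prolog has a fast and efficient in-memory DB but, like any DB, it must be properly indexed. And of course, as in any language, avoid repeating the same operation on every call: format the data once, when preparing the DB.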

:- dynamic samples/3.

prepare_db(File) :-
    ( nonvar(File) ; File = 'GSE2109_BarCode.csv' ),
    open(File, read, S),
    read_row(S, [_Empty|ColKeys]),
    forall(read_row(S, [RowKeyDirty|Samples]),
        (   clean_rowkey(RowKeyDirty, RowKey),
            maplist(store_sample(RowKey), ColKeys, Samples)
        )),
    close(S).

store_sample(RowKey, ColKey, Sample) :-
    assertz(samples(RowKey, ColKey, Sample)).

clean_rowkey(RowKeyDirty, RowKey) :- append(RowKey, ".CEL", RowKeyDirty).

This code assumes that the first row has exactly the same number of columns as all the other rows.
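
The reason for that assumption is that maplist succeeds only when its lists have the same length, so a row with a missing or extra field makes the whole call fail. A minimal sketch of the effect, where make_pair/3 and pair_up/3 are names made up for this illustration:

```prolog
:- use_module(library(apply)).   % maplist/4

% make_pair(+Key, +Value, -Pair): hypothetical helper, illustration only.
make_pair(Key, Value, Key-Value).

% pair_up(+Keys, +Values, -Pairs): pairs each column key with its value.
% Fails when Keys and Values differ in length.
pair_up(Keys, Values, Pairs) :-
    maplist(make_pair, Keys, Values, Pairs).

% ?- pair_up([a,b,c], [1,0,0], Ps).   % succeeds with Ps = [a-1, b-0, c-0]
% ?- pair_up([a,b,c], [1,0], Ps).     % fails: the lists differ in length
```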

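read_row/2 must fetch a line and split it into a list of code lists. I guess your csv_read_file_row_list/2 already does that, but I could not find its definition in the posted code.

Indexing works better on atoms than on code lists. atom_codes/2 lets you switch between these representations.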
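
As a sketch of that conversion, assuming the row key arrives as a code list: the variant below strips the suffix and returns an atom. The name clean_rowkey_atom/2 is hypothetical. The suffix is built with atom_codes/2 because in SWI-Prolog 7 and later double-quoted text is a string by default, not a code list.

```prolog
:- use_module(library(lists)).   % append/3

% clean_rowkey_atom(+DirtyCodes, -RowKey): hypothetical variant of clean_rowkey/2.
% DirtyCodes is a code list ending in the codes of '.CEL';
% RowKey is the part before that suffix, as an atom.
clean_rowkey_atom(DirtyCodes, RowKey) :-
    atom_codes('.CEL', Suffix),
    append(KeyCodes, Suffix, DirtyCodes),
    atom_codes(RowKey, KeyCodes).

% ?- atom_codes('GSM102447.CEL', Cs), clean_rowkey_atom(Cs, Key).
% Key is bound to the atom 'GSM102447'.
```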

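Edit

From your comment and the additional code you posted, I can see that my answer was not quite appropriate. Here is a revised and tested snippet: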

:- [library(csv)].

:- dynamic samples/3.
:- dynamic column_keys/1.

prepare_db(File) :-
    retractall(column_keys(_)),
    retractall(samples(_,_,_)),
    ( nonvar(File) ; File = '/tmp/test.csv' ),
    forall(read_row(File, Row), store_row(Row)).

store_row(Row) :-
    Row =.. [row|Cols],
    (   column_keys(ColKeys)
    ->  Cols = [RowKeyDirty|Samples],
        clean_rowkey(RowKeyDirty, RowKey),
        maplist(store_sample(RowKey), ColKeys, Samples)
    ;   assertz(column_keys(Cols))
    ).

store_sample(RowKey, ColKey, Sample) :-
    assertz(samples(RowKey, ColKey, Sample)).

clean_rowkey(RowKeyDirty, RowKey) :-
    atom_concat(RowKey, '.CEL', RowKeyDirty).

read_row(File, Row) :-
    csv_read_file_row(File, Row, [separator(0' ), strip(true), convert(true)]),
    writeln(read_row(Row)).

It works with this test file:

1007_s_at 1053_at 117_at
GSM102447.CEL 1 0 0
GSM102449.CEL 1 0 0
GSM102451.CEL 1 0 0
GSM102455.CEL 1 0 0
GSM102507.CEL 1 0 1

and yields

?- prepare_db(_).
read_row(row(1007_s_at,1053_at,117_at))
read_row(row(GSM102447.CEL,1,0,0))
read_row(row(GSM102449.CEL,1,0,0))
read_row(row(GSM102451.CEL,1,0,0))
read_row(row(GSM102455.CEL,1,0,0))
read_row(row(GSM102507.CEL,1,0,1))
true.

16 ?- samples(X,Y,Z).
X = 'GSM102447',
Y = '1007_s_at',
Z = 1 ;
X = 'GSM102447',
Y = '1053_at',
Z = 0 ;
...

Of course, the rows are printed as they are read for debugging purposes only.
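
Once samples/3 is populated, writing the facts to another file, as the question asks, could look roughly like the sketch below. The predicate export_facts/1 and the output file name are assumptions made for illustration and are not part of the tested code above. The argument order follows the question's probe_value_in_sample(Probe, Sample, Value).

```prolog
:- dynamic samples/3.

% export_facts(+OutFile): hypothetical predicate, illustration only.
% Writes one probe_value_in_sample/3 fact per stored samples/3 fact.
% ~q quotes atoms where needed, so the file can be consulted later.
export_facts(OutFile) :-
    setup_call_cleanup(
        open(OutFile, write, Out),
        forall(samples(Sample, Probe, Value),
               format(Out, '~q.~n',
                      [probe_value_in_sample(Probe, Sample, Value)])),
        close(Out)).
```

For example, `?- prepare_db(_), export_facts('facts.pl').` would load the CSV and then write the facts, with 'facts.pl' standing in for whatever output path you prefer.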
