Question

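The row.txt.gz and matrix.txt.gz files are in the same order. My goal is to extract some rows from the dask dataframe built from 'row.txt.gz', and then use exactly the same index to extract the corresponding rows from matrix.txt.gz.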

import dask.dataframe as dd
import pandas as pd

# ROWS
rows = dd.read_csv('*.row.txt.gz', sep='\t', compression='gzip', blocksize=None)
# MTX
mx = dd.read_csv('*.matrix.txt.gz', sep='\t', compression='gzip', blocksize=None)
# query
query_row_file = 'query.txt.gz'
query = pd.read_table(query_row_file, dtype=object, delimiter='\t')
# extract the data from rows
rows_queried = rows[rows['inchikey'].isin(query.METID)]
# Use index from rows_queried for 'mx'. How?
mx_queried = mx[rows_queried.index]  # this is the line that fails
mx_queried = mx_queried.compute()

I get the following error, so I am missing something in my logic. I have just started using dask, and any help would be much appreciated!

pandas/core/indexing.py", line 1269, in _convert_to_indexer
    .format(mask=objarr[mask]))
KeyError: "Int64Index([0, 16, 60, 88, 104, 131, 132, 149, 163, 179, 188, 204, 233, 261,\n 262, 293],\n dtype='int64') not in index"

Dask: isin, then using the resulting index on another dask dataframe

0 answers: