How to resolve a UnicodeReader problem when reading a CSV

Date: 2018-11-02 07:28:43

Tags: python csv

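I am completely new to Python. I am using a package called pyConTextNLP that takes medical text and annotates it with a classifier.

It basically takes some natural-language text, adds some "modifiers" to it and classifies it, while removing negative findings.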
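To make the idea concrete, here is a toy sketch of that kind of markup. It is not pyConTextNLP's algorithm: the lexicon phrases, the category names, and the rule "a target is negated if any modifier appears before it" are all simplifying assumptions made for illustration.

```python
import re

# Toy lexicon; the phrases and category names are made up for illustration.
MODIFIERS = {"no definite": "NEGATED", "no evidence of": "NEGATED"}
TARGETS = {"pneumothorax": "PNEUMOTHORAX", "pneumonia": "PNEUMONIA"}


def find_spans(text, lexicon):
    """Return sorted (start, end, category) tuples for lexicon phrases in text."""
    spans = []
    lowered = text.lower()
    for phrase, category in lexicon.items():
        pattern = r"\b" + re.escape(phrase) + r"\b"
        for match in re.finditer(pattern, lowered):
            spans.append((match.start(), match.end(), category))
    return sorted(spans)


def classify(text):
    """Label each target 'negated' if some modifier ends before it starts."""
    modifiers = find_spans(text, MODIFIERS)
    results = {}
    for start, _end, category in find_spans(text, TARGETS):
        negated = any(mod_end <= start for _s, mod_end, _c in modifiers)
        results[category] = "negated" if negated else "affirmed"
    return results


negated_example = classify("IMPRESSION: No definite pneumothorax")
affirmed_example = classify(
    "New opacity at the left lower lobe consistent with pneumonia."
)
print(negated_example)   # {'PNEUMOTHORAX': 'negated'}
print(affirmed_example)  # {'PNEUMONIA': 'affirmed'}
```

The real package works on a graph of marked spans with scoped, directional modifiers, so this only conveys the general shape of the task.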

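The problem I am having is how to add the list of modifiers as a CSV or YAML file. I have been following the package's basic setup instructions.

The problem is here: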

modifiers = itemData.get_items("https://raw.githubusercontent.com/chapmanbe/pyConTextNLP/master/KB/lexical_kb_05042016.yml")

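itemData.get_items no longer seems to exist; instead there is a function called itemData.get_fileobj(). As far as I can tell, it takes a CSV file, and its result is then passed to markup.markItems(modifiers, mode="modifier"), which scans the text and "marks" every concept in the raw text that matches a modifier.

When I try to run the example code, it fails at this line:

if not item.getLiteral() in compiledRegExprs:

which gives me the error: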

AttributeError: 'UnicodeReader' object has no attribute 'getLiteral'
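The message says that an element reaching the loop was itself a UnicodeReader, not an item with a getLiteral() method. The sketch below reproduces the same kind of failure without pyConTextNLP. It is a guess at the mechanism, not the library's code: `FakeReader` and `mark_items` are hypothetical stand-ins, the sample row is invented, and the assumption is that the loader returns a container holding the reader object.

```python
import csv
import io


class FakeReader:
    """Hypothetical stand-in for a CSV reader wrapper: it yields rows."""

    def __init__(self, text):
        self._rows = csv.reader(io.StringIO(text), delimiter="\t")

    def __iter__(self):
        return self._rows


def mark_items(items):
    """Mimics the loop in markItems: every element must offer getLiteral()."""
    return [item.getLiteral() for item in items]


reader = FakeReader("Lex\tType\nno evidence of\tDEFINITE_NEGATED_EXISTENCE\n")

# Assumption: the loader hands back a container that holds the reader itself,
# so the loop treats the reader as if it were a single item.
try:
    mark_items((reader,))
    error_message = None
except AttributeError as exc:
    error_message = str(exc)

print(error_message)
```

If this is what happens, the rows have to be turned into item objects before they are passed to markItems.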

The whole code is reproduced below:

import networkx as nx
import pyConTextNLP.itemData as itemData
import pyConTextNLP.pyConTextGraph as pyConText


reports = [
    """IMPRESSION: Evaluation limited by lack of IV contrast; however, no evidence of
      bowel obstruction or mass identified within the abdomen or pelvis. Non-specific interstitial opacities and bronchiectasis seen at the right
     base, suggestive of post-inflammatory changes.""",
    """DIAGNOSIS: NO SIGNIFICANT PATHOLOGY

MICRO  These biopsies of large bowel mucosa show oedema of the lamina propria but no architectural abnormality
There is no dysplasia or malignancy
There is no evidence of active inflammation
There is no increase in the inflammatory cell content of the lamina propria""" ,
    """IMPRESSION:
     1.  2.0 cm cyst of the right renal lower pole.  Otherwise, normal appearance
     of the right kidney with patent vasculature and no sonographic evidence of
     renal artery stenosis.
     2.  Surgically absent left kidney.""",
    """IMPRESSION: No definite pneumothorax""",
    """IMPRESSION:  New opacity at the left lower lobe consistent with pneumonia."""
]

modifiers = itemData.get_fileobj("/Applications/anaconda3/lib/python3.7/site-packages/pyConTextNLP-0.6.2.0-py3.7.egg/pyConTextNLP/CSV_Modifiers.csv")
targets = itemData.get_fileobj("/Applications/anaconda3/lib/python3.7/site-packages/pyConTextNLP-0.6.2.0-py3.7.egg/pyConTextNLP/CSV_targets.csv")



def markup_sentence(s, modifiers, targets, prune_inactive=True):
    """Mark up one sentence with modifiers and targets; return the ConTextMarkup."""
    markup = pyConText.ConTextMarkup()
    markup.setRawText(s)
    markup.cleanText()
    markup.markItems(modifiers, mode="modifier")
    markup.markItems(targets, mode="target")
    markup.pruneMarks()
    markup.dropMarks('Exclusion')
    # apply modifiers to any targets within the modifiers scope
    markup.applyModifiers()
    markup.pruneSelfModifyingRelationships()
    if prune_inactive:
        markup.dropInactiveModifiers()
    return markup

reports[3]


markup = pyConText.ConTextMarkup()


isinstance(markup,nx.DiGraph)

markup.setRawText(reports[4].lower())
print(markup)
print(len(markup.getRawText()))

markup.cleanText()
print(markup)
print(len(markup.getText()))

markup.markItems(modifiers, mode="modifier")
print(markup.nodes(data=True))
print(type(list(markup.nodes())[0]))

markup.markItems(targets, mode="target")

for node in markup.nodes(data=True):
    print(node)

markup.pruneMarks()
for node in markup.nodes(data=True):
    print(node)

print(markup.edges())

markup.applyModifiers()
for edge in markup.edges():
    print(edge)

The markItems function is here:

def markItems(self, items, mode="target"):
    """tags the sentence for a list of items
    items: a list of contextItems"""
    if not items:
        return
    for item in items:
        self.add_nodes_from(self.markItem(item, ConTextMode=mode),
                            category=mode)
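Because markItems calls methods on each element of `items`, the argument has to be a list of item objects, not a file reader. A minimal sketch of that conversion follows, assuming a tab-separated file with one header row and the columns literal, category, regex, rule. `SimpleItem` and `rows_to_items` are hypothetical names introduced here for illustration; in real code the package's own item class would take the place of `SimpleItem`.

```python
import csv
import io


class SimpleItem:
    """Hypothetical stand-in for a pyConTextNLP item; only two accessors."""

    def __init__(self, literal, category, regex="", rule=""):
        self._literal = literal
        self._category = category
        self._regex = regex
        self._rule = rule

    def getLiteral(self):
        return self._literal

    def getCategory(self):
        return self._category


def rows_to_items(fileobj, delimiter="\t", header_rows=1):
    """Read delimited rows and wrap each non-empty one in a SimpleItem."""
    reader = csv.reader(fileobj, delimiter=delimiter)
    items = []
    for index, row in enumerate(reader):
        if index < header_rows or not row:
            continue
        # Pad short rows so the optional regex and rule columns default to "".
        literal, category, regex, rule = (row + ["", "", "", ""])[:4]
        items.append(SimpleItem(literal, category, regex, rule))
    return items


# Invented sample data standing in for a modifiers file.
sample = io.StringIO(
    "Lex\tType\tRegex\tDirection\n"
    "no evidence of\tDEFINITE_NEGATED_EXISTENCE\t\tforward\n"
    "pneumonia\tPNEUMONIA\n"
)
items = rows_to_items(sample)
print([item.getLiteral() for item in items])  # ['no evidence of', 'pneumonia']
```

With a real file, `open(path, newline="", encoding="utf-8")` would replace the `io.StringIO` sample.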

My question is: how can I get the code to read the list from the CSV file without raising this error?

0 answers:

No answers