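I am preprocessing a repository of DICOM images to feed a convolutional neural network, but when I try to read the repository it raises the following error:
LookupError: unknown encoding: ISO 2022 IR 100
This is the code I am using: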
import os

listoflists = []
scans = []  # renamed from "list" to avoid shadowing the built-in
for x in range(1, 10):
    data_path = "/home/lorenzo_f/CT COLONOGRAPHY/1.3.6.1.4.1.9328.50.4.000%d" % x
    output_path = "/home/lorenzo_f/output/"
    subfolders = [f.path for f in os.scandir(data_path) if f.is_dir()]
    subfolder = [f.path for f in os.scandir(subfolders[0]) if f.is_dir()]
    scans.append(load_scan(subfolder[0]))
    scans.append(load_scan(subfolder[1]))
    listoflists.append(scans)
with the function load_scan:
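import os
import numpy as np
# The question does not show its imports; "dicom" is assumed to be pydicom
# (imported as "dicom" before pydicom 1.0, as "pydicom" afterwards).
import dicom

# Loop over the image files and store everything into a list.
def load_scan(path):
    slices = [dicom.read_file(os.path.join(path, s)) for s in os.listdir(path)]
    # slices[0].SpecificCharacterSet = 'latin_1'
    slices.sort(key=lambda x: int(x.InstanceNumber))
    try:
        slice_thickness = np.abs(slices[0].ImagePositionPatient[2] - slices[1].ImagePositionPatient[2])
    except Exception:
        # Fall back to Slice Location when Image Position (Patient) is unavailable.
        slice_thickness = np.abs(slices[0].SliceLocation - slices[1].SliceLocation)
    for s in slices:
        s.SliceThickness = slice_thickness
    return slices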
I can't find the tag you are referring to. Here is the code I used and the first item of the output:
data_path = "/home/lorenzo_f/CT COLONOGRAPHY/1.3.6.1.4.1.9328.50.4.0010"
output_path = "/home/lorenzo_f/output/"
subfolders = [f.path for f in os.scandir(data_path) if f.is_dir()]
subfolder = [f.path for f in os.scandir(subfolders[0]) if f.is_dir()]
ds = load_scan(subfolder[0])
ds
(0008, 0008) Image Type CS: ['ORIGINAL', 'SECONDARY', 'AXIAL']
(0008, 0016) SOP Class UID UI: CT Image Storage
(0008, 0018) SOP Instance UID UI: 1.3.6.1.4.1.9328.50.4.9867
(0008, 0020) Study Date DA: '20000101'
(0008, 0021) Series Date DA: '20000101'
(0008, 0022) Acquisition Date DA: '20000101'
(0008, 0023) Content Date DA: '20000101'
(0008, 0030) Study Time TM: '091936'
(0008, 0032) Acquisition Time TM: '092131'
(0008, 0033) Content Time TM: '101416'
(0008, 0050) Accession Number SH: ''
(0008, 0060) Modality CS: 'CT'
(0008, 0070) Manufacturer LO: 'GE MEDICAL SYSTEMS'
(0008, 0080) Institution Name LO: ''
(0008, 0081) Institution Address ST: ''
(0008, 0090) Referring Physician's Name PN: 'xDONEx'
(0008, 1030) Study Description LO: 'CT COLONOGRAP C'
(0008, 103e) Series Description LO: 'CT COLONOGRAPHY'
(0008, 1048) Physician(s) of Record PN: ' '
(0008, 1090) Manufacturer's Model Name LO: 'LightSpeed16'
(0008, 1140) Referenced Image Sequence 0 item(s) ----
(0008, 2112) Source Image Sequence 0 item(s) ----
(0010, 0010) Patient's Name PN: '1.3.6.1.4.1.9328.50.4.0010'
(0010, 0020) Patient ID LO: '1.3.6.1.4.1.9328.50.4.0010'
(0010, 0030) Patient's Birth Date DA: ''
(0010, 0040) Patient's Sex CS: 'M'
(0010, 1000) Other Patient IDs LO: ''
(0010, 1010) Patient's Age AS: '068Y'
(0010, 21b0) Additional Patient History LT: 'COLON SCREENING'
(0010, 21c0) Pregnancy Status US: []
(0012, 0010) Clinical Trial Sponsor Name LO: ''
(0012, 0020) Clinical Trial Protocol ID LO: ''
(0012, 0021) Clinical Trial Protocol Name LO: ''
(0012, 0030) Clinical Trial Site ID LO: ''
(0012, 0031) Clinical Trial Site Name LO: ''
(0012, 0040) Clinical Trial Subject ID LO: ''
(0012, 0042) Clinical Trial Subject Reading ID LO: ''
(0013, 0010) Private Creator LO: 'CTP'
(0013, 1010) Private tag data UN: b'CT COLONOGRAPHY\x00'
(0013, 1013) Private tag data UN: b'70093008'
(0018, 0015) Body Part Examined CS: 'COLON'
(0018, 0022) Scan Options CS: 'HELICAL MODE'
(0018, 0050) Slice Thickness DS: '0.7999999999999989'
(0018, 0060) KVP DS: '120'
(0018, 0090) Data Collection Diameter DS: '500.000000'
(0018, 1020) Software Version(s) LO: 'LightSpeedverrel'
(0018, 1030) Protocol Name LO: '6.10 CT COLONOGRAPHY'
(0018, 1100) Reconstruction Diameter DS: '330.000000'
(0018, 1110) Distance Source to Detector DS: '949.075012'
(0018, 1111) Distance Source to Patient DS: '541.000000'
(0018, 1120) Gantry/Detector Tilt DS: '0.000000'
(0018, 1130) Table Height DS: '167.199997'
(0018, 1140) Rotation Direction CS: 'CW'
(0018, 1150) Exposure Time IS: '526'
(0018, 1151) X-Ray Tube Current IS: '140'
(0018, 1152) Exposure IS: '2286'
(0018, 1160) Filter Type SH: 'BODY FILTER'
(0018, 1170) Generator Power IS: '16800'
(0018, 1190) Focal Spot(s) DS: '0.700000'
(0018, 1200) Date of Last Calibration DA: ''
(0018, 1201) Time of Last Calibration TM: ''
(0018, 1210) Convolution Kernel SH: 'STANDARD'
(0018, 5100) Patient Position CS: 'FFS'
(0020, 000d) Study Instance UID UI: 1.3.6.1.4.1.9328.50.4.9864
(0020, 000e) Series Instance UID UI: 1.3.6.1.4.1.9328.50.4.9865
(0020, 0010) Study ID SH: '1'
(0020, 0011) Series Number IS: '102'
(0020, 0012) Acquisition Number IS: '1'
(0020, 0013) Instance Number IS: '1'
(0020, 0032) Image Position (Patient) DS: ['-165.000000', '-165.000000', '-8.335000']
(0020, 0037) Image Orientation (Patient) DS: ['1.000000', '0.000000', '0.000000', '0.000000', '1.000000', '0.000000']
(0020, 0052) Frame of Reference UID UI: 1.3.6.1.4.1.9328.50.4.9866
(0020, 1040) Position Reference Indicator LO: 'XY'
(0020, 1041) Slice Location DS: '-8.335000'
(0028, 0002) Samples per Pixel US: 1
(0028, 0004) Photometric Interpretation CS: 'MONOCHROME2'
(0028, 0010) Rows US: 512
(0028, 0011) Columns US: 512
(0028, 0030) Pixel Spacing DS: ['0.644531', '0.644531']
(0028, 0100) Bits Allocated US: 16
(0028, 0101) Bits Stored US: 16
(0028, 0102) High Bit US: 15
(0028, 0103) Pixel Representation US: 1
(0028, 0120) Pixel Padding Value SS: -2000
(0028, 1050) Window Center DS: '40'
(0028, 1051) Window Width DS: '400'
(0028, 1052) Rescale Intercept DS: '-1024'
(0028, 1053) Rescale Slope DS: '1'
(0040, a124) UID UI: ''
(0088, 0140) Storage Media File-set UID UI: ''
(3006, 0024) Referenced Frame of Reference UID UI: ''
(3006, 00c2) Related Frame of Reference UID UI: ''
(7fe0, 0010) Pixel Data OW: Array of 524288 bytes
Answer 0 (score: 0)
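I don't think your problem is related to the transfer syntax. The error message indicates that the value of Specific Character Set (0008,0005) is
ISO 2022 IR 100
ISO 2022 is only permitted when the so-called code extension techniques are in use. That is, a single attribute value can contain characters taken from different character sets, and special byte sequences (escape sequences, defined in ISO 2022) are used to switch between them.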
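To make the idea of code extension concrete, here is a minimal sketch that uses only the Python standard library. It assumes Python's `iso2022_jp` codec as a stand-in, because it implements the same ISO 2022 escape mechanism that DICOM labels `ISO 2022 IR 87`; the sample name is purely illustrative and has nothing to do with the file in the question.

```python
# Illustrative only: one value mixing ASCII with JIS X 0208 characters.
text = "Yamada^Tarou=山田^太郎"

# Python's iso2022_jp codec inserts the ISO 2022 escape sequences itself.
encoded = text.encode("iso2022_jp")

ESC_TO_JIS_X_0208 = b"\x1b$B"  # ESC $ B: switch to the two-byte Japanese set
ESC_TO_ASCII = b"\x1b(B"       # ESC ( B: switch back to ASCII

has_switch_to_kanji = ESC_TO_JIS_X_0208 in encoded
has_switch_to_ascii = ESC_TO_ASCII in encoded
round_trip_ok = encoded.decode("iso2022_jp") == text

print(encoded)
print(has_switch_to_kanji, has_switch_to_ascii, round_trip_ok)
```

A decoder that does not understand these escape sequences would misread the bytes following them, which is why a toolkit has to know about code extensions before it can decode such a value.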
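For reference, see PS3.3, C.12.1.1.2.
Code extension techniques are relatively hard to handle and are therefore rarely used. In fact, this is (probably) the first time I have come across such an object. I find this interesting as well, so would you mind sharing the manufacturer and device that created this image?
How can you solve this? Good question. I am not aware of any (Python) toolkit that can handle this string encoding; maybe dcm4che can.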
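If you only want to extract the pixel data, you could try changing the value of (0008,0005) to "ISO_IR 100". This may cause problems when reading metadata (e.g. the patient name or study description). However, none of the attributes related to pixel data encoding are affected by the character encoding, so it should work.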
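The workaround above could look roughly like the following sketch. The helper name `force_latin1_charset` is hypothetical, and the code only assumes that each slice may expose Specific Character Set as a `SpecificCharacterSet` attribute, as the commented-out line in the question's `load_scan` suggests. The stand-in objects exist only so the snippet runs without any DICOM files. Whether overwriting the value after reading is early enough depends on when your toolkit decodes text, so treat this as something to try, not a guaranteed fix.

```python
from types import SimpleNamespace

BAD_CHARSET = "ISO 2022 IR 100"   # value reported in the error message
FALLBACK_CHARSET = "ISO_IR 100"   # Latin-1 without code extensions


def force_latin1_charset(slices):
    """Replace the problematic Specific Character Set value in place.

    Hypothetical helper: works on any objects with an optional
    SpecificCharacterSet attribute and returns how many were changed.
    """
    changed = 0
    for ds in slices:
        if getattr(ds, "SpecificCharacterSet", None) == BAD_CHARSET:
            ds.SpecificCharacterSet = FALLBACK_CHARSET
            changed += 1
    return changed


# Stand-ins for datasets; real code would pass the list built in load_scan().
demo_slices = [
    SimpleNamespace(SpecificCharacterSet="ISO 2022 IR 100"),
    SimpleNamespace(SpecificCharacterSet="ISO_IR 192"),
    SimpleNamespace(),  # no (0008,0005) at all, left untouched
]
changed_count = force_latin1_charset(demo_slices)
print(changed_count, demo_slices[0].SpecificCharacterSet)
```

In the question's code this would be called on the list of datasets before any text attribute is accessed. Slices that carry a different character set, or none at all, are left as they are.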