Crawling

Contains functions for gathering metadata from individual DICOM files or entire directories.

dicom_csv.crawler.get_file_meta(path: Union[Path, str], force: bool = True, read_pixel_array: bool = False, unpack_volumetric: bool = False, extract_private: bool = False) → Iterable[dict]

Get a dict containing the metadata from the DICOM file located at path.

Parameters

path (PathLike) – full path to the file
force (bool) – the force parameter of pydicom.filereader.dcmread; default is True.
read_pixel_array (bool) – if True, the crawler adds information about the DICOM pixel_array, which significantly increases crawling time; default is False.

Notes

The following keys are added:

NoError: whether the file was read without raising an exception.
HasPixelArray: (if NoError is True) whether the file contains a pixel array.
PixelArrayShape: (if HasPixelArray is True) the shape of the pixel array.

For some formats the following packages might be required:

conda install -c glueviz gdcm # Python 3.5 and 3.6
conda install -c conda-forge gdcm # Python 3.7
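The keys listed in the notes above can be used to triage a batch of files. The following is a minimal sketch, not part of the library: summarize_meta is a hypothetical helper, and the sample records are hand-written stand-ins for what get_file_meta would return.

```python
from collections import Counter
from typing import Iterable, Mapping


def summarize_meta(records: Iterable[Mapping]) -> Counter:
    """Count readable files and files with pixel data (hypothetical helper)."""
    summary = Counter(total=0, readable=0, with_pixels=0)
    for record in records:
        summary["total"] += 1
        # NoError is always present; HasPixelArray only exists for readable files.
        if record.get("NoError"):
            summary["readable"] += 1
            if record.get("HasPixelArray"):
                summary["with_pixels"] += 1
    return summary


# Hand-written sample records standing in for get_file_meta output.
sample = [
    {"NoError": True, "HasPixelArray": True},
    {"NoError": True, "HasPixelArray": False},
    {"NoError": False},
]
print(summarize_meta(sample))
```

Using record.get rather than indexing reflects the conditional nature of the keys: HasPixelArray is only documented to exist when NoError is True.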

dicom_csv.crawler.join_tree(top: Union[Path, str], ignore_extensions: Sequence[str] = (), relative: bool = True, verbose: int = 0, read_pixel_array: bool = False, force: bool = True, unpack_volumetric: bool = True, extract_private: bool = False, total: bool = False) → DataFrame

Returns a dataframe containing metadata for each file in all the subfolders of top.

Parameters

top (PathLike) – path to crawled folder
ignore_extensions (Sequence) – list of extensions to skip during crawling
relative (bool) – whether the PathToFolder attribute should be relative to top; default is True.
verbose (int) – the verbosity level:
0 - no progressbar
1 - progressbar with iteration count
2 - progressbar with filenames
total (bool) – whether to show the total number of files in the progressbar. This adds a bit of overhead, because each file will be visited a second time (without being opened).

References

See the Working with DICOM files tutorial for more details.

Notes

The following columns are added:

NoError: whether the file was read without raising an exception.
HasPixelArray: (if NoError is True) whether the file contains a pixel array (added if read_pixel_array is True).
PixelArrayShape: (if HasPixelArray is True) the shape of the pixel array (added if read_pixel_array is True).
PathToFolder
FileName

For some formats the following packages might be required:

conda install -c glueviz gdcm # Python 3.5 and 3.6
conda install -c conda-forge gdcm # Python 3.7
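When relative is True, PathToFolder is stored relative to top, so a file's location has to be rebuilt from three pieces. The sketch below shows one way to do that. It assumes FileName holds just the file's base name and uses POSIX-style paths; full_path and the example paths are hypothetical.

```python
from pathlib import PurePosixPath


def full_path(top: str, path_to_folder: str, file_name: str) -> PurePosixPath:
    """Rebuild a file's location from one row of the crawled table (hypothetical helper)."""
    # With relative=True, PathToFolder is stored relative to `top`.
    return PurePosixPath(top) / path_to_folder / file_name


# Made-up values standing in for one row of the dataframe.
print(full_path("/data/dicoms", "patient1/study1", "slice001.dcm"))
```

With relative set to False this join would presumably be unnecessary, since PathToFolder would no longer depend on top.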

Aggregation

Tools for grouping DICOM metadata into images.

dicom_csv.aggregation.aggregate_images(metadata: DataFrame, by: Union[str, Sequence[str]], process_series: Optional[Callable] = None) → DataFrame

Groups DICOM metadata into images (series).

Parameters

metadata – a dataframe with metadata returned by join_tree.
by – a list of column names by which the grouping will be performed. Default columns are: PatientID, SeriesInstanceUID, StudyInstanceUID, PathToFolder, PixelArrayShape, SequenceName.
process_series – a function that processes an aggregated series before it is joined into a single entry

References

See the Working with DICOM files tutorial for more details.

Notes

The following columns are added:

SlicesCount: the number of files/slices in the image.
FileNames: a list of slash (“/”) separated file names.
InstanceNumbers: (if InstanceNumber is in columns) a list of comma-separated InstanceNumber values.

The following columns are removed:

FileName (replaced by FileNames), InstanceNumber (replaced by InstanceNumbers), any other columns that differ from file to file.
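The column changes described above can be illustrated on plain dicts. This is a sketch of the documented behaviour only, not the library's implementation (which operates on a DataFrame); aggregate_rows and the sample rows, including the SOPInstanceUID values, are made up for the example.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def aggregate_rows(rows: Sequence[dict], by: Sequence[str]) -> List[dict]:
    """Group per-file rows into per-image rows (illustrative, not the library code)."""
    groups: Dict[tuple, List[dict]] = defaultdict(list)
    for row in rows:
        groups[tuple(row[key] for key in by)].append(row)

    images = []
    for members in groups.values():
        first = members[0]
        # Keep only the columns whose value is identical for every file in the group.
        image = {
            column: value
            for column, value in first.items()
            if column not in ("FileName", "InstanceNumber")
            and all(m.get(column) == value for m in members)
        }
        image["SlicesCount"] = len(members)
        image["FileNames"] = "/".join(m["FileName"] for m in members)
        if all("InstanceNumber" in m for m in members):
            image["InstanceNumbers"] = ",".join(str(m["InstanceNumber"]) for m in members)
        images.append(image)
    return images


# Made-up per-file rows: two slices of one series and one slice of another.
rows = [
    {"PatientID": "p1", "SeriesInstanceUID": "s1", "FileName": "a.dcm", "InstanceNumber": 1, "SOPInstanceUID": "u1"},
    {"PatientID": "p1", "SeriesInstanceUID": "s1", "FileName": "b.dcm", "InstanceNumber": 2, "SOPInstanceUID": "u2"},
    {"PatientID": "p2", "SeriesInstanceUID": "s2", "FileName": "c.dcm", "InstanceNumber": 1, "SOPInstanceUID": "u3"},
]
images = aggregate_rows(rows, by=["PatientID", "SeriesInstanceUID"])
print(images[0])
```

In the first aggregated row, SOPInstanceUID disappears because it differs between the two files, while PatientID and SeriesInstanceUID survive.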

dicom_csv.aggregation.normalize_identifiers(metadata: DataFrame) → DataFrame

Converts PatientID to str and fills nan values in SequenceName.

Notes

The input dataframe will be mutated.

dicom_csv.aggregation.select(dataframe: DataFrame, query: str, **where: str) → DataFrame

Loading

dicom_csv.misc.get_image(instance: Dataset, to_color_space: Optional[str] = None)

dicom_csv.misc.stack_images(series: Sequence[Dataset], axis: int = -1, to_color_space: Optional[str] = None)
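The loading functions take pydicom Dataset objects, so a typical call presumably starts from files read with pydicom. The sketch below is one possible way to combine them, assuming the slices should be ordered with order_series from dicom_csv.spatial before stacking; load_volume is a hypothetical wrapper, and whether ordering is required before stack_images is an assumption.

```python
from pathlib import Path
from typing import Sequence


def load_volume(paths: Sequence[Path]):
    """Read the given DICOM files and stack them into one array (sketch)."""
    # Imported lazily so the sketch can be defined without the packages installed.
    from pydicom import dcmread
    from dicom_csv.misc import stack_images
    from dicom_csv.spatial import order_series

    series = [dcmread(path) for path in paths]
    # axis=-1 is the documented default: slices are stacked along the last axis.
    return stack_images(order_series(series), axis=-1)
```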

Spatial operations

dicom_csv.spatial.get_orientation_matrix(series: Sequence[Dataset]) → ndarray

Returns a 3 x 3 orthogonal transition matrix from the image-based basis to the patient-based basis. Rows are the coordinates of the image-based basis vectors in the patient-based basis. Columns are the coordinates of the patient-based basis vectors in the image-based basis.

See https://dicom.innolitics.com/ciods/rt-dose/image-plane/00200037 for details.
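To make the row layout concrete, the sketch below builds such a matrix from the six ImageOrientationPatient values of a single slice: the row direction cosines, the column direction cosines, and their cross product as the slice normal. It is an illustration in plain Python, not the library's code; orientation_matrix is a hypothetical name, and the row order (row, column, normal) is an assumption.

```python
from typing import List, Sequence


def orientation_matrix(iop: Sequence[float]) -> List[List[float]]:
    """Build a 3 x 3 matrix from ImageOrientationPatient (illustrative sketch)."""
    row = [float(v) for v in iop[:3]]   # direction of increasing column index
    col = [float(v) for v in iop[3:6]]  # direction of increasing row index
    # The slice normal is the cross product of the two in-plane directions.
    normal = [
        row[1] * col[2] - row[2] * col[1],
        row[2] * col[0] - row[0] * col[2],
        row[0] * col[1] - row[1] * col[0],
    ]
    return [row, col, normal]


# A purely axial slice: the image axes coincide with the patient x and y axes.
print(orientation_matrix([1, 0, 0, 0, 1, 0]))
```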

dicom_csv.spatial.get_slice_plane(instance: Dataset) → Plane

dicom_csv.spatial.get_slices_plane(series: Sequence[Dataset]) → Plane

class dicom_csv.spatial.Plane(value)

Bases: Enum

An enumeration.

Sagittal = 0

Coronal = 1

Axial = 2
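The enumeration values line up with the patient axes: a slice whose normal points along the first (left-right) axis is sagittal, along the second is coronal, and along the third is axial. The sketch below picks the plane from the dominant component of the normal. The Plane class here is a local stand-in mirroring the documented values, plane_from_normal is a hypothetical helper, and the library may treat oblique slices differently (for example by raising an error).

```python
from enum import Enum
from typing import Sequence


class Plane(Enum):
    # Local stand-in mirroring the documented values of dicom_csv.spatial.Plane.
    Sagittal = 0
    Coronal = 1
    Axial = 2


def plane_from_normal(normal: Sequence[float]) -> Plane:
    """Pick the plane whose axis dominates the slice normal (illustrative sketch)."""
    dominant = max(range(3), key=lambda axis: abs(normal[axis]))
    return Plane(dominant)


print(plane_from_normal((0.0, 0.0, 1.0)))
```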

dicom_csv.spatial.order_series(series: Sequence[Dataset], decreasing: bool = True) → Sequence[Dataset]

dicom_csv.spatial.get_slice_locations(series: Sequence[Dataset]) → ndarray

Computes slice locations from ImagePositionPatient. NOTE: the order of slice locations can be either increasing or decreasing for ordered series (see order_series).
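One common way to derive a slice location from ImagePositionPatient is to project the position onto the slice normal. The sketch below does this in plain Python and also shows why the resulting order can go either way: flipping the normal flips the sign of every location. slice_locations is a hypothetical helper and the positions are made up; whether the library uses exactly this projection is an assumption.

```python
from typing import List, Sequence


def slice_locations(positions: Sequence[Sequence[float]], iop: Sequence[float]) -> List[float]:
    """Project each ImagePositionPatient onto the slice normal (illustrative sketch)."""
    row = [float(v) for v in iop[:3]]
    col = [float(v) for v in iop[3:6]]
    normal = [
        row[1] * col[2] - row[2] * col[1],
        row[2] * col[0] - row[0] * col[2],
        row[0] * col[1] - row[1] * col[0],
    ]
    return [sum(p * n for p, n in zip(position, normal)) for position in positions]


# Three made-up slice positions, 2.5 mm apart along the patient z axis.
positions = [(-100.0, -100.0, 0.0), (-100.0, -100.0, 2.5), (-100.0, -100.0, 5.0)]
increasing = slice_locations(positions, [1, 0, 0, 0, 1, 0])
decreasing = slice_locations(positions, [1, 0, 0, 0, -1, 0])  # normal points along -z
print(increasing, decreasing)
```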

dicom_csv.spatial.locations_to_spacing(locations: Sequence[float], max_delta: float = 0.1, errors: bool = True) → float

dicom_csv.spatial.get_slice_spacing(series: Sequence[Dataset], max_delta: float = 0.1, errors: bool = True) → float

Returns the constant distance between the slices of a series. If the series doesn’t have constant spacing, raises ValueError if errors is True and returns np.nan otherwise.
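The error-or-nan contract can be sketched as follows. This is not the library's implementation: constant_spacing is a hypothetical helper, and interpreting max_delta as the largest allowed difference between the widest and narrowest gap is an assumption.

```python
import math
from typing import Sequence


def constant_spacing(locations: Sequence[float], max_delta: float = 0.1, errors: bool = True) -> float:
    """Return the common gap between sorted locations, if there is one (sketch)."""
    ordered = sorted(locations)
    gaps = [b - a for a, b in zip(ordered, ordered[1:])]
    # Fewer than two locations, or gaps that differ too much, count as non-constant.
    if not gaps or max(gaps) - min(gaps) > max_delta:
        if errors:
            raise ValueError("the locations are not evenly spaced")
        return math.nan
    return sum(gaps) / len(gaps)


print(constant_spacing([0.0, 2.5, 5.0]))
print(constant_spacing([0.0, 1.0, 5.0], errors=False))
```

Passing errors=False is convenient when processing many series in bulk, since irregular series can then be filtered out afterwards instead of interrupting the run.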

dicom_csv.spatial.get_pixel_spacing(series: Sequence[Dataset]) → Tuple[float, float]

Returns pixel spacing (two numbers) in mm.

dicom_csv.spatial.get_voxel_spacing(series: Sequence[Dataset]) → Tuple[float, float, float]

Returns voxel spacing: pixel spacing and distance between slices’ centers.

dicom_csv.spatial.get_image_position_patient(series: Sequence[Dataset]) → ndarray

Returns ImagePositionPatient stacked into array.

dicom_csv.spatial.drop_duplicated_slices(series: Sequence[Dataset], tolerance_hu=1) → Sequence[Dataset]

dicom_csv.spatial.orientation_matrix_to_slices_plane(om: ndarray) → Plane

dicom_csv.spatial.get_slice_orientation(*args, **kwds)

get_slice_orientation is deprecated!

dicom_csv.spatial.get_slices_orientation(series: Sequence[Dataset]) → SlicesOrientation

get_slices_orientation is deprecated!

class dicom_csv.spatial.SlicesOrientation(transpose: bool, flip_axes: tuple)

Bases: tuple

Defines how slices should be transformed in order to be canonically oriented: First transpose slices if transpose == True. Then flip slices along flip_axes (they already account for transposition).
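The two-step transformation described above (transpose first, then flip) can be sketched on a small nested list. canonicalize is a hypothetical helper written for illustration only; it assumes flip_axes contains 0 and/or 1 and refers to the axes after transposition, as the description states.

```python
from typing import List, Sequence


def canonicalize(slice_2d: Sequence[Sequence[int]], transpose: bool, flip_axes: Sequence[int]) -> List[List[int]]:
    """Apply a transpose followed by flips to a 2D slice (illustrative sketch)."""
    result = [list(row) for row in slice_2d]
    if transpose:
        result = [list(column) for column in zip(*result)]
    # The flips act on the already transposed slice.
    if 0 in flip_axes:
        result = result[::-1]
    if 1 in flip_axes:
        result = [row[::-1] for row in result]
    return result


print(canonicalize([[1, 2], [3, 4]], transpose=True, flip_axes=(0,)))
```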

property transpose

Alias for field number 0

property flip_axes

Alias for field number 1

dicom_csv.spatial.orientation_matrix_to_slices_orientation(*args, **kwds)

orientation_matrix_to_slices_orientation is deprecated!

dicom_csv.spatial.get_axes_permutation(*args, **kwds)

get_axes_permutation is deprecated!

dicom_csv.spatial.get_flipped_axes(*args, **kwargs)

get_flipped_axes is deprecated!

dicom_csv.spatial.get_image_plane(series: Sequence[Dataset]) → Plane

get_image_plane is deprecated!

dicom_csv.spatial.restore_orientation_matrix(metadata: Union[Series, DataFrame])

Fills nan values (if possible) in metadata’s ImageOrientationPatient* rows.

Required columns: ImageOrientationPatient[0-5]

Notes

The input dataframe will be mutated.

Console scripts

This library contains a console script around join_tree, which becomes available on the command line after installation:

dicom-csv folder/with/dicoms path/to/metadata.csv

# pass --help for more details:
dicom-csv --help
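For use from Python rather than the shell, a rough counterpart of the first command might look like the sketch below. It assumes the script does little more than call join_tree and write the result as CSV; export_metadata is a hypothetical wrapper, and the script's actual options and defaults may differ (see dicom-csv --help).

```python
from pathlib import Path
from typing import Union


def export_metadata(folder: Union[Path, str], csv_path: Union[Path, str]):
    """Crawl `folder` and save the metadata table as CSV (hypothetical wrapper)."""
    # Imported lazily so the sketch can be defined without dicom_csv installed.
    from dicom_csv.crawler import join_tree

    metadata = join_tree(folder, relative=True, verbose=0)
    # Assumption: the row index carries no information, so it is not written.
    metadata.to_csv(csv_path, index=False)
    return metadata
```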