Crawling

Contains functions for gathering metadata from individual DICOM files or entire directories.

dicom_csv.crawler.get_file_meta(path: Union[Path, str], force: bool = True, read_pixel_array: bool = False, unpack_volumetric: bool = False, extract_private: bool = False) → Iterable[dict]

Get a dict containing the metadata from the DICOM file located at path.

Parameters

  • path (PathLike) – full path to the file.

  • force (bool) – passed to pydicom.filereader.dcmread as its force parameter. Default is True.

  • read_pixel_array (bool) – if True, the crawler adds information about the DICOM pixel_array; this significantly increases crawling time. Default is False.

Notes

The following keys are added:
NoError: True if the file was read without raising an exception.
HasPixelArray: (if NoError is True) whether the file contains a pixel array.
PixelArrayShape: (if HasPixelArray is True) the shape of the pixel array.
For some formats the following packages might be required:
>>> conda install -c glueviz gdcm # Python 3.5 and 3.6
>>> conda install -c conda-forge gdcm # Python 3.7
dicom_csv.crawler.join_tree(top: Union[Path, str], ignore_extensions: Sequence[str] = (), relative: bool = True, verbose: int = 0, read_pixel_array: bool = False, force: bool = True, unpack_volumetric: bool = True, extract_private: bool = False, total: bool = False) → DataFrame

Returns a dataframe containing metadata for each file in all the subfolders of top.

Parameters
  • top (PathLike) – path to crawled folder

  • ignore_extensions (Sequence) – list of extensions to skip during crawling

  • relative (bool) – whether the PathToFolder attribute should be relative to top. Default is True.

  • verbose (int) –

    the verbosity level:
    0 - no progressbar
    1 - progressbar with iterations count
    2 - progressbar with filenames

  • total (bool) – whether to show the total number of files in the progressbar. This adds a bit of overhead, because each file will be visited a second time (without being opened).

References

See the Working with DICOM files tutorial for more details.

Notes

The following columns are added:
NoError: True if the file was read without raising an exception.
HasPixelArray: (if NoError is True) whether the file contains a pixel array (added if read_pixel_array is True).
PixelArrayShape: (if HasPixelArray is True) the shape of the pixel array (added if read_pixel_array is True).
PathToFolder
FileName
For some formats the following packages might be required:
>>> conda install -c glueviz gdcm # Python 3.5 and 3.6
>>> conda install -c conda-forge gdcm # Python 3.7
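
To make the PathToFolder and FileName columns concrete, here is a minimal standard-library sketch of the kind of directory walk such a crawler performs. It is not dicom_csv's implementation: the helper name list_files is hypothetical, and it only records paths without parsing any DICOM content.

```python
import os
import tempfile
from pathlib import Path


def list_files(top, ignore_extensions=(), relative=True):
    """Hypothetical helper: one record per file under `top` (no DICOM parsing)."""
    top = Path(top)
    records = []
    for root, _, files in os.walk(top):
        folder = Path(root)
        for name in sorted(files):
            # skip files whose name ends with an ignored extension
            if any(name.endswith(ext) for ext in ignore_extensions):
                continue
            # store the folder relative to `top` when requested
            path_to_folder = folder.relative_to(top) if relative else folder
            records.append({'PathToFolder': str(path_to_folder), 'FileName': name})
    return records


# build a tiny throwaway tree to demonstrate the walk
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / 'study').mkdir()
    (Path(tmp) / 'study' / 'slice1.dcm').write_bytes(b'')
    (Path(tmp) / 'study' / 'notes.txt').write_bytes(b'')
    demo_records = list_files(tmp, ignore_extensions=['.txt'])
```

With relative=True the folder is stored relative to top, so the resulting records stay valid if the crawled folder is moved.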

Aggregation

Tools for grouping DICOM metadata into images.

dicom_csv.aggregation.aggregate_images(metadata: DataFrame, by: Union[str, Sequence[str]], process_series: Optional[Callable] = None) → DataFrame

Groups DICOM metadata into images (series).

Parameters
  • metadata – a dataframe with metadata returned by join_tree.

  • by – a list of column names by which the grouping will be performed. Default columns are: PatientID, SeriesInstanceUID, StudyInstanceUID, PathToFolder, PixelArrayShape, SequenceName.

  • process_series – a function that processes an aggregated series before it is joined into a single entry.

References

See the Working with DICOM files tutorial for more details.

Notes

The following columns are added:
SlicesCount: the number of files/slices in the image.
FileNames: a list of slash (“/”) separated file names.
InstanceNumbers: (if InstanceNumber is in columns) a list of comma separated InstanceNumber values.
The following columns are removed:

FileName (replaced by FileNames), InstanceNumber (replaced by InstanceNumbers), any other columns that differ from file to file.
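
The added columns can be illustrated with a small pure-Python sketch that groups per-file records into one record per image. This only illustrates the documented output format, not the library's code: the helper name aggregate_rows and the sample rows are hypothetical, and unlike the real function it keeps only the grouping keys rather than every column that is constant within a group.

```python
from collections import defaultdict


def aggregate_rows(rows, by):
    """Hypothetical helper: group per-file dicts into one dict per image."""
    groups = defaultdict(list)
    for row in rows:
        groups[tuple(row[key] for key in by)].append(row)

    images = []
    for key, members in groups.items():
        image = dict(zip(by, key))
        image['SlicesCount'] = len(members)
        # file names are joined with a slash, instance numbers with a comma
        image['FileNames'] = '/'.join(m['FileName'] for m in members)
        if all('InstanceNumber' in m for m in members):
            image['InstanceNumbers'] = ','.join(str(m['InstanceNumber']) for m in members)
        images.append(image)
    return images


rows = [
    {'SeriesInstanceUID': 'A', 'FileName': '1.dcm', 'InstanceNumber': 1},
    {'SeriesInstanceUID': 'A', 'FileName': '2.dcm', 'InstanceNumber': 2},
    {'SeriesInstanceUID': 'B', 'FileName': '3.dcm', 'InstanceNumber': 1},
]
images = aggregate_rows(rows, by=['SeriesInstanceUID'])
```

Here the two files of series A collapse into a single entry with SlicesCount equal to 2, while series B yields a one-slice entry.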

dicom_csv.aggregation.normalize_identifiers(metadata: DataFrame) → DataFrame

Converts PatientID to str and fills nan values in SequenceName.

Notes

The input dataframe will be mutated.

dicom_csv.aggregation.select(dataframe: DataFrame, query: str, **where: str) → DataFrame

Loading

dicom_csv.misc.get_image(instance: Dataset, to_color_space: Optional[str] = None)
dicom_csv.misc.stack_images(series: Sequence[Dataset], axis: int = -1, to_color_space: Optional[str] = None)

Spatial operations

dicom_csv.spatial.get_orientation_matrix(series: Sequence[Dataset]) → ndarray

Returns a 3 x 3 orthogonal transition matrix from the image-based basis to the patient-based basis. Rows are coordinates of image-based basis vectors in the patient-based basis. Columns are coordinates of patient-based basis vectors in the image-based basis.

See https://dicom.innolitics.com/ciods/rt-dose/image-plane/00200037 for details.
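
As a rough illustration of how such a matrix relates to the DICOM ImageOrientationPatient attribute, the sketch below builds it from the six direction cosines (first row, then first column) and completes it with their cross product. The helper name orientation_matrix is hypothetical and the row ordering (row direction, column direction, normal) is an assumption; the library's function takes a whole series and may differ in detail.

```python
def orientation_matrix(image_orientation_patient):
    """Hypothetical helper: 3 x 3 matrix from the six direction cosines."""
    rx, ry, rz, cx, cy, cz = image_orientation_patient
    row = (rx, ry, rz)  # direction cosines of the first image row
    col = (cx, cy, cz)  # direction cosines of the first image column
    # the slice normal is the cross product of the two in-plane directions
    normal = (ry * cz - rz * cy, rz * cx - rx * cz, rx * cy - ry * cx)
    return [row, col, normal]


# an orientation whose in-plane directions are the patient x and y axes
axial = orientation_matrix([1, 0, 0, 0, 1, 0])
```

For this input the matrix is the identity, meaning the image-based and patient-based bases coincide.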

dicom_csv.spatial.get_slice_plane(instance: Dataset) → Plane
dicom_csv.spatial.get_slices_plane(series: Sequence[Dataset]) → Plane
class dicom_csv.spatial.Plane(value)

Bases: Enum

An enumeration.

Sagittal = 0
Coronal = 1
Axial = 2
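
One plausible way to read these values is that each member's number is the index of the patient axis the slice normal points along. The sketch below follows that reading; it is an assumption rather than the library's logic, the helper name plane_from_normal is hypothetical, and the real functions may reject oblique orientations instead of picking the closest plane.

```python
from enum import Enum


class Plane(Enum):
    # mirrors the values listed above
    Sagittal = 0
    Coronal = 1
    Axial = 2


def plane_from_normal(normal):
    """Hypothetical helper: pick the plane whose patient axis is closest to the slice normal."""
    dominant_axis = max(range(3), key=lambda i: abs(normal[i]))
    return Plane(dominant_axis)


# a slightly tilted slice whose normal is mostly along the patient z axis
plane = plane_from_normal((0.0, 0.1, -0.99))
```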
dicom_csv.spatial.order_series(series: Sequence[Dataset], decreasing: bool = True) → Sequence[Dataset]
dicom_csv.spatial.get_slice_locations(series: Sequence[Dataset]) → ndarray

Computes slice locations from ImagePositionPatient. NOTE: for an ordered series, the slice locations can be either increasing or decreasing (see order_series).
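
A common way to derive a slice location from ImagePositionPatient is to project the position onto the slice normal. The sketch below shows that projection in plain Python; the helper name slice_locations and the sample positions are hypothetical, and it is an assumption that the library computes locations this way.

```python
def slice_locations(positions, normal):
    """Hypothetical helper: project each ImagePositionPatient onto the slice normal."""
    return [sum(p * n for p, n in zip(position, normal)) for position in positions]


# three slices stacked along the patient z axis
positions = [(-100.0, -100.0, 10.0), (-100.0, -100.0, 12.5), (-100.0, -100.0, 15.0)]
locations = slice_locations(positions, normal=(0.0, 0.0, 1.0))
flipped = slice_locations(positions, normal=(0.0, 0.0, -1.0))
```

The same positions give increasing values for one normal direction and decreasing values for the opposite one, which is why the sign of the ordering cannot be assumed.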

dicom_csv.spatial.locations_to_spacing(locations: Sequence[float], max_delta: float = 0.1, errors: bool = True) → float
dicom_csv.spatial.get_slice_spacing(series: Sequence[Dataset], max_delta: float = 0.1, errors: bool = True) → float

Returns the constant distance between slices of a series. If the series doesn’t have constant spacing, raises ValueError if errors is True and returns np.nan otherwise.
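
The documented behaviour can be sketched without any dependencies as follows. This is not the library source: the helper name spacing_from_locations is hypothetical, and the interpretation of max_delta (the largest allowed difference between the biggest and smallest gap) is an assumption.

```python
import math


def spacing_from_locations(locations, max_delta=0.1, errors=True):
    """Hypothetical helper mirroring the documented behaviour, not the library source."""
    ordered = sorted(locations)
    deltas = [b - a for a, b in zip(ordered, ordered[1:])]
    # assumption: spacing counts as constant when the gaps differ by at most max_delta
    if not deltas or max(deltas) - min(deltas) > max_delta:
        if errors:
            raise ValueError('The series does not have constant slice spacing.')
        return math.nan
    return sum(deltas) / len(deltas)


constant = spacing_from_locations([10.0, 12.5, 15.0])
irregular = spacing_from_locations([10.0, 12.5, 20.0], errors=False)
```

Passing errors=False is convenient when processing many series at once, because series with irregular spacing can then be filtered out afterwards instead of interrupting the run.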

dicom_csv.spatial.get_pixel_spacing(series: Sequence[Dataset]) → Tuple[float, float]

Returns pixel spacing (two numbers) in mm.

dicom_csv.spatial.get_voxel_spacing(series: Sequence[Dataset]) → Tuple[float, float, float]

Returns voxel spacing: pixel spacing and distance between slices’ centers.

dicom_csv.spatial.get_image_position_patient(series: Sequence[Dataset]) → ndarray

Returns the ImagePositionPatient values of all slices stacked into a single array.

dicom_csv.spatial.drop_duplicated_slices(series: Sequence[Dataset], tolerance_hu=1) → Sequence[Dataset]
dicom_csv.spatial.orientation_matrix_to_slices_plane(om: ndarray) → Plane
dicom_csv.spatial.get_slice_orientation(*args, **kwds)

get_slice_orientation is deprecated!

dicom_csv.spatial.get_slices_orientation(series: Sequence[Dataset]) → SlicesOrientation

get_slices_orientation is deprecated!

class dicom_csv.spatial.SlicesOrientation(transpose: bool, flip_axes: tuple)

Bases: tuple

Defines how slices should be transformed in order to be canonically oriented: first, transpose the slices if transpose == True; then flip the slices along flip_axes (these already account for the transposition).

property transpose

Alias for field number 0

property flip_axes

Alias for field number 1

dicom_csv.spatial.orientation_matrix_to_slices_orientation(*args, **kwds)

orientation_matrix_to_slices_orientation is deprecated!

dicom_csv.spatial.get_axes_permutation(*args, **kwds)

get_axes_permutation is deprecated!

dicom_csv.spatial.get_flipped_axes(*args, **kwargs)

get_flipped_axes is deprecated!

dicom_csv.spatial.get_image_plane(series: Sequence[Dataset]) → Plane

get_image_plane is deprecated!

dicom_csv.spatial.restore_orientation_matrix(metadata: Union[Series, DataFrame])

Fills nan values (if possible) in metadata’s ImageOrientationPatient* rows.

Required columns: ImageOrientationPatient[0-5]

Notes

The input dataframe will be mutated.

Console scripts

This library contains a console script around join_tree, which is added to your namespace after installation:

dicom-csv folder/with/dicoms path/to/metadata.csv

# pass --help for more details:
dicom-csv --help