src.internal.data_connector.connector module

Provides a base class for all dataset connectors to inherit from.

class src.internal.data_connector.connector.DatasetConnector

Bases: ABC

Base class for dataset connectors.
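A minimal sketch of how a concrete connector might subclass this base class. To stay self-contained, the snippet declares a reduced stand-in for DatasetConnector with only three of the documented members (name, file_entries, add_files). The InMemoryConnector class and its dictionary storage are hypothetical and not part of this module.

```python
from abc import ABC, abstractmethod
from pathlib import Path
from typing import Dict, Union


class DatasetConnector(ABC):
    """Reduced stand-in for the documented base class (assumption)."""

    @property
    @abstractmethod
    def name(self) -> str:
        raise NotImplementedError

    @property
    @abstractmethod
    def file_entries(self) -> Dict:
        raise NotImplementedError

    @abstractmethod
    def add_files(self, path: Union[str, Path], recursive: bool = True) -> None:
        raise NotImplementedError


class InMemoryConnector(DatasetConnector):
    """Hypothetical connector that only records paths in a dict."""

    def __init__(self, name: str) -> None:
        self._name = name
        self._files: Dict[str, Path] = {}

    @property
    def name(self) -> str:
        return self._name

    @property
    def file_entries(self) -> Dict:
        # Return a copy so callers cannot mutate internal state.
        return dict(self._files)

    def add_files(self, path: Union[str, Path], recursive: bool = True) -> None:
        # The recursive flag is ignored here; only the given path is recorded.
        self._files[str(path)] = Path(path)


connector = InMemoryConnector("demo")
connector.add_files("data/train.csv")
```

Because the base class derives from ABC, instantiating it directly, or instantiating a subclass that leaves any abstract member unimplemented, raises TypeError.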

abstract add_files(path: Union[str, Path], recursive: bool = True) → None

Add files to a dataset.

Parameters:
  • path (Union[str, Path]) – Local/remote path to data you want to add

  • recursive (bool, optional) – Whether to recursively add files in subfolders. Defaults to True.

Raises:

NotImplementedError – If dataset connector does not implement this method.
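One way an implementation might honour the recursive flag for a local path is sketched here. The helper name collect_files is hypothetical, and remote paths are outside the scope of this sketch.

```python
from pathlib import Path
from typing import List, Union


def collect_files(path: Union[str, Path], recursive: bool = True) -> List[Path]:
    """Hypothetical helper: list the files an add_files call could pick up."""
    root = Path(path)
    if root.is_file():
        return [root]
    # rglob descends into subfolders; glob stays at the top level.
    candidates = root.rglob("*") if recursive else root.glob("*")
    return sorted(p for p in candidates if p.is_file())
```

With recursive=False only files directly inside the folder are returned; with the default, files in nested folders are included as well.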

abstract property artifacts: List[Artifact]

Get a list of artifacts in the dataset.

Raises:

NotImplementedError – If dataset connector does not implement this method.

Returns:

List of artifacts in the dataset

Return type:

List[Artifact]

abstract classmethod create(name: str, version: str = 'latest') → DatasetConnector

Create a dataset from scratch. Note that the user must add files after the dataset has been created.

Parameters:
  • name (str) – Name of the dataset

  • version (str, optional) – Version to give dataset. Defaults to “latest”.

Returns:

Created dataset

Return type:

DatasetConnector

abstract delete() → None

Delete the dataset.

Raises:

NotImplementedError – If dataset connector does not implement this method.

abstract download(path: Union[str, Path], overwrite: bool = True) → str

Download a mutable copy of the entire dataset.

Parameters:
  • path (Union[str, Path]) – Target folder to download dataset to

  • overwrite (bool, optional) – Whether existing files in the target folder should be removed. Defaults to True.

Raises:

NotImplementedError – If dataset connector does not implement this method.

Returns:

File path to downloaded dataset

Return type:

str
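The overwrite behaviour described above could be implemented along these lines. This is a sketch only: prepare_target is a hypothetical helper, it assumes the target is a local folder, and the actual copying of dataset files is left out.

```python
import shutil
from pathlib import Path
from typing import Union


def prepare_target(path: Union[str, Path], overwrite: bool = True) -> str:
    """Hypothetical helper: ready the target folder before files are copied in."""
    target = Path(path)
    if overwrite and target.exists():
        # Remove everything already in the target folder.
        shutil.rmtree(target)
    # Create the folder (and any missing parents) if it is not there.
    target.mkdir(parents=True, exist_ok=True)
    return str(target)
```

Returning the path as a str mirrors the documented return type of download.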

abstract property file_entries: Dict

Get a dictionary of files in the dataset.

Raises:

NotImplementedError – If dataset connector does not implement this method.

Returns:

Dictionary of files in the dataset

Return type:

Dict

abstract classmethod get() → DatasetConnector

Get an existing dataset without downloading its contents.

This method should return a properly initialized DatasetConnector with the .dataset attribute set.

Returns:

Existing dataset

Return type:

DatasetConnector
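The requirement that get return a connector with .dataset set, without fetching any file contents, might look like the following sketch. RegistryConnector, its _REGISTRY dictionary, and the name parameter are all hypothetical; the documented signature takes no arguments, and a real connector would subclass DatasetConnector.

```python
from typing import Any, Dict, Optional


class RegistryConnector:
    """Hypothetical connector; a real one would subclass DatasetConnector."""

    # Stand-in for a backend that already holds datasets (assumption).
    _REGISTRY: Dict[str, Any] = {"demo": {"files": ["a.txt"]}}

    def __init__(self) -> None:
        self.dataset: Optional[Any] = None

    @classmethod
    def get(cls, name: str = "demo") -> "RegistryConnector":
        connector = cls()
        # Attach the backend handle only; no file contents are downloaded.
        connector.dataset = cls._REGISTRY[name]
        return connector


existing = RegistryConnector.get()
```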

abstract static list_datasets() → List[Dict]

Obtain a list of all datasets, based on what is available to the dataset connector.

Raises:

NotImplementedError – If dataset connector does not implement this method.

Returns:

List of dictionaries containing dataset metadata

Return type:

List[Dict]

abstract property name: str

Get the name of the dataset.

Raises:

NotImplementedError – If dataset connector does not implement this method.

Returns:

Name of the dataset

Return type:

str

abstract property project: str

Get the project associated with the dataset.

Raises:

NotImplementedError – If dataset connector does not implement this method.

Returns:

Project associated with the dataset

Return type:

str

abstract remove_files(path: Union[str, Path], recursive: bool = True) → None

Remove files from a dataset.

Parameters:
  • path (Union[str, Path]) – Local/remote path to data you want to remove

  • recursive (bool, optional) – Whether to recursively remove files in subfolders. Defaults to True.

Raises:

NotImplementedError – If dataset connector does not implement this method.

abstract upload(remote: Optional[str] = None) → None

Push changes to remote.

Parameters:

remote (Optional[str]) – URL to push files to. If None, the dataset's predefined URL is used. Defaults to None.

Raises:
  • ValueError – If remote is not given and the dataset has no default remote.

  • NotImplementedError – If dataset connector does not implement this method.
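The remote-selection rule described for upload can be sketched as a small function. The name resolve_remote and the default_remote parameter are hypothetical; how a connector stores its predefined URL is not specified here.

```python
from typing import Optional


def resolve_remote(remote: Optional[str], default_remote: Optional[str]) -> str:
    """Hypothetical helper: pick the URL that upload() should push to."""
    if remote is not None:
        # An explicit argument takes precedence.
        return remote
    if default_remote is not None:
        # Fall back to the URL already defined on the dataset.
        return default_remote
    raise ValueError("No remote given and the dataset has no default remote.")
```

An implementation of upload could call such a helper first, so the ValueError is raised before any files are pushed.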