src.internal.data_connector.connector module
Provides a base class for all dataset connectors to inherit from.
- class src.internal.data_connector.connector.DatasetConnector
Bases: ABC
Base class for dataset connectors.
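As a rough illustration of how a connector might inherit from this base class, the sketch below defines a reduced stand-in for DatasetConnector with only three of the documented abstract members, because the real module is not importable here. The subclass name InMemoryConnector, its constructor, and its files and version attributes are hypothetical and are not part of the documented API.

```python
from abc import ABC, abstractmethod
from pathlib import Path
from typing import List, Union


class DatasetConnector(ABC):
    """Reduced stand-in for the documented base class (assumption: three members only)."""

    @classmethod
    @abstractmethod
    def create(cls, name: str, version: str = "latest") -> "DatasetConnector":
        raise NotImplementedError

    @property
    @abstractmethod
    def name(self) -> str:
        raise NotImplementedError

    @abstractmethod
    def add_files(self, path: Union[str, Path], recursive: bool = True) -> None:
        raise NotImplementedError


class InMemoryConnector(DatasetConnector):
    """Hypothetical connector that only records the paths it is given."""

    def __init__(self, name: str, version: str) -> None:
        self._name = name
        self.version = version
        self.files: List[str] = []

    @classmethod
    def create(cls, name: str, version: str = "latest") -> "InMemoryConnector":
        # A new dataset starts empty; files are added afterwards.
        return cls(name, version)

    @property
    def name(self) -> str:
        return self._name

    def add_files(self, path: Union[str, Path], recursive: bool = True) -> None:
        # This sketch ignores `recursive` and simply records the path.
        self.files.append(str(path))


dataset = InMemoryConnector.create("demo")
dataset.add_files("data/train.csv")
```

Because the stand-in base class inherits from ABC, instantiating it directly raises a TypeError; only subclasses that override every abstract member can be created.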
- _abc_impl = <_abc._abc_data object>
- abstract add_files(path: Union[str, Path], recursive: bool = True) → None
Add files to a dataset.
- Parameters:
path (Union[str, Path]) – Local/remote path to data you want to add
recursive (bool, optional) – Whether to recursively add sub-files and sub-folders. Defaults to True.
- Raises:
NotImplementedError – If dataset connector does not implement this method.
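The effect of the recursive flag can be sketched for local paths as follows. The helper collect_files is hypothetical and is not part of the connector API; it only shows one plausible way an implementation could decide which files an add_files call picks up, and it does not cover remote paths.

```python
from pathlib import Path
from typing import List, Union


def collect_files(path: Union[str, Path], recursive: bool = True) -> List[Path]:
    """Hypothetical helper: list the local files an add_files call would pick up."""
    root = Path(path)
    if root.is_file():
        # A single file is added as-is, regardless of `recursive`.
        return [root]
    # rglob descends into sub-folders; glob stays at the top level.
    candidates = root.rglob("*") if recursive else root.glob("*")
    return sorted(p for p in candidates if p.is_file())
```

With recursive=False, only files directly inside the given folder are returned; sub-folders are skipped entirely.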
- abstract property artifacts: List[Artifact]
Get a list of artifacts in the dataset.
- Raises:
NotImplementedError – If dataset connector does not implement this method.
- Returns:
List of artifacts in the dataset
- Return type:
List[Artifact]
- abstract classmethod create(name: str, version: str = 'latest') → DatasetConnector
Create a dataset from scratch. Note that the user must add files after the dataset has been created.
- Parameters:
name (str) – Name of the dataset
version (str, optional) – Version to give dataset. Defaults to “latest”.
- Returns:
Created dataset
- Return type:
DatasetConnector
- abstract delete() → None
Delete dataset.
- Raises:
NotImplementedError – If dataset connector does not implement this method.
- abstract download(path: Union[str, Path], overwrite: bool = True) → str
Download a mutable copy of the entire dataset.
- Parameters:
path (Union[str, Path]) – Target folder to download dataset to
overwrite (bool, optional) – Whether existing files in the target folder should be removed. Defaults to True.
- Raises:
NotImplementedError – If dataset connector does not implement this method.
- Returns:
File path to downloaded dataset
- Return type:
str
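A minimal sketch of the overwrite behaviour described above, assuming the dataset contents are held in memory as a mapping from relative file names to bytes. This free-standing download function and its files argument are hypothetical; a real connector would fetch the contents from its own backend.

```python
import shutil
from pathlib import Path
from typing import Dict, Union


def download(files: Dict[str, bytes], path: Union[str, Path], overwrite: bool = True) -> str:
    """Hypothetical download: write an in-memory dataset into a target folder."""
    target = Path(path)
    if overwrite and target.exists():
        # Remove everything already in the target folder before writing.
        shutil.rmtree(target)
    target.mkdir(parents=True, exist_ok=True)
    for relative_name, content in files.items():
        destination = target / relative_name
        destination.parent.mkdir(parents=True, exist_ok=True)
        destination.write_bytes(content)
    # The documented return type is str, so the path is converted explicitly.
    return str(target)
```

With overwrite=False, this sketch leaves unrelated files in the target folder untouched and only writes the dataset files alongside them.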
- abstract property file_entries: Dict
Get a dictionary of files in the dataset.
- Raises:
NotImplementedError – If dataset connector does not implement this method.
- Returns:
Dictionary of files in the dataset
- Return type:
Dict
- abstract classmethod get() → DatasetConnector
Get an existing dataset, but do not download contents of dataset.
This method should return a properly initialized DatasetConnector with the .dataset attribute set.
- Returns:
Existing dataset
- Return type:
DatasetConnector
- abstract static list_datasets() → List[Dict]
Obtain a list of all datasets, based on what is available to the dataset connector.
- Raises:
NotImplementedError – If dataset connector does not implement this method.
- Returns:
List of dictionaries containing dataset metadata
- Return type:
List[Dict]
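One way a connector could satisfy this contract is sketched below. The class RegistryConnector, the module-level _REGISTRY list, and the metadata keys name, version, and project are all assumptions made for illustration; the documentation does not specify which keys the dictionaries contain.

```python
from typing import Dict, List

# Hypothetical in-memory registry standing in for whatever backend a connector queries.
_REGISTRY: List[Dict] = [
    {"name": "demo", "version": "latest", "project": "sandbox"},
    {"name": "demo", "version": "1.0.0", "project": "sandbox"},
]


class RegistryConnector:
    """Hypothetical connector exposing only the static listing method."""

    @staticmethod
    def list_datasets() -> List[Dict]:
        # Return copies so callers cannot mutate the registry by accident.
        return [dict(entry) for entry in _REGISTRY]


datasets = RegistryConnector.list_datasets()
```

Because list_datasets is a static method, it can be called on the class itself without creating or fetching a dataset first.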
- abstract property name: str
Get the name of the dataset.
- Raises:
NotImplementedError – If dataset connector does not implement this method.
- Returns:
Name of the dataset
- Return type:
str
- abstract property project: str
Get the project associated with the dataset.
- Raises:
NotImplementedError – If dataset connector does not implement this method.
- Returns:
Project associated with the dataset
- Return type:
str
- abstract remove_files(path: Union[str, Path], recursive: bool = True) → None
Remove files from a dataset.
- Parameters:
path (Union[str, Path]) – Local/remote path to data you want to remove
recursive (bool, optional) – Whether to recursively remove sub-files and sub-folders. Defaults to True.
- Raises:
NotImplementedError – If dataset connector does not implement this method.
- abstract upload(remote: Optional[str] = None) → None
Push changes to remote.
- Parameters:
remote (Optional[str]) – URL to push files to. If None, will use any pre-defined URL in the dataset. Defaults to None.
- Raises:
ValueError – If remote is not passed as an argument and the dataset has no default remote.
NotImplementedError – If dataset connector does not implement this method.
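The fallback rule for remote can be sketched as a small helper. The function resolve_remote and its default_remote parameter are hypothetical names; how a real connector stores its pre-defined URL is not specified here.

```python
from typing import Optional


def resolve_remote(remote: Optional[str], default_remote: Optional[str]) -> str:
    """Hypothetical helper: pick the URL an upload call would push to."""
    if remote is not None:
        # An explicit argument takes precedence over the dataset's own URL.
        return remote
    if default_remote is not None:
        return default_remote
    raise ValueError("No remote was given and the dataset has no default remote.")
```

An upload implementation could call such a helper first and only start pushing files once a URL has been resolved.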