cp
Copy storage files and directories between cloud and local storage.
Synopsis
usage: datachain cp [-h] [-v] [-q] [-r] [--team TEAM]
[--local] [--anon] [--update]
[--no-glob] [--force]
source_path destination_path
Description
This command copies files and directories between local and/or remote storage. The command can operate through Studio (default) or directly with local storage access.
Arguments
- source_path - Path to the source file or directory to copy
- destination_path - Path to the destination file or directory to copy to
Options
- -r, -R, --recursive - Copy directories recursively
- --team TEAM - Team name to copy storage contents to
- --local - Copy data files from the cloud locally without Studio (Default: False)
- --anon - Use anonymous access to storage (available only with --local)
- --update - Update cached list of files for the sources (available only with --local)
- --no-glob - Do not expand globs (such as * or ?) (available only with --local)
- --force - Force creating files even if they already exist (available only with --local)
- -h, --help - Show the help message and exit
- -v, --verbose - Be verbose
- -q, --quiet - Be quiet
Copy Operations
The command supports two main modes of operation:
Studio Mode (Default)
When using Studio mode (default), the command copies files and directories through Studio using the configured credentials. This mode automatically determines the operation type based on the source and destination protocols, supporting four different copy scenarios.
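As a rough illustration of how the scenario follows from the two paths, the sketch below classifies a source/destination pair by its URL scheme. The helper names path_kind and copy_scenario are hypothetical, and this is not DataChain's actual implementation; it only mirrors the storage protocols listed on this page.

```shell
# Hypothetical helper: "remote" for s3://, gs://, az:// URLs, "local" otherwise.
path_kind() {
  case "$1" in
    s3://*|gs://*|az://*) echo remote ;;
    *) echo local ;;
  esac
}

# Hypothetical helper: map a source/destination pair to one of the four scenarios.
copy_scenario() {
  echo "$(path_kind "$1")-to-$(path_kind "$2")"
}

copy_scenario /path/to/file.txt s3://my-bucket/data/file.txt   # prints: local-to-remote
copy_scenario s3://my-bucket/data/file.txt /path/to/directory/ # prints: remote-to-local
```

A check like this can be handy in wrapper scripts that need to know in advance whether a given copy will upload, download, or stay on one side.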
Local Mode
When the --local flag is used, the command operates directly with local storage access, bypassing Studio. This mode supports the additional options --anon, --update, --no-glob, and --force.
Supported Storage Protocols
The command supports the following storage protocols:
- Local file system: Direct paths (e.g., /path/to/directory or ./relative/path)
- AWS S3: s3://bucket-name/path
- Google Cloud Storage: gs://bucket-name/path
- Azure Blob Storage: az://container-name/path
Examples
Studio Mode Examples
The command automatically determines the operation type based on the source and destination protocols:
1. Local to Local (local path → local path)
Operation: Direct local file system copy
- Uses the local filesystem's native copy operation
- Fastest operation as no network transfer is involved
- Supports both files and directories
2. Local to Remote (local path → s3://, gs://, az://)
Operation: Upload to cloud storage
- Uploads local files/directories to remote storage
- Uses presigned URLs for secure uploads
- Supports S3 multipart form data for large files
- Requires the --recursive flag for directories
# Upload single file
datachain cp /path/to/file.txt s3://my-bucket/data/file.txt
# Upload directory recursively
datachain cp -r /path/to/directory s3://my-bucket/data/
3. Remote to Local (s3://, gs://, az:// → local path)
Operation: Download from cloud storage
- Downloads remote files/directories to local storage
- Uses presigned download URLs
- Automatically extracts filename if destination is a directory
- Creates destination directory if it doesn't exist
# Download single file
datachain cp s3://my-bucket/data/file.txt /path/to/local/file.txt
# Download to directory (filename preserved)
datachain cp s3://my-bucket/data/file.txt /path/to/directory/
4. Remote to Remote (s3:// → s3://, gs:// → gs://, etc.)
Operation: Copy within cloud storage
- Copies files between locations in the same bucket
- Cannot copy between different buckets (same limitation as mv)
- Uses Studio's internal copy operation
- Requires the --recursive flag for directories
# Copy within same bucket
datachain cp s3://my-bucket/data/file.txt s3://my-bucket/archive/file.txt
# Copy directory recursively
datachain cp -r s3://my-bucket/data/images s3://my-bucket/backup/images
Additional Studio Mode Examples
- Copy with specific team:
- Copy with verbose output:
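The two cases above might look like the following sketch, which uses only the --team, -v, and -r options documented on this page. The team name, bucket, and paths are placeholder assumptions, and the dc wrapper is a hypothetical helper (not part of datachain) that prints each command instead of running it, so the example can be tried without touching any storage.

```shell
# Hypothetical dry-run wrapper: prints the command by default;
# set DRY_RUN=0 to execute the real datachain CLI instead.
dc() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "datachain $*"
  else
    datachain "$@"
  fi
}

# Copy with specific team ("my-team" is a placeholder team name)
dc cp -r --team my-team /path/to/directory s3://my-bucket/data/

# Copy with verbose output
dc cp -v -r s3://my-bucket/data/images s3://my-bucket/backup/images
```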
Local Mode Examples
- Copy files locally without Studio:
- Copy with anonymous access:
- Copy with force overwrite:
- Copy with update and no glob expansion:
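A sketch of what these four cases could look like, combining --local with the options documented above. The bucket names and paths are placeholders, the specific flag combinations are assumptions based on the option descriptions, and the dc wrapper is a hypothetical helper that only prints the command unless DRY_RUN=0 is set.

```shell
# Hypothetical dry-run wrapper: prints the command by default;
# set DRY_RUN=0 to execute the real datachain CLI instead.
dc() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "datachain $*"
  else
    datachain "$@"
  fi
}

# Copy files locally without Studio
dc cp --local -r s3://my-bucket/data /path/to/local/directory

# Copy with anonymous access (placeholder public bucket)
dc cp --local --anon -r s3://public-bucket/data /path/to/local/directory

# Copy with force overwrite of existing files
dc cp --local --force s3://my-bucket/data/file.txt /path/to/local/file.txt

# Copy with update of the cached file listing and no glob expansion
dc cp --local --update --no-glob -r s3://my-bucket/data /path/to/local/directory
```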
Limitations
- Cannot copy between different buckets: Remote-to-remote copies must be within the same bucket
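Because of this limitation, it can help to check a remote-to-remote copy before running it. The following is a minimal sketch, assuming URLs of the form scheme://bucket/path; bucket_of and same_bucket are hypothetical helper names and are not part of datachain.

```shell
# Hypothetical helper: print "scheme://bucket" for a URL like s3://my-bucket/data/file.txt.
bucket_of() {
  scheme="${1%%://*}"   # text before "://"
  rest="${1#*://}"      # text after "://"
  echo "$scheme://${rest%%/*}"
}

# Hypothetical helper: succeed only when both URLs point at the same bucket.
same_bucket() {
  [ "$(bucket_of "$1")" = "$(bucket_of "$2")" ]
}

src="s3://my-bucket/data/file.txt"
dest="s3://my-bucket/archive/file.txt"
if same_bucket "$src" "$dest"; then
  echo "same bucket: OK to run datachain cp $src $dest"
else
  echo "different buckets: not supported for remote-to-remote copies" >&2
fi
```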
Notes
- Studio mode requires authentication: run datachain auth login before using the command
- The --local mode bypasses Studio and operates directly with storage providers