Keep artifacts local in a cloud instance
If you want to keep artifacts local by default in a cloud instance, enable keep_artifacts_local.
!lamin login testuser1
!lamin init --storage s3://lamindb-ci/keep-artifacts-local
logged in with email testuser1@lamin.ai (uid: DzTjkKse)
💡 go to: https://lamin.ai/testuser1/keep-artifacts-local
updating & unlocking cloud SQLite 's3://lamindb-ci/keep-artifacts-local/cc7f2489bf7251f79ff9ca8df7ac045b.lndb' of instance 'testuser1/keep-artifacts-local'
💡 connected lamindb: testuser1/keep-artifacts-local
locked instance (to unlock and push changes to the cloud SQLite file, call: lamin close)
import lamindb as ln
ln.settings.transform.stem_uid = "l9lFf83aPwRc"
ln.settings.transform.version = "1"
ln.track()
💡 connected lamindb: testuser1/keep-artifacts-local
💡 notebook imports: lamindb==0.72.1
💡 saved: Transform(version='1', uid='l9lFf83aPwRc5zKv', name='Keep artifacts local in a cloud instance', key='keep-artifacts-local', type='notebook', updated_at=2024-05-23 10:57:55 UTC, created_by_id=1)
💡 saved: Run(uid='r5TEOxR3LR9oGRVTGkhl', transform_id=1, created_by_id=1)
# the setting should be enabled on lamin.ai
# we're temporarily setting it here only for testing purposes
ln.setup.settings.instance._keep_artifacts_local = True
You can register a managed local storage location as follows:
ln.settings.storage_local = "./my_storage_local"
💡 defaulting to local storage: /home/runner/work/lamindb/lamindb/docs/faq/my_storage_local
Now, you have two storage locations: one in the S3 bucket, and the other locally.
ln.Storage.df()
id | created_at | created_by_id | run_id | updated_at | uid | root | description | type | region | instance_uid |
---|---|---|---|---|---|---|---|---|---|---|
2 | 2024-05-23 10:57:56.797029+00:00 | 1 | None | 2024-05-23 10:57:56.797104+00:00 | 5bSRCMNhYm83 | /home/runner/work/lamindb/lamindb/docs/faq/my_... | None | local | None | 6uGWmLpZlNoJ |
1 | 2024-05-23 10:57:51.794321+00:00 | 1 | None | 2024-05-23 10:57:51.794420+00:00 | mM1cyfcOYxRl | s3://lamindb-ci/keep-artifacts-local | None | s3 | us-west-1 | 6uGWmLpZlNoJ |
Update storage description
You can add a description to a storage location by setting the description field on its record:
storage_record = ln.Storage.filter(root=ln.settings.storage_local).one()
storage_record.description = "Files stored locally in site X on server Y for reason ABC"
storage_record.save()
ln.Storage.df()
id | created_at | created_by_id | run_id | updated_at | uid | root | description | type | region | instance_uid |
---|---|---|---|---|---|---|---|---|---|---|
2 | 2024-05-23 10:57:56.797029+00:00 | 1 | None | 2024-05-23 10:57:56.834686+00:00 | 5bSRCMNhYm83 | /home/runner/work/lamindb/lamindb/docs/faq/my_... | Files stored locally in site X on server Y for... | local | None | 6uGWmLpZlNoJ |
1 | 2024-05-23 10:57:51.794321+00:00 | 1 | None | 2024-05-23 10:57:51.794420+00:00 | mM1cyfcOYxRl | s3://lamindb-ci/keep-artifacts-local | None | s3 | us-west-1 | 6uGWmLpZlNoJ |
Use local storage
If you save an artifact, it's stored in local storage by default.
original_filepath = ln.core.datasets.file_fcs()
artifact = ln.Artifact(original_filepath, description="My fcs file").save()
local_path = artifact.path
local_path
PosixUPath('/home/runner/work/lamindb/lamindb/docs/faq/my_storage_local/.lamindb/lerLb7WqKARNIBBUTOl7.fcs')
You'll see the .fcs file, named by its uid, in the .lamindb/ directory under ./my_storage_local/:
ln.settings.storage_local.view_tree()
1 sub-directory & 2 files with suffixes '', '.fcs'
/home/runner/work/lamindb/lamindb/docs/faq/my_storage_local
└── .lamindb
    ├── _is_initialized
    └── lerLb7WqKARNIBBUTOl7.fcs
assert local_path.exists()
assert artifact.path.as_posix().startswith(ln.setup.settings.instance.storage_local.root.as_posix())
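The second assertion above matches path strings by prefix. A minimal sketch of a stricter, component-wise containment check follows. The helper name is_under and the example paths are hypothetical, and the sketch assumes plain POSIX paths rather than lamindb's UPath objects (it needs Python 3.9 or later):

```python
from pathlib import PurePosixPath


def is_under(path: str, root: str) -> bool:
    """Return True if `path` lies inside the directory `root` (hypothetical helper)."""
    # is_relative_to compares whole path components, so a sibling directory
    # that merely shares a name prefix with `root` does not match
    return PurePosixPath(path).is_relative_to(PurePosixPath(root))


# made-up example paths, not taken from a real instance
root = "/data/my_storage_local"
inside = "/data/my_storage_local/.lamindb/abc.fcs"
sibling = "/data/my_storage_local_backup/abc.fcs"

print(is_under(inside, root))    # True
print(is_under(sibling, root))   # False
print(sibling.startswith(root))  # True: plain string matching is fooled
```

For the paths in this notebook both checks agree; the difference only matters when two storage roots share a name prefix.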
If you'd like to upload an artifact, pass upload=True to the save() method.
artifact.save(upload=True)
💡 moved local artifact to cache: /home/runner/.cache/lamindb/lerLb7WqKARNIBBUTOl7.fcs
Artifact(updated_at=2024-05-23 10:57:57 UTC, uid='lerLb7WqKARNIBBUTOl7', suffix='.fcs', description='My fcs file', size=6785467, hash='KCEXRahJ-Ui9Y6nksQ8z1A', hash_type='md5', visibility=1, key_is_virtual=True, created_by_id=1, storage_id=1, transform_id=1, run_id=1)
You now see the artifact in the S3 bucket:
ln.setup.settings.storage.root.view_tree()
2 sub-directories & 3 files with suffixes '', '.lndb', '.fcs'
s3://lamindb-ci/keep-artifacts-local
├── cc7f2489bf7251f79ff9ca8df7ac045b.lndb
└── .lamindb
    ├── _is_initialized
    ├── lerLb7WqKARNIBBUTOl7.fcs
    └── _exclusion
And it's no longer present in local storage:
ln.setup.settings.instance.storage_local.root.view_tree()
1 sub-directory & 1 files with suffixes ''
/home/runner/work/lamindb/lamindb/docs/faq/my_storage_local
└── .lamindb
    └── _is_initialized
assert artifact.path.exists()
assert not local_path.exists()
assert artifact.path.as_posix().startswith(ln.setup.settings.instance.storage.root.as_posix())
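The behavior shown above (the file appears in the bucket, disappears from local storage, and a copy lands in the cache) can be mimicked with plain directories. This is a toy model under stated assumptions, not lamindb's implementation: the function upload_and_evict and all directory and file names are hypothetical, and a local folder stands in for the S3 bucket.

```python
import shutil
import tempfile
from pathlib import Path


def upload_and_evict(local_file: Path, remote_dir: Path, cache_dir: Path) -> Path:
    """Copy a file to the stand-in 'remote' directory, then move the local copy to the cache."""
    remote_dir.mkdir(parents=True, exist_ok=True)
    cache_dir.mkdir(parents=True, exist_ok=True)
    remote_file = remote_dir / local_file.name
    shutil.copy2(local_file, remote_file)  # stand-in for the upload
    # the local copy leaves the storage location and ends up in the cache
    shutil.move(str(local_file), str(cache_dir / local_file.name))
    return remote_file


with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp)
    # hypothetical file inside a hypothetical local storage location
    local_file = base / "my_storage_local" / ".lamindb" / "example.fcs"
    local_file.parent.mkdir(parents=True)
    local_file.write_bytes(b"demo")

    remote_file = upload_and_evict(
        local_file, base / "bucket" / ".lamindb", base / "cache"
    )
    # record the state before the temporary directory is cleaned up
    remote_exists = remote_file.exists()
    local_exists = local_file.exists()
    cached_exists = (base / "cache" / "example.fcs").exists()

print(remote_exists, local_exists, cached_exists)  # True False True
```

The three booleans correspond to the three observations in this section: present remotely, absent locally, present in the cache.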
Direct upload
You can also upload a file directly by passing upload=True when you first save it:
filepath = ln.core.datasets.file_mini_csv()
artifact2 = ln.Artifact(filepath, description="My csv file").save(upload=True)
artifact2.path
S3Path('s3://lamindb-ci/keep-artifacts-local/.lamindb/rWfD8TDRqZ5Tmp4H9Vs5.csv')
Now we have two files on S3:
ln.Artifact.df(include="storage__root")
id | storage__root | version | created_at | created_by_id | updated_at | uid | storage_id | key | suffix | accessor | description | size | hash | hash_type | n_objects | n_observations | transform_id | run_id | visibility | key_is_virtual |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
2 | s3://lamindb-ci/keep-artifacts-local | None | 2024-05-23 10:57:58.100279+00:00 | 1 | 2024-05-23 10:57:58.100331+00:00 | rWfD8TDRqZ5Tmp4H9Vs5 | 1 | None | .csv | None | My csv file | 11 | z1LdF2qN4cN0M2sXrcW8aw | md5 | None | None | 1 | 1 | 1 | True |
1 | s3://lamindb-ci/keep-artifacts-local | None | 2024-05-23 10:57:57.667324+00:00 | 1 | 2024-05-23 10:57:57.705182+00:00 | lerLb7WqKARNIBBUTOl7 | 1 | None | .fcs | None | My fcs file | 6785467 | KCEXRahJ-Ui9Y6nksQ8z1A | md5 | None | None | 1 | 1 | 1 | True |
assert artifact2.path.exists()
Pre-existing artifacts
Assume we already have a file in our registered local storage location:
file_in_local_storage = ln.core.datasets.file_bam()
file_in_local_storage.rename("./my_storage_local/output.bam")
ln.UPath("my_storage_local/").view_tree()
1 sub-directory & 2 files with suffixes '', '.bam'
/home/runner/work/lamindb/lamindb/docs/faq/my_storage_local
├── .lamindb
│   └── _is_initialized
└── output.bam
If we create an artifact from it, it remains where it is during saving:
my_existing_file = ln.Artifact("./my_storage_local/output.bam", description="my existing file").save()
ln.UPath("my_storage_local/").view_tree()
1 sub-directory & 2 files with suffixes '', '.bam'
/home/runner/work/lamindb/lamindb/docs/faq/my_storage_local
├── .lamindb
│   └── _is_initialized
└── output.bam
The storage path of the artifact is constructed from its key because key_is_virtual=False:
my_existing_file
Artifact(updated_at=2024-05-23 10:57:58 UTC, uid='7oVGVZRlhs9CixdTBx37', key='output.bam', suffix='.bam', description='my existing file', size=18, hash='D2yxELM5U3VLeyvrwWUMUA', hash_type='md5', visibility=1, key_is_virtual=False, created_by_id=1, storage_id=2, transform_id=1, run_id=1)
However, if we decide to upload the artifact, we'll use the uid to construct the storage path and switch key_is_virtual to True:
my_existing_file.save(upload=True)
💡 moved local artifact to cache: /home/runner/.cache/lamindb/output.bam
Artifact(updated_at=2024-05-23 10:57:58 UTC, uid='7oVGVZRlhs9CixdTBx37', key='output.bam', suffix='.bam', description='my existing file', size=18, hash='D2yxELM5U3VLeyvrwWUMUA', hash_type='md5', visibility=1, key_is_virtual=True, created_by_id=1, storage_id=1, transform_id=1, run_id=1)
Here is the remote path of the artifact:
my_existing_file.path
S3Path('s3://lamindb-ci/keep-artifacts-local/.lamindb/7oVGVZRlhs9CixdTBx37.bam')
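The two path rules seen in this section can be summarized in a small sketch. The function storage_path is hypothetical and only mirrors the paths shown in this notebook (a real key gives root/key, a virtual key gives root/.lamindb/uid plus suffix); it is an assumption about the naming scheme, not lamindb's actual code:

```python
def storage_path(root, uid, suffix, key=None, key_is_virtual=True):
    """Toy model of how an artifact's storage path is derived (hypothetical helper)."""
    root = root.rstrip("/")
    if key is not None and not key_is_virtual:
        # real key: the file stays at <root>/<key>
        return f"{root}/{key}"
    # virtual key or no key: the file is named by its uid under .lamindb/
    return f"{root}/.lamindb/{uid}{suffix}"


# before upload: the existing file is addressed by its key
local = storage_path(
    "./my_storage_local", "7oVGVZRlhs9CixdTBx37", ".bam",
    key="output.bam", key_is_virtual=False,
)
# after upload: the key becomes virtual and the uid determines the path
remote = storage_path(
    "s3://lamindb-ci/keep-artifacts-local", "7oVGVZRlhs9CixdTBx37", ".bam",
    key="output.bam", key_is_virtual=True,
)

print(local)   # ./my_storage_local/output.bam
print(remote)  # s3://lamindb-ci/keep-artifacts-local/.lamindb/7oVGVZRlhs9CixdTBx37.bam
```

The second result matches the S3Path printed above, which is why output.bam no longer appears under its original name after the upload.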
And here are the contents of the storage locations:
# the path on S3
ln.setup.settings.storage.root.view_tree()
# the local path
ln.setup.settings.instance.storage_local.root.view_tree()
2 sub-directories & 5 files with suffixes '', '.lndb', '.csv', '.bam', '.fcs'
s3://lamindb-ci/keep-artifacts-local
├── cc7f2489bf7251f79ff9ca8df7ac045b.lndb
└── .lamindb
    ├── 7oVGVZRlhs9CixdTBx37.bam
    ├── _is_initialized
    ├── lerLb7WqKARNIBBUTOl7.fcs
    ├── rWfD8TDRqZ5Tmp4H9Vs5.csv
    └── _exclusion
1 sub-directory & 1 files with suffixes ''
/home/runner/work/lamindb/lamindb/docs/faq/my_storage_local
└── .lamindb
    └── _is_initialized
Delete the test instance
Delete the artifacts:
artifact.delete(permanent=True)
artifact2.delete(permanent=True)
my_existing_file.delete(permanent=True)
Delete the instance:
ln.setup.delete("keep-artifacts-local", force=True)
💡 deleted storage record on hub eb35d80e9c4d5d02aab65a03c95bb70c
💡 deleted storage record on hub c222e68048354a8a85fc6e8c00c1e1b4
💡 deleted instance record on hub cc7f2489bf7251f79ff9ca8df7ac045b