Query & search registries

Find & access data using registries.

Setup

!lamin init --storage ./mydata
💡 connected lamindb: testuser1/mydata
import lamindb as ln

ln.settings.verbosity = "info"
💡 connected lamindb: testuser1/mydata

We’ll need some toy data:

ln.Artifact(ln.core.datasets.file_jpg_paradisi05(), description="My image").save()
ln.Artifact.from_df(ln.core.datasets.df_iris(), description="The iris collection").save()
ln.Artifact(ln.core.datasets.file_fastq(), description="My fastq").save()
❗ no run & transform get linked, consider calling ln.track()
✅ storing artifact 'SY9mhMPL62txVRF1RYiW' at '/home/runner/work/lamindb/lamindb/docs/mydata/.lamindb/SY9mhMPL62txVRF1RYiW.jpg'
❗ no run & transform get linked, consider calling ln.track()
✅ storing artifact '1F3d7IJc7yrs0YfU9mSj' at '/home/runner/work/lamindb/lamindb/docs/mydata/.lamindb/1F3d7IJc7yrs0YfU9mSj.parquet'
❗ no run & transform get linked, consider calling ln.track()
✅ storing artifact 'VhM2IhLcVppFdsQHwdYE' at '/home/runner/work/lamindb/lamindb/docs/mydata/.lamindb/VhM2IhLcVppFdsQHwdYE.fastq.gz'
Artifact(updated_at=2024-05-23 10:57:29 UTC, uid='VhM2IhLcVppFdsQHwdYE', suffix='.fastq.gz', description='My fastq', size=20, hash='hi7ZmAzz8sfMd3vIQr-57Q', hash_type='md5', visibility=1, key_is_virtual=True, created_by_id=1, storage_id=1)

Look up metadata

For registries that hold fewer than about 100k records, a lookup object is a convenient way to select a record.

Consider the User registry:

users = ln.User.lookup(field="handle")

With auto-complete, we find a user:

user = users.testuser1
user
User(uid='DzTjkKse', handle='testuser1', name='Test User1', updated_at=2024-05-23 10:57:27 UTC)

Note

You can also auto-complete in a dictionary:

users_dict = ln.User.lookup().dict()
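
To get a feel for what a lookup object provides, here is a minimal standard-library sketch. It is not LaminDB's implementation: the `build_lookup` helper and the second user record are hypothetical, and the records are plain dictionaries rather than registry rows.

```python
from collections import namedtuple

# Hypothetical records standing in for the rows of a small registry.
records = [
    {"handle": "testuser1", "name": "Test User1"},
    {"handle": "testuser2", "name": "Test User2"},
]


def build_lookup(records, field):
    """Return a namedtuple whose attribute names are the values of `field`."""
    keys = [str(record[field]) for record in records]
    # rename=True swaps invalid or duplicate names for positional ones (_0, _1, ...)
    Lookup = namedtuple("Lookup", keys, rename=True)
    return Lookup(*records)


users = build_lookup(records, field="handle")
user = users.testuser1        # attribute access is what makes tab completion possible
users_dict = users._asdict()  # dictionary form, keyed by handle
print(user["name"])
```

Building every attribute up front is cheap for small registries, which is one way to read the 100k-record guideline above.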

Filter by metadata

Filter for all artifacts created by a user:

ln.Artifact.filter(created_by=user).df()
version created_at created_by_id updated_at uid storage_id key suffix accessor description size hash hash_type n_objects n_observations transform_id run_id visibility key_is_virtual
id
1 None 2024-05-23 10:57:29.104272+00:00 1 2024-05-23 10:57:29.104352+00:00 SY9mhMPL62txVRF1RYiW 1 None .jpg None My image 29358 r4tnqmKI_SjrkdLzpuWp4g md5 None None None None 1 True
2 None 2024-05-23 10:57:29.267073+00:00 1 2024-05-23 10:57:29.267132+00:00 1F3d7IJc7yrs0YfU9mSj 1 None .parquet DataFrame The iris collection 5629 ah24lV9Ncc8nPL0MumEsdw md5 None None None None 1 True
3 None 2024-05-23 10:57:29.275050+00:00 1 2024-05-23 10:57:29.275097+00:00 VhM2IhLcVppFdsQHwdYE 1 None .fastq.gz None My fastq 20 hi7ZmAzz8sfMd3vIQr-57Q md5 None None None None 1 True

To retrieve the results of a filter statement, call one of the following on its return value:

  • .df(): A pandas DataFrame with each record stored as a row.

  • .all(): An indexable django QuerySet.

  • .list(): A list of records.

  • .one(): Exactly one record. Raises an error if there is no record or more than one.

  • .one_or_none(): Either one record or None if there is no query result.
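
The difference between .one() and .one_or_none() can be sketched in plain Python. This assumes SQLAlchemy-style semantics (both reject multiple results), and `ValueError` is only a stand-in for whatever exception LaminDB actually raises.

```python
def one_or_none(records):
    """Return the single record, None if there is none, and raise if there are several."""
    records = list(records)
    if len(records) > 1:
        raise ValueError(f"expected at most one record, got {len(records)}")
    return records[0] if records else None


def one(records):
    """Return the single record and raise if there is none or more than one."""
    records = list(records)
    if len(records) != 1:
        raise ValueError(f"expected exactly one record, got {len(records)}")
    return records[0]


print(one(["My image"]))
print(one_or_none([]))
```

Use .one() when a missing record signals a bug, and .one_or_none() when absence is an expected outcome that you want to handle yourself.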

Note

filter() returns a QuerySet.

The ORMs in LaminDB are Django Models and any Django query works. LaminDB extends Django’s API for data scientists.

Under the hood, any .filter() call translates into a SQL select statement.

.one() and .one_or_none() are two parts of LaminDB’s API that are borrowed from SQLAlchemy.
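
As a rough illustration of the SQL behind a filter call, the following sketch runs a comparable select statement against an in-memory SQLite database. The table layout is a simplified assumption, not LaminDB's real schema; the three rows mirror the toy artifacts above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Simplified, hypothetical stand-in for the artifact table.
conn.execute(
    "CREATE TABLE artifact ("
    "id INTEGER PRIMARY KEY, description TEXT, suffix TEXT, "
    "size INTEGER, created_by_id INTEGER)"
)
conn.executemany(
    "INSERT INTO artifact (description, suffix, size, created_by_id) "
    "VALUES (?, ?, ?, ?)",
    [
        ("My image", ".jpg", 29358, 1),
        ("The iris collection", ".parquet", 5629, 1),
        ("My fastq", ".fastq.gz", 20, 1),
    ],
)

# Roughly what a filter on created_by asks the database for.
rows = conn.execute(
    "SELECT id, description FROM artifact WHERE created_by_id = ? ORDER BY id",
    (1,),
).fetchall()
print(rows)
```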

Search for metadata

ln.Artifact.search("iris").df()
version created_at created_by_id updated_at uid storage_id key suffix accessor description size hash hash_type n_objects n_observations transform_id run_id visibility key_is_virtual
id
2 None 2024-05-23 10:57:29.267073+00:00 1 2024-05-23 10:57:29.267132+00:00 1F3d7IJc7yrs0YfU9mSj 1 None .parquet DataFrame The iris collection 5629 ah24lV9Ncc8nPL0MumEsdw md5 None None None None 1 True

Let us create 500 notebook objects with fake titles and save them:

ln.save(
    [
        ln.Transform(name=title, type="notebook")
        for title in ln.core.datasets.fake_bio_notebook_titles(n=500)
    ]
)

We can now search for any combination of terms:

ln.Transform.search("intestine").df().head()
version uid name key description type latest_report_id source_code_id reference reference_type created_at updated_at created_by_id
id
21 None M3ELiJmoRbQE Ige intestine IgG3 investigate. None None notebook None None None None 2024-05-23 10:57:30.374039+00:00 2024-05-23 10:57:30.374054+00:00 1
22 None NHh5sXxKxnFI Von Ebner'S Gland Cardiac muscle cell Von Ebne... None None notebook None None None None 2024-05-23 10:57:30.374193+00:00 2024-05-23 10:57:30.374207+00:00 1
25 None 0vzGM30blZfX Lungs IgA intestine rank Bowman's gland IgY Me... None None notebook None None None None 2024-05-23 10:57:30.374654+00:00 2024-05-23 10:57:30.374669+00:00 1
52 None 21DcRX0Q3BIV Intestine IgD result IgG1 study Bladder IgG. None None notebook None None None None 2024-05-23 10:57:30.378755+00:00 2024-05-23 10:57:30.378768+00:00 1
54 None zDGq4lcqCuip Renshaw Cells IgD IgG1 Cardiac muscle cell Car... None None notebook None None None None 2024-05-23 10:57:30.379056+00:00 2024-05-23 10:57:30.379072+00:00 1

Leverage relations

Django has a double-underscore syntax to filter based on related tables.

This syntax enables you to traverse several layers of relations:

ln.Artifact.filter(run__created_by__handle__startswith="testuse").df()
version created_at updated_at uid key suffix accessor description size hash hash_type n_objects n_observations visibility key_is_virtual created_by_id storage_id transform_id run_id
id

The filter selects all artifacts based on the users who ran the generating notebook. Here, the result is empty because the toy artifacts were saved without a linked run.

(Under the hood, in the SQL database, it’s joining the artifact table with the run and the user table.)
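
The join can be sketched with the standard-library sqlite3 module. Table and column names below are simplified assumptions rather than LaminDB's schema, and the fourth artifact is a hypothetical record added so that one row has a linked run.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE users (id INTEGER PRIMARY KEY, handle TEXT);
    CREATE TABLE runs (id INTEGER PRIMARY KEY, created_by_id INTEGER);
    CREATE TABLE artifacts (id INTEGER PRIMARY KEY, description TEXT, run_id INTEGER);
    INSERT INTO users (handle) VALUES ('testuser1');
    INSERT INTO runs (created_by_id) VALUES (1);
    INSERT INTO artifacts (description, run_id) VALUES
        ('My image', NULL),
        ('The iris collection', NULL),
        ('My fastq', NULL),
        ('Tracked artifact', 1);
    """
)

# Comparable to filtering on run__created_by__handle__startswith="testuse":
# each double underscore that crosses a relation becomes one JOIN.
rows = conn.execute(
    """
    SELECT artifacts.id, artifacts.description
    FROM artifacts
    JOIN runs ON artifacts.run_id = runs.id
    JOIN users ON runs.created_by_id = users.id
    WHERE users.handle LIKE ?
    """,
    ("testuse%",),
).fetchall()
print(rows)
```

Artifacts without a run have nothing to join against, so only the hypothetical tracked artifact is returned.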

Beyond __startswith, Django supports about two dozen field comparators field__comparator=value.

Here are some of them.

and

ln.Artifact.filter(suffix=".jpg", created_by=user).df()
version created_at created_by_id updated_at uid storage_id key suffix accessor description size hash hash_type n_objects n_observations transform_id run_id visibility key_is_virtual
id
1 None 2024-05-23 10:57:29.104272+00:00 1 2024-05-23 10:57:29.104352+00:00 SY9mhMPL62txVRF1RYiW 1 None .jpg None My image 29358 r4tnqmKI_SjrkdLzpuWp4g md5 None None None None 1 True

less than / greater than

Or subset to artifacts smaller than 10 kB with the __lt comparator (use __gt for greater than). Keyword arguments in the same filter call are combined with and.

ln.Artifact.filter(created_by=user, size__lt=1e4).df()
version created_at created_by_id updated_at uid storage_id key suffix accessor description size hash hash_type n_objects n_observations transform_id run_id visibility key_is_virtual
id
2 None 2024-05-23 10:57:29.267073+00:00 1 2024-05-23 10:57:29.267132+00:00 1F3d7IJc7yrs0YfU9mSj 1 None .parquet DataFrame The iris collection 5629 ah24lV9Ncc8nPL0MumEsdw md5 None None None None 1 True
3 None 2024-05-23 10:57:29.275050+00:00 1 2024-05-23 10:57:29.275097+00:00 VhM2IhLcVppFdsQHwdYE 1 None .fastq.gz None My fastq 20 hi7ZmAzz8sfMd3vIQr-57Q md5 None None None None 1 True

or

from django.db.models import Q

ln.Artifact.filter().filter(Q(suffix=".jpg") | Q(suffix=".fastq.gz")).df()
version created_at created_by_id updated_at uid storage_id key suffix accessor description size hash hash_type n_objects n_observations transform_id run_id visibility key_is_virtual
id
1 None 2024-05-23 10:57:29.104272+00:00 1 2024-05-23 10:57:29.104352+00:00 SY9mhMPL62txVRF1RYiW 1 None .jpg None My image 29358 r4tnqmKI_SjrkdLzpuWp4g md5 None None None None 1 True
3 None 2024-05-23 10:57:29.275050+00:00 1 2024-05-23 10:57:29.275097+00:00 VhM2IhLcVppFdsQHwdYE 1 None .fastq.gz None My fastq 20 hi7ZmAzz8sfMd3vIQr-57Q md5 None None None None 1 True

in

ln.Artifact.filter(suffix__in=[".jpg", ".fastq.gz"]).df()
version created_at created_by_id updated_at uid storage_id key suffix accessor description size hash hash_type n_objects n_observations transform_id run_id visibility key_is_virtual
id
1 None 2024-05-23 10:57:29.104272+00:00 1 2024-05-23 10:57:29.104352+00:00 SY9mhMPL62txVRF1RYiW 1 None .jpg None My image 29358 r4tnqmKI_SjrkdLzpuWp4g md5 None None None None 1 True
3 None 2024-05-23 10:57:29.275050+00:00 1 2024-05-23 10:57:29.275097+00:00 VhM2IhLcVppFdsQHwdYE 1 None .fastq.gz None My fastq 20 hi7ZmAzz8sfMd3vIQr-57Q md5 None None None None 1 True
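
The or and in queries above return the same rows. The sketch below mimics that with a small predicate class in plain Python; `Cond`, `field_equals` and `field_in` are hypothetical helpers for illustration and do not reproduce Django's `Q` implementation.

```python
class Cond:
    """A record predicate that can be combined with | and &."""

    def __init__(self, test):
        self.test = test

    def __or__(self, other):
        return Cond(lambda record: self.test(record) or other.test(record))

    def __and__(self, other):
        return Cond(lambda record: self.test(record) and other.test(record))

    def __call__(self, record):
        return self.test(record)


def field_equals(field, value):
    return Cond(lambda record: record[field] == value)


def field_in(field, values):
    return Cond(lambda record: record[field] in values)


suffixes = [".jpg", ".parquet", ".fastq.gz"]
records = [{"id": i, "suffix": s} for i, s in enumerate(suffixes, start=1)]

either = field_equals("suffix", ".jpg") | field_equals("suffix", ".fastq.gz")
member = field_in("suffix", [".jpg", ".fastq.gz"])

ids_or = [r["id"] for r in records if either(r)]
ids_in = [r["id"] for r in records if member(r)]
print(ids_or, ids_in)
```

When every alternative tests the same field, in is the shorter spelling; combining conditions with | is needed once the alternatives involve different fields.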

order by

ln.Artifact.filter().order_by("-updated_at").df()
version created_at created_by_id updated_at uid storage_id key suffix accessor description size hash hash_type n_objects n_observations transform_id run_id visibility key_is_virtual
id
3 None 2024-05-23 10:57:29.275050+00:00 1 2024-05-23 10:57:29.275097+00:00 VhM2IhLcVppFdsQHwdYE 1 None .fastq.gz None My fastq 20 hi7ZmAzz8sfMd3vIQr-57Q md5 None None None None 1 True
2 None 2024-05-23 10:57:29.267073+00:00 1 2024-05-23 10:57:29.267132+00:00 1F3d7IJc7yrs0YfU9mSj 1 None .parquet DataFrame The iris collection 5629 ah24lV9Ncc8nPL0MumEsdw md5 None None None None 1 True
1 None 2024-05-23 10:57:29.104272+00:00 1 2024-05-23 10:57:29.104352+00:00 SY9mhMPL62txVRF1RYiW 1 None .jpg None My image 29358 r4tnqmKI_SjrkdLzpuWp4g md5 None None None None 1 True

contains

ln.Transform.filter(name__contains="search").df().head(10)
version uid name key description type latest_report_id source_code_id reference reference_type created_at updated_at created_by_id
id
17 None ObM5lARfz2EU Igg Helper T cell IgE IgD research IgD. None None notebook None None None None 2024-05-23 10:57:30.373324+00:00 2024-05-23 10:57:30.373338+00:00 1
18 None ONhFsDTxeBlg Research IgY IgA cluster Lungs rank IgD. None None notebook None None None None 2024-05-23 10:57:30.373477+00:00 2024-05-23 10:57:30.373491+00:00 1
28 None jDk6lMwdNzLH Research IgG2 Cardiac muscle cell IgA IgD. None None notebook None None None None 2024-05-23 10:57:30.375111+00:00 2024-05-23 10:57:30.375124+00:00 1
36 None emaGDtDUE4rd Von Ebner'S Gland IgD rank research. None None notebook None None None None 2024-05-23 10:57:30.376357+00:00 2024-05-23 10:57:30.376370+00:00 1
46 None 5JGlkWD0eKiW Endothelial Cells IgG4 IgG2 efficiency IgE can... None None notebook None None None None 2024-05-23 10:57:30.377858+00:00 2024-05-23 10:57:30.377871+00:00 1
65 None 12wTKmRh5I3w Research intestine IgG3 IgD IgM classify IgG2. None None notebook None None None None 2024-05-23 10:57:30.380744+00:00 2024-05-23 10:57:30.380758+00:00 1
71 None 6bHnIKpR26a0 Research rank visualize IgG4. None None notebook None None None None 2024-05-23 10:57:30.381644+00:00 2024-05-23 10:57:30.381657+00:00 1
98 None cfUQHe1yT3NA Cardiac Muscle Cell IgD study basal cell resea... None None notebook None None None None 2024-05-23 10:57:30.389186+00:00 2024-05-23 10:57:30.389202+00:00 1
103 None lgfPa6yIvfov Study Pineal gland Von Ebner's gland research ... None None notebook None None None None 2024-05-23 10:57:30.389943+00:00 2024-05-23 10:57:30.389958+00:00 1
104 None TyD1KGDXc5xl Research Lungs Cardiac muscle cell research Zo... None None notebook None None None None 2024-05-23 10:57:30.390096+00:00 2024-05-23 10:57:30.390109+00:00 1

And case-insensitive:

ln.Transform.filter(name__icontains="Search").df().head(10)
version uid name key description type latest_report_id source_code_id reference reference_type created_at updated_at created_by_id
id
17 None ObM5lARfz2EU Igg Helper T cell IgE IgD research IgD. None None notebook None None None None 2024-05-23 10:57:30.373324+00:00 2024-05-23 10:57:30.373338+00:00 1
18 None ONhFsDTxeBlg Research IgY IgA cluster Lungs rank IgD. None None notebook None None None None 2024-05-23 10:57:30.373477+00:00 2024-05-23 10:57:30.373491+00:00 1
28 None jDk6lMwdNzLH Research IgG2 Cardiac muscle cell IgA IgD. None None notebook None None None None 2024-05-23 10:57:30.375111+00:00 2024-05-23 10:57:30.375124+00:00 1
36 None emaGDtDUE4rd Von Ebner'S Gland IgD rank research. None None notebook None None None None 2024-05-23 10:57:30.376357+00:00 2024-05-23 10:57:30.376370+00:00 1
46 None 5JGlkWD0eKiW Endothelial Cells IgG4 IgG2 efficiency IgE can... None None notebook None None None None 2024-05-23 10:57:30.377858+00:00 2024-05-23 10:57:30.377871+00:00 1
65 None 12wTKmRh5I3w Research intestine IgG3 IgD IgM classify IgG2. None None notebook None None None None 2024-05-23 10:57:30.380744+00:00 2024-05-23 10:57:30.380758+00:00 1
71 None 6bHnIKpR26a0 Research rank visualize IgG4. None None notebook None None None None 2024-05-23 10:57:30.381644+00:00 2024-05-23 10:57:30.381657+00:00 1
98 None cfUQHe1yT3NA Cardiac Muscle Cell IgD study basal cell resea... None None notebook None None None None 2024-05-23 10:57:30.389186+00:00 2024-05-23 10:57:30.389202+00:00 1
103 None lgfPa6yIvfov Study Pineal gland Von Ebner's gland research ... None None notebook None None None None 2024-05-23 10:57:30.389943+00:00 2024-05-23 10:57:30.389958+00:00 1
104 None TyD1KGDXc5xl Research Lungs Cardiac muscle cell research Zo... None None notebook None None None None 2024-05-23 10:57:30.390096+00:00 2024-05-23 10:57:30.390109+00:00 1

startswith

ln.Transform.filter(name__startswith="Research").df()
version uid name key description type latest_report_id source_code_id reference reference_type created_at updated_at created_by_id
id
18 None ONhFsDTxeBlg Research IgY IgA cluster Lungs rank IgD. None None notebook None None None None 2024-05-23 10:57:30.373477+00:00 2024-05-23 10:57:30.373491+00:00 1
28 None jDk6lMwdNzLH Research IgG2 Cardiac muscle cell IgA IgD. None None notebook None None None None 2024-05-23 10:57:30.375111+00:00 2024-05-23 10:57:30.375124+00:00 1
65 None 12wTKmRh5I3w Research intestine IgG3 IgD IgM classify IgG2. None None notebook None None None None 2024-05-23 10:57:30.380744+00:00 2024-05-23 10:57:30.380758+00:00 1
71 None 6bHnIKpR26a0 Research rank visualize IgG4. None None notebook None None None None 2024-05-23 10:57:30.381644+00:00 2024-05-23 10:57:30.381657+00:00 1
104 None TyD1KGDXc5xl Research Lungs Cardiac muscle cell research Zo... None None notebook None None None None 2024-05-23 10:57:30.390096+00:00 2024-05-23 10:57:30.390109+00:00 1
166 None VMMMIPPD1zu3 Research IgY IgG Mesangial cell IgG IgG3 effic... None None notebook None None None None 2024-05-23 10:57:30.402903+00:00 2024-05-23 10:57:30.402916+00:00 1
234 None QyWeA7DfXqyO Research IgG3 Zona reticularis classify intest... None None notebook None None None None 2024-05-23 10:57:30.415623+00:00 2024-05-23 10:57:30.415637+00:00 1
242 None FaVrw76Bbm9N Research IgD efficiency. None None notebook None None None None 2024-05-23 10:57:30.416810+00:00 2024-05-23 10:57:30.416823+00:00 1
313 None aPg0GsnKW2Ge Research research IgG2 visualize IgE IgD. None None notebook None None None None 2024-05-23 10:57:30.430206+00:00 2024-05-23 10:57:30.430220+00:00 1
319 None ye7nomXJE3wR Research rank Taste buds Pineal gland IgG2 can... None None notebook None None None None 2024-05-23 10:57:30.431097+00:00 2024-05-23 10:57:30.431110+00:00 1
441 None OvC77gI4ITGM Research Bowman's gland IgG4 efficiency basal ... None None notebook None None None None 2024-05-23 10:57:30.452020+00:00 2024-05-23 10:57:30.452034+00:00 1
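
To summarize the field__comparator=value pattern, here is a plain-Python sketch that evaluates a handful of comparators against dictionaries. It is only a mental model under simplifying assumptions: the `matches` helper is hypothetical, it handles a single field level without relation traversal, and real filters are evaluated by the database.

```python
# Each comparator receives the record's value and the target from the keyword argument.
COMPARATORS = {
    "exact": lambda value, target: value == target,
    "lt": lambda value, target: value < target,
    "gt": lambda value, target: value > target,
    "in": lambda value, target: value in target,
    "contains": lambda value, target: target in value,
    "icontains": lambda value, target: target.lower() in value.lower(),
    "startswith": lambda value, target: value.startswith(target),
}


def matches(record, **lookups):
    """Return True if the record satisfies every field__comparator=value lookup."""
    for key, target in lookups.items():
        field, _, comparator = key.partition("__")
        comparator = comparator or "exact"  # a bare field name means equality
        if not COMPARATORS[comparator](record[field], target):
            return False
    return True


# The three toy artifacts from this guide, reduced to a few fields.
records = [
    {"description": "My image", "suffix": ".jpg", "size": 29358},
    {"description": "The iris collection", "suffix": ".parquet", "size": 5629},
    {"description": "My fastq", "suffix": ".fastq.gz", "size": 20},
]

small = [r["description"] for r in records if matches(r, size__lt=1e4)]
print(small)
```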
# clean up test instance
# remove the stored files first: lamin delete refuses to delete an instance whose storage is not empty
!rm -r mydata
!lamin delete --force mydata