Annotations help users search for and find data, and they are a powerful tool used to systematically group and/or describe things in Synapse.

Annotations are stored as key-value pairs in Synapse, where the key defines a particular aspect of your data (for example, species, assay, file format) and the value defines a variable that belongs to that category (mouse, RNAseq, .bam). You can use annotations to add additional information about a project, file, folder, table, or view. Annotations can be based on an existing ontology or controlled vocabulary, or can be created as needed and modified later as your metadata evolves.

For example, if you have uploaded a collection of alignment files in the BAM file format from an RNA-sequencing experiment, each representing a sample and experimental replicate, you can use annotations to surface this information in a structured way. Sometimes, users encode this information in file names, e.g., sampleA_conditionB.bam, which makes it “human-readable” but not searchable.

In this case, you may want to add annotations that look like this:

Annotation example
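As a rough illustration, the information packed into a file name such as sampleA_conditionB.bam can be split into key-value pairs. This is a minimal sketch that assumes a `<sample>_<condition>.<extension>` naming convention; the keys `sample` and `condition` and the function name are hypothetical, and no Synapse call is made.

```python
from pathlib import Path


def annotations_from_filename(filename):
    """Split '<sample>_<condition>.<ext>' into a dict of annotations.

    The naming convention and the 'sample'/'condition' keys are assumptions
    made for this example only.
    """
    path = Path(filename)
    sample, condition = path.stem.split("_", 1)
    return {
        "sample": sample,
        "condition": condition,
        "fileFormat": path.suffix.lstrip("."),
    }


annotations = annotations_from_filename("sampleA_conditionB.bam")
print(annotations)
```

A dictionary like this could then be passed to one of the clients described in the following sections.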

How to Assign Annotations

Annotations can be added during file upload or at a later date. You can add and edit annotations from the web or programmatically using the command line client, the Python client, or the R client. The programmatic clients make it easier to batch and automate the population of annotations across many files. The web client can also bulk update many files using views.

Add and Edit Annotations

Web

To add or modify annotations on projects, files, folders, or tables in the web client, click the Tools menu in the upper right corner and select Annotations.

Annotation web location

A new window will appear with a list of any previously added annotations. To add new annotations or edit existing annotations, click the Edit button.

In the pop-up window, add your annotations one at a time. Use the + icon to add multiple values for a single key and the x icon to remove values. Use the Add New Key button to add a new key.

To add annotations on multiple files, refer to Managing Custom Metadata at Scale for a tutorial on using views for annotation management.

Command line

To add annotations on a new file during upload:

synapse store sampleA_conditionB.bam --parentId syn00123 --annotations '{"fileFormat":"bam", "assay":"rnaSeq"}'


To add annotations on an existing file:

synapse set-annotations --id syn00123 --annotations '{"fileFormat":"bam", "assay":"rnaSeq"}'
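When the annotations come from a script rather than being typed by hand, the JSON passed to `--annotations` can be generated instead of written out manually. The sketch below only builds the command string shown above; it assumes a POSIX shell, and the helper name `annotations_argument` is hypothetical.

```python
import json
import shlex


def annotations_argument(annotations):
    """Serialize a dict to JSON and quote it for use in a POSIX shell."""
    return shlex.quote(json.dumps(annotations))


command = "synapse set-annotations --id syn00123 --annotations " + annotations_argument(
    {"fileFormat": "bam", "assay": "rnaSeq"}
)
print(command)
```

Generating the JSON with `json.dumps` avoids hand-written quoting mistakes, and `shlex.quote` keeps the whole value as a single shell argument.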


Python

To add annotations on a new file during upload:

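import synapseclient
from synapseclient import File

syn = synapseclient.login()  # assumes credentials are already configured
entity = File(path="sampleA_conditionB.bam", parent="syn00123")
entity.annotations = {"fileFormat": "bam", "assay": "rnaSeq"}
syn.store(entity)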


To modify annotations on an existing file:

annotations = syn.get_annotations("syn00123")

# Set key 'fileFormat' to have value 'fastq'
annotations['fileFormat'] = 'fastq'

syn.set_annotations(annotations)
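Because the snippet above starts from the file's current annotations, changing one key leaves the others in place. The difference between updating and overwriting can be sketched with plain dictionaries. The values here are made up and no Synapse call is made; this shows the two intents only, not how any client resolves duplicate keys.

```python
# Hypothetical annotations already present on a file.
existing = {"species": "mouse", "fileFormat": "fastq"}

# New values to apply.
updates = {"fileFormat": "bam", "assay": "rnaSeq"}

# Preserve: start from the existing annotations, then apply the updates.
# A key present in both takes the value from `updates`.
preserved = {**existing, **updates}

# Replace: keep only the new annotations; "species" is dropped.
replaced = dict(updates)

print(preserved)
print(replaced)
```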


R

To add annotations on a new file during upload:

entity <- File("sampleA_conditionB.bam", parent="syn00123")
entity <- synStore(entity, annotations=list(fileFormat = "bam", assay = "rnaSeq"))


To modify annotations on an existing file:

entity <- synGet("syn00123")

# Modify annotations and PRESERVE existing annotations
existing_annots <- synGetAnnotations(entity)
synSetAnnotations(entity, annotations = c(existing_annots, list(fileFormat = "bam", assay = "rnaSeq")))

# Modify annotations and REMOVE existing annotations
synSetAnnotations(entity, annotations = list(fileFormat = "bam", assay = "rnaSeq"))

Queries

Queries in Synapse are SQL-like, and you can query any table or view by its synID. Results include only the items that you currently have permission to access.

SELECT * FROM <synID> WHERE <expression>

The expressions are the conditions for limiting a search. Every entity has properties useful for searching:

  • All entities (projects, files, folders, tables/views, Docker containers): id, name, createdOn, createdBy, modifiedOn, modifiedBy, etag, type, parentId, benefactorId, projectId

  • Versionable entities (files, tables/views): currentVersion

  • Files only: dataFileHandleId

Files also have contentMd5, contentSize, and contentType as properties. These properties are not available in a view and are not searchable.

SELECT * FROM syn12345678 WHERE "id" = 'syn00012'
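The property lists above can be kept in a small lookup when building queries in a script. This is only a sketch: the entity-type strings ("file", "folder", and so on) are labels chosen for this example, and the function name is hypothetical.

```python
# Properties listed above as available on every entity.
COMMON = [
    "id", "name", "createdOn", "createdBy", "modifiedOn", "modifiedBy",
    "etag", "type", "parentId", "benefactorId", "projectId",
]
# Additional property for versionable entities (files, tables/views).
VERSIONABLE = ["currentVersion"]
# Additional property for files only.
FILE_ONLY = ["dataFileHandleId"]


def queryable_properties(entity_type):
    """Return the property names listed above for the given entity type."""
    properties = list(COMMON)
    if entity_type in ("file", "table", "view"):
        properties += VERSIONABLE
    if entity_type == "file":
        properties += FILE_ONLY
    return properties


print(queryable_properties("folder"))
print(queryable_properties("file"))
```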

For a complete list of example queries, see:

SQL Query Examples

Finding Files in a Specific Project

To find files in a specific project, create a view in the web client. For example, if you’d like to see all files in a project, navigate to your project and then select the Tables tab. From there, click Tables Tools and Add File View. Click Add container and Enter Synapse ID to create a tabular file view that contains every file in the project, which you can now query. Importantly, if you want to later query on annotations, you must select Add All Annotations. For a more in-depth look at this feature, see the article on file views.

Listing Files in a Specific Folder

If you are using a programmatic client, you can list the files in a specific folder. First, you need to know the synID of the folder (for example, syn1524884, which has data from TCGA related to melanoma). All entities in this folder will have a parentId of syn1524884.

The function to find all files in this folder is called “getChildren”:

Python

foo = list(syn.getChildren(parent='syn1524884', includeTypes=['file']))

R

foo <- as.list(synGetChildren(parent='syn1524884', includeTypes=list('file')))
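Once the children are listed, they can be filtered locally. The sketch below assumes each child is a dictionary with at least `id` and `name` keys, which is how the Python client's results appear to be shaped. The sample records and synIDs are made up, and the helper name is hypothetical.

```python
def ids_with_suffix(children, suffix):
    """Return the synIDs of children whose file name ends with `suffix`."""
    return [child["id"] for child in children if child["name"].endswith(suffix)]


# Made-up records standing in for the result of listing a folder.
children = [
    {"id": "syn0000001", "name": "sampleA_conditionB.bam"},
    {"id": "syn0000002", "name": "README.txt"},
    {"id": "syn0000003", "name": "sampleB_conditionB.bam"},
]

bam_ids = ids_with_suffix(children, ".bam")
print(bam_ids)
```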

Queries on Annotations

If annotations have been added to files, they can be used to discover files of interest from a file view (here, syn123456). For example, you can identify all files annotated as bam files (fileFormat = bam) with the following query:

SELECT * FROM syn123456 WHERE "fileFormat"='bam'

Likewise, if you put the RNA-Seq related files described in the section above into the project syn00123 with the described annotations, then you could find all of the files for conditionB and sampleA:

SELECT * FROM syn123456 WHERE "projectId"='syn00123' AND "specimenID"='sampleA_conditionB'

Lastly, you can query on a subset of entities that have a specific annotation, and limit the columns displayed to the annotations you want, as follows:

SELECT specimenID,genomeBuild,fileFormat,platform FROM syn123456 WHERE "projectId"='syn00123' AND "specimenID"='sampleA_conditionB'
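Queries of this shape can also be assembled in a script. The following is a minimal sketch of a string builder, not part of any Synapse client: the function name `build_query` is hypothetical, only equality conditions joined by AND are handled, and the view ID syn123456 is the placeholder used in the examples.

```python
def build_query(view_id, conditions, columns=None):
    """Build a SQL-like query string with AND-joined equality conditions."""
    select = ", ".join(columns) if columns else "*"
    clauses = []
    for key, value in conditions.items():
        # Double any single quote inside the value so the string stays valid.
        escaped = str(value).replace("'", "''")
        clauses.append("\"{}\"='{}'".format(key, escaped))
    query = "SELECT {} FROM {}".format(select, view_id)
    if clauses:
        query += " WHERE " + " AND ".join(clauses)
    return query


query = build_query(
    "syn123456",
    {"projectId": "syn00123", "specimenID": "sampleA_conditionB"},
    columns=["specimenID", "genomeBuild", "fileFormat", "platform"],
)
print(query)
```

The resulting string could then be passed to whichever client runs the query.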

Reproducible queries can be constructed using one of the analytical clients (command line, Python, or R). On the web client, query results can be displayed in a table on a wiki page.

In a project, from the wiki page click Wiki Tools in the upper right corner to Edit Project Wiki. Click Insert and choose Table: Query on Files/Folders. Enter your query in the box and click the Insert button. Once you save the wiki page, the results will be displayed as a table.

From the analytical clients, the same kind of query looks like this:

Command line

synapse query "SELECT specimenID,genomeBuild,fileFormat,platform FROM syn123456 WHERE \"specimenID\"='sampleA_conditionB'"

Python

result = syn.tableQuery("SELECT specimenID,genomeBuild,fileFormat,platform FROM syn123456 WHERE \"specimenID\"='sampleA_conditionB'")

R

result <- synTableQuery("SELECT specimenID,genomeBuild,fileFormat,platform FROM syn123456 WHERE \"specimenID\"='sampleA_conditionB'")

Download from a Query

You can download files in a folder using queries. Currently, this feature is only available in the command line client. For example, if you want to download all files in a file view that has a Synapse ID of syn00123, use:

synapse get -q "SELECT * FROM syn00123"

Troubleshooting

A single quote inside a query value must be replaced by a double quote or escaped as two single quotes. For example, to query for the chemicalStructure of 4'-chemical:

SELECT * FROM syn123 where "chemicalStructure" = '4"-chemical'
# OR
SELECT * FROM syn123 where "chemicalStructure" = '4''-chemical'
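When queries are built in a script, the two-single-quotes form can be applied automatically. This is a minimal sketch; the helper name `escape_single_quotes` is hypothetical and not part of any Synapse client.

```python
def escape_single_quotes(value):
    """Double every single quote so the value can sit inside '...'."""
    return value.replace("'", "''")


query = "SELECT * FROM syn123 WHERE \"chemicalStructure\" = '{}'".format(
    escape_single_quotes("4'-chemical")
)
print(query)
```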
