HymenopteraMine v1.5 Documentation

HymenopteraMine is an integrative resource for genomic data on Hymenoptera, including honeybees, ants, wasps, etc. Powered by InterMine, it provides a user-friendly way to access genomic, proteomic, interaction and literature data. HymenopteraMine is a part of the Hymenoptera Genome Database.

This tutorial introduces the different parts of HymenopteraMine and shows how users can make the most of them.

Main site: http://hymenopteragenome.org/hymenopteramine

HGD YouTube Channel with HymenopteraMine Videos: https://www.youtube.com/channel/UC1NVFd9buEtlbA2mcdq0MXQ

List of available datasets in HymenopteraMine: http://hymenopteragenome.org/hymenopteramine/dataCategories.do

Overview of HymenopteraMine

Below is a brief summary of the layout of HymenopteraMine:

Home – The home page for HymenopteraMine.

Templates – List of templates that users may select from based on the nature of their query.

Lists – Allows users to upload lists of genes and perform enrichment analyses. Logged-in users may save their lists for future use.

QueryBuilder – Allows users to build custom queries by browsing the HymenopteraMine data model and customize their results. The queries may be exported to a number of formats including XML.

Regions – Genomic Region Search page where users may enter genomic coordinates and fetch features that fall within the interval. The interval may be extended to increase the range of search.

Data sources – Table of all data sources with their links, date of download, and related publication(s).

Help – Links to the HymenopteraMine tutorial.

API – Describes the InterMine API that allows users to programmatically access HymenopteraMine.

HGD Blast – Links to a BLAST page where users may search their sequence(s) of interest against the Hymenoptera species reference genome, CDS sequences, and protein sequences.

MyMine – Once users are logged in, MyMine serves as a portal for accessing saved lists and saved templates. Users may also check their account details and manage their account using MyMine.

Searching in HymenopteraMine

There are several ways that users may query HymenopteraMine.

Templates

One method of searching HymenopteraMine is through the use of templates (predefined queries). Popular templates are displayed on the home page, grouped by category (Genes, Protein, Homology, etc.). The full list of templates may be viewed by clicking the Templates menu tab.

Popular templates on the home page

Popular templates

Full list of templates

List of templates on the Templates page

As an example, the GO Term ⮕ Gene template queries HymenopteraMine for all genes annotated with a given GO term.

Example: GO Term ⮕ Gene template

The results page shows all the genes having the Gene Ontology term “DNA Binding” in their annotation. When logged in, users may create a new list or add these genes to an existing list to perform further analyses. Click on the Save as List button above the table of results, then choose the column to add to the list. See the Lists section for more details on creating and saving lists.

GO Term --> Gene template results for identifier "GO:0003677"

Example: Results after searching on GO:0003677 (identifier for GO term “DNA binding”)

Generate query code

The code for each query may be obtained by clicking on the arrow next to Generate Python Code and choosing the desired language from the pull-down menu. The language options are Python, Perl, Java, Ruby, JavaScript, and XML.

Generate code pull-down menu

Generate code options

Download results

The search results may also be downloaded by clicking the Export button above the table and choosing the desired format from the pull-down menu to the right of the File name field (blue box in the figure below). Available formats are tab-separated values, comma-separated values, XML, and JSON. When the results contain genomic features, they may also be downloaded in FASTA, GFF3, or BED format. Other options may be specified in the submenu to the left of the download box (orange box in the figure below). By default, all rows and all columns are downloaded, but individual columns may be included or excluded by clicking on the toggles next to the column headers in the All Columns submenu. The number of rows and row offset are set in the All Rows submenu. Download the results as a compressed file by choosing GZIP or ZIP format in the Compression submenu (default is No Compression). Column headers are not added by default but may be included under the Column Headers submenu. Finally, the Preview submenu displays the first three rows of the file to be downloaded so that the desired format and options may be finalized before beginning the download. When ready, click the Download file button to download the results.

Options for results file download

Download results options
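Because column headers are not added to the export by default, a script that reads a downloaded tab-separated file has to supply its own column names. The following is a minimal sketch of one way to do that in Python. The function name `read_tsv_export`, the column names, and the sample rows are illustrative assumptions, not part of HymenopteraMine.

```python
import csv
import io


def read_tsv_export(handle, column_names):
    """Read a headerless tab-separated export into a list of dicts.

    `column_names` must be given in the same order as the columns
    chosen in the export dialog (an assumption of this sketch).
    """
    rows = []
    for fields in csv.reader(handle, delimiter="\t"):
        if not fields:
            continue  # skip blank lines
        if len(fields) != len(column_names):
            raise ValueError(
                f"expected {len(column_names)} columns, got {len(fields)}"
            )
        rows.append(dict(zip(column_names, fields)))
    return rows


# Made-up sample standing in for a downloaded file with two columns.
sample = "ID_0001\tSYMBOL_A\nID_0002\tSYMBOL_B\n"
genes = read_tsv_export(io.StringIO(sample), ["db_identifier", "symbol"])
print(genes)
```

For a real download, the `io.StringIO` object would be replaced by an open file handle. If Column Headers are enabled in the export dialog, `csv.DictReader` can read the names from the first row instead.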

Customize output

Click the Manage Columns button to customize the results table layout. Edit or remove active filters by clicking the Manage Filters button. Click Manage Relationships to specify the entity relationships within the query.

Optional filters

Some templates have optional filters that are disabled by default. For example, the GO Term ⮕ Gene template has an additional filter for specifying an organism. To enable this filter, click ON below optional.

GO Term --> Gene template with organism filter enabled

Example: GO Term ⮕ Gene template with organism filter enabled

Note: The Query trail link at the top of the page does not work for templates with optional filters. To edit the template query, navigate back to the template page either by clicking on the template name at the top of the query results page or by selecting the template from the Templates tab.

QueryBuilder

While the templates provided are suitable for many different types of searches, new queries may be built from scratch using the QueryBuilder. Any field in the data model may be added to the output or used as a constraint, so the output may be formatted exactly as desired and the query constraints may be combined to perform complex search operations.

QueryBuilder

To begin, select a Data Type. For example, select Gene as a Data Type and click the Select button.

Example: Gene data type selected in QueryBuilder

Model browser

After choosing a data type, the Model browser appears displaying the attributes for the selected feature class.

Model browser

Model browser with Gene selected as data type

Using the model browser, fields and constraints may be added to the query. Clicking Show to the right of an attribute will add that field to the query. Clicking Constrain brings up a window with filter options for the attribute selected. The Query Overview summarizes the current state of the query; it displays the currently selected fields and constraint logic. The results columns are displayed at the bottom of the page, where they may be rearranged or removed.

Examples

The following examples give a more in-depth demonstration of how to use the QueryBuilder. All examples use Gene as the selected data type.

Example 1: Querying for protein coding genes

In the Model browser, click Show next to DB Identifier and Symbol, which will add these fields to the query. Notice that these two fields appear below Gene in the Query Overview section and at the bottom under Fields selected for output.

Step 1: Select fields to be added to the query

Then click Constrain next to Biotype. The first drop-down menu defaults to = (equals sign). In the second drop-down menu, select Protein Coding, then click the Add to query button. This adds a constraint to the query to search only for protein coding genes. Notice that the Query Overview section now shows “Biotype = Protein Coding”. Also, two types of icons appear next to the attributes. Clicking on the red “X” icon next to an attribute will remove that field or constraint from the query. Clicking on the blue pencil icon next to a constraint brings up the constraint editing window from earlier where changes may be made to the query filters.

Step 2: Add a constraint to the query on Biotype

Finally, click on the Show results button above the Model browser. The resulting table contains all protein coding genes in the database, with DB Identifier and Gene Symbol as the two table columns.

Step 3: Display query results

Example 2: Querying for protein coding genes on a particular chromosome

This example will extend the first example to add another constraint to the query.

After running the query in the above example, click on Query at the top of the page next to Trail to go back to the model browser and edit the query.

Click on the query trail to edit the query

In the Model browser, click the + (plus sign) next to the Chromosome feature class to display its attributes.

Step 1: View attributes of the Chromosome feature class

Then click Constrain next to the attribute DB Identifier. In the pop-up window, enter NW_003791143.1 into the text field, and click Add to query. This adds an additional constraint to the query that searches for protein coding genes on a chromosome with the DB identifier NW_003791143.1.

Step 2: Add a constraint to the Chromosome DB Identifier

Click the Show results button as before to view the results of the query. The columns are the same as in the first example, but notice that now there are only 625 rows in the table (compared to over 500,000 in the first query) due to the additional constraint.

Step 3: Display query results
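The queries in Examples 1 and 2 can also be written down as query XML, one of the formats the QueryBuilder exports. The sketch below builds such a document with Python's standard library. The element and attribute layout follows the general InterMine query XML style, but the path names (`Gene.primaryIdentifier`, `Gene.symbol`, `Gene.biotype`, `Gene.chromosome.primaryIdentifier`) and the stored biotype value are assumptions. The XML exported from the QueryBuilder shows the actual names.

```python
import xml.etree.ElementTree as ET


def build_gene_query(chromosome_id=None):
    """Return query XML for protein coding genes, optionally on one chromosome.

    All path names and the biotype value are assumptions of this sketch.
    """
    query = ET.Element(
        "query",
        {"model": "genomic", "view": "Gene.primaryIdentifier Gene.symbol"},
    )
    # Constraint A: Biotype = Protein Coding (Example 1).
    ET.SubElement(
        query,
        "constraint",
        {"path": "Gene.biotype", "op": "=", "value": "Protein Coding", "code": "A"},
    )
    logic = "A"
    if chromosome_id is not None:
        # Constraint B: Chromosome DB Identifier (Example 2).
        ET.SubElement(
            query,
            "constraint",
            {
                "path": "Gene.chromosome.primaryIdentifier",
                "op": "=",
                "value": chromosome_id,
                "code": "B",
            },
        )
        logic = "A and B"  # both constraints must hold
    query.set("constraintLogic", logic)
    return ET.tostring(query, encoding="unicode")


example_1 = build_gene_query()
example_2 = build_gene_query("NW_003791143.1")
print(example_2)
```

The second constraint narrows the first rather than replacing it, which is why the results in Example 2 are a subset of those in Example 1.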

Example 3: Querying for protein coding genes on a particular chromosome and their exons

This final example extends the above query to display all exons for each protein coding gene.

As above, click on Query at the top of the results page to go back to the model browser and edit the query.

In the Model browser, scroll down to locate the Exon feature class, and click the + (plus sign) next to Exons to display its attributes. Click Show next to DB Identifier and Length.

Step 1: Select Exon fields to be added to the query

The Query Overview shows the query in progress, with four fields and two constraints. Also notice that a third type of icon, a blue square, appears next to a couple of the attributes. Clicking on a blue square icon next to an attribute brings up a window where the query Join Style may be modified. Click on the blue square icon next to Exon collection to bring up the Switch Join Style window. The default option is Show only Genes if they have a Exon. Change this to Show all Genes and show Exons if they are present and click Add to query.

Step 2: Change the join style

Click Show results to run the query.

Step 3: Display query results

Notice that the results table contains the same rows as in the second example, but now there is a new column, Gene Exons. For example, looking at the second row, the Gene with DB Identifier 100862997 has 12 exons. Click on the 12 exons text to expand the table with additional rows containing the DB Identifier and Length for each of the 12 exons.

Step 4: Expand information on exons

By changing the join style, the exons have been grouped together by gene, making it easier to see how many exons each gene has. By contrast, if the query is run with the default option of “Show only Genes if they have a Exon”, the results table adds a new row for each exon.

Same query with default join style for exons
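The difference between the two join styles can be illustrated with a small in-memory example. This is only a sketch of the row semantics described above, using made-up gene and exon names. It does not reproduce how HymenopteraMine evaluates queries internally.

```python
# Made-up data: gene1 has two exons, gene2 has none.
genes = {
    "gene1": ["exon1a", "exon1b"],
    "gene2": [],
}


def default_join_rows(gene_exons):
    """'Show only Genes if they have a Exon': one row per (gene, exon) pair.

    Genes without exons produce no rows at all.
    """
    return [
        (gene, exon)
        for gene, exons in gene_exons.items()
        for exon in exons
    ]


def optional_join_rows(gene_exons):
    """'Show all Genes and show Exons if they are present': one row per gene.

    The exons are grouped with their gene, and genes without exons are kept.
    """
    return [(gene, list(exons)) for gene, exons in gene_exons.items()]


print(default_join_rows(genes))
print(optional_join_rows(genes))
```

With the default style the row count follows the number of exons, while the optional style keeps the row count equal to the number of genes.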

Report Pages

Every object (e.g., Gene, Protein, Exon) in HymenopteraMine has a report page. The layout of the report page depends on the data available for the object. Report pages may be accessed by clicking on an object name in the results table after running a query.

As an example, on the home page of HymenopteraMine, click on the Protein tab in the Popular Templates section. (Refer to the Templates section for more details on using templates to search the database.) Click on the Gene Symbol ⮕ Proteins template. In the text field, enter LCCH3, and select A. mellifera as the Organism. Then click Show Results.

Notice that each item in the results table is a hyperlink. Hover over an item to bring up a quick summary window for that item. For example, hover over LCCH3 to view a summary of the gene with this symbol. The summary contains the gene’s biotype, database identifier, description, length, organism, symbol, and source. Similarly, hover over Q0GQR5_APIME to view a summary of the protein with this DB Identifier.

Summary popup window for gene

Example: Summary window for LCCH3

Clicking on an item in the table rather than just hovering over it will bring up its report page. For example, click on LCCH3 in the Gene Symbol column to view its report.

Report page

The report page provides a complete description for this gene. The header displays the database identifier, followed by the information from the summary window for the gene (organism, symbol, source, etc.). Biotype indicates the type of gene; in this case the type is protein coding.

The contents of the report page are divided into categories based on the type of information provided.

Summary

A Summary section near the top of the report provides information on the gene such as its length, chromosome location, and strand information.

Summary section of report page

Alias and DBxref

The Alias and DBxref section displays a table of aliases and database cross references for the gene. In this example, the gene with DB Identifier 412740 has four aliases and two cross references. Click on the text 4 Alias Names and 2 Cross References to expand the table with additional rows containing the ID and Source for each alias and DB Identifier and Source for each cross reference.

Aliases and database cross references section of report page

Transcript

The Transcript section contains information about the gene model, such as transcripts, exons, etc. It includes a diagram visually representing each transcript with its features highlighted (if applicable). In the case of protein coding genes, a table with protein information is also provided.

Transcript section of report page

Proteins

The Proteins section provides information about the protein product of the gene. The comments section gives a brief description of the protein along with the UniProt accession.

Proteins section of report page

Function

The Function section displays Gene Ontology annotations for a gene. Annotations are divided into three categories:

  • Cellular Component
  • Molecular Function
  • Biological Process

The GO terms are displayed along with the evidence code indicating how the annotations were derived. If applicable, a table of information on Pathways is also shown.

Function section of report page

Homology

The Homology section includes information on homologues for the gene.

Homology section of report page

Publications

The Publications section displays a table of publications related to the gene.

Publications section of report page

Other

This last section provides miscellaneous information that doesn’t fit into any of the above categories, e.g., data sets including a gene, protein domain regions for a protein, etc.

Lists

Creating Lists

Users may create and save lists of features, such as gene IDs, transcript IDs, gene symbols, etc. The list tool searches the database for the list items and attempts to convert each identifier to the selected type. Click on the Lists tab from the menu to access the full list upload form. A short version of the form is also in the Quick List box on the home page.

List upload form

As an example, enter the following identifiers (comma-separated):

GB41586, Sec61Beta, TRAM, Mocs1, mal

Leave the Select Type as “Gene” and Organism drop-down as “Any”. Then click Create List. A Summary table is displayed with the results of searching for each of the five identifiers in the list.

Example: Search results for list of five identifiers

Next, click Save a list of 5 Genes. A List Analysis page is presented that contains widgets allowing users to perform analyses on the genes in the list.

Example: List analysis for gene list

The available widgets are:

  1. Chromosome Distribution
  2. Gene Ontology Enrichment
  3. Protein Domain Enrichment
  4. Publication Enrichment
  5. Pathway Enrichment
  6. Orthologues

The selection of widgets provided on the List Analysis page depends on the contents of the list. In the above example, three widgets appear: Gene Ontology Enrichment, Publication Enrichment, and Pathway Enrichment.

Read the Important Notes for Enrichment Widgets for special instructions to avoid false positives.

Example: Available widgets

Saving Lists

Saved lists appear under the View tab on the Lists page. For users who are not logged in, lists are saved temporarily; users must log in to save lists permanently. Saved lists may also be accessed from the MyMine menu item.

Predefined lists of all genes from different species are also available on the Lists page for all users.

Saved lists. Lists belonging to user are highlighted.

MyMine

MyMine serves as a portal where logged-in users may manage their lists, queries, templates, and account details.

To access MyMine, click on the MyMine menu tab. A submenu appears with six options:

Lists - Saved lists.

History - List of queries recently run.

Queries - List of saved queries.

Templates - Templates created or marked as “Favorite”.

Password - Password reset form.

Account Details - User preferences form.

MyMine

API

An API is available for users who would like to programmatically access HymenopteraMine.

API page

Perl, Python, Ruby, and Java are the languages supported by the InterMine API.

For more detailed information, view the InterMine documentation.
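As a rough illustration of programmatic access, the sketch below assembles a request URL for a query without sending it. It assumes the common InterMine web service layout, in which the service root is the mine URL followed by `/service` and query results are served from `query/results` with `query`, `format`, and `size` parameters. The service root, the query XML, and the function name are assumptions made for illustration. The API page and the InterMine documentation give the authoritative details.

```python
from urllib.parse import urlencode

# Assumed service root: the main site URL followed by "/service".
SERVICE_ROOT = "http://hymenopteragenome.org/hymenopteramine/service"

# Illustrative query XML; the path names are assumptions.
QUERY_XML = (
    '<query model="genomic" view="Gene.primaryIdentifier Gene.symbol">'
    '<constraint path="Gene.symbol" op="=" value="LCCH3"/>'
    "</query>"
)


def query_results_url(query_xml, fmt="tab", size=10):
    """Build (but do not send) a URL requesting results for a query."""
    params = {"query": query_xml, "format": fmt, "size": size}
    return f"{SERVICE_ROOT}/query/results?{urlencode(params)}"


url = query_results_url(QUERY_XML)
print(url)
```

The resulting URL could be fetched with any HTTP client. The code produced by the Generate Python Code button is the more reliable starting point, since it reflects the exact query that was built in the interface.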

Data Sources

The Data Sources table provides a description of the datasets that are integrated into HymenopteraMine, along with their date of download, version or release, citations wherever applicable, and any additional comments.

Data sources table

BLAST

Users may perform BLAST queries against the Hymenoptera genomic, CDS, or protein sequences using the BLAST page.

BLAST search

How to Cite

Please cite the use of HymenopteraMine as follows:

Elsik CG, Tayal A, Diesh CM, Unni DR, Emery ML, Nguyen HN, Hagen DE. Hymenoptera Genome Database: integrating genome annotations in HymenopteraMine. Nucleic Acids Research 2016 Jan 4;44(D1):D793-800. doi: 10.1093/nar/gkv1208. Epub 2015 Nov 17. PubMed PMID: 26578564.

Elsik, C., Tayal, A., Diesh, C., Unni, D., Emery, M., Nguyen, H., & Hagen, D. (2016, Jan 4). HymenopteraMine. Retrieved [Date], from http://hymenopteragenome.org/hymenopteramine/begin.do.

Please cite the use of a specific page in HymenopteraMine as follows:

Elsik, C., Tayal, A., Diesh, C., Unni, D., Emery, M., & Hagen, D. (2016, Jan 4). [Title of page]. Retrieved [Date], from [link of page].

Please cite the use of any other tools and data from the HGD website as follows:

Elsik CG, Tayal A, Diesh CM, Unni DR, Emery ML, Nguyen HN, Hagen DE. Hymenoptera Genome Database: integrating genome annotations in HymenopteraMine. Nucleic Acids Research 2016 Jan 4;44(D1):D793-800. doi: 10.1093/nar/gkv1208. Epub 2015 Nov 17. PubMed PMID: 26578564.

In addition, please cite the use of genome consortium data with the appropriate genome consortium publication.