Ensembl BioMart shows results only for protein-coding genes when protein-associated attributes are chosen: non-coding genes that pass the filters are omitted from the results. Why does this occur?

In Ensembl BioMart, the main dataset is made up of three main tables, each with a number of associated dimension (dm) tables.

The first main table is built from the gene table in the Ensembl core schema. It also holds all the information directly associated with the gene table, such as cross-references assigned directly to genes (as opposed to transcripts or translations).

The second main table, the transcript main table, inherits its fields from the gene main table and adds transcript-related data, such as cross-references assigned specifically to transcripts. It also inherits the gene data itself: the transcript main table contains the gene-table data for all genes that have transcripts, i.e. all genes.
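
A minimal sketch of this inheritance, using hypothetical in-memory rows rather than the real Ensembl core or mart schema: each transcript row is built by copying its parent gene's fields and then adding the transcript's own, so every gene with at least one transcript (in practice, every gene) appears in the transcript main table. The column names and IDs are illustrative assumptions.

```python
# Hypothetical toy rows; the real tables have many more columns.
genes = [
    {"gene_id": "G1", "gene_name": "ABC1", "biotype": "protein_coding"},
    {"gene_id": "G2", "gene_name": "LINC1", "biotype": "lncRNA"},
]
transcripts = [
    {"transcript_id": "T1", "gene_id": "G1"},
    {"transcript_id": "T2", "gene_id": "G1"},
    {"transcript_id": "T3", "gene_id": "G2"},
]

def build_transcript_main(genes, transcripts):
    """Return one row per transcript, each carrying a copy of its gene's fields."""
    genes_by_id = {g["gene_id"]: g for g in genes}
    rows = []
    for t in transcripts:
        # Inherit all gene-level fields, then add the transcript-level ones.
        row = dict(genes_by_id[t["gene_id"]])
        row.update(t)
        rows.append(row)
    return rows

transcript_main = build_transcript_main(genes, transcripts)
genes_present = {r["gene_id"] for r in transcript_main}
print(sorted(genes_present))  # ['G1', 'G2']: no gene is lost
```

Because gene data is duplicated onto every transcript row, a query answered from this table can return gene fields without a further join.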

Depending on which filters and attributes you select, either the gene or the transcript main table is used. Selecting HGNC symbols and transcript stable IDs, for example, uses the transcript main table, because transcript stable IDs are not available in the gene main table.
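
The selection rule can be sketched as follows, under stated assumptions: each field is tagged with the main-table level it belongs to, and the query uses the most specific level any selected field requires (which, through inheritance, also contains every less specific field). The field names and their level assignments are hypothetical and do not reproduce BioMart's actual configuration.

```python
# Main tables ordered from least to most specific.
LEVELS = ["gene", "transcript", "translation"]

# Hypothetical mapping from attribute/filter names to the level whose
# main table first provides them.
FIELD_LEVEL = {
    "gene_stable_id": "gene",
    "hgnc_symbol": "gene",
    "transcript_stable_id": "transcript",
    "uniprot_swissprot_id": "translation",
}

def choose_main_table(fields):
    """Return the least specific main table that contains every requested field."""
    return max((FIELD_LEVEL[f] for f in fields), key=LEVELS.index, default="gene")

print(choose_main_table(["hgnc_symbol"]))                          # gene
print(choose_main_table(["hgnc_symbol", "transcript_stable_id"]))  # transcript
```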

The third main table, the translation main table, inherits its structure from the transcript main table: it contains all the fields of the gene and transcript main tables, and adds fields specific to translations (e.g. cross-references assigned specifically to translations). It contains all the data from the transcript main table, but only for the transcripts that have translations, i.e. *not* all transcripts.
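
The loss of non-coding genes follows from this construction behaving like an inner join. A minimal sketch, assuming toy rows that stand in for the transcript main table and a hypothetical translation table: transcripts with no translation produce no row, so a gene whose transcripts are all non-coding disappears entirely.

```python
# Hypothetical toy rows standing in for the transcript main table
# (gene fields already copied onto each transcript row).
transcript_main = [
    {"gene_id": "G1", "biotype": "protein_coding", "transcript_id": "T1"},
    {"gene_id": "G1", "biotype": "protein_coding", "transcript_id": "T2"},
    {"gene_id": "G2", "biotype": "lncRNA", "transcript_id": "T3"},
]
# In this toy example only transcript T1 has a translation.
translations = [{"translation_id": "P1", "transcript_id": "T1"}]

def build_translation_main(transcript_main, translations):
    """Inner join: keep only transcript rows that have at least one translation."""
    by_transcript = {}
    for p in translations:
        by_transcript.setdefault(p["transcript_id"], []).append(p)
    rows = []
    for t in transcript_main:
        # A transcript with no translation matches nothing, so no row is emitted.
        for p in by_transcript.get(t["transcript_id"], []):
            row = dict(t)
            row.update(p)
            rows.append(row)
    return rows

translation_main = build_translation_main(transcript_main, translations)
print([r["transcript_id"] for r in translation_main])  # ['T1']
print({r["gene_id"] for r in translation_main})        # {'G1'}
```

Gene G2 is absent from the result even though it exists in the transcript main table; an outer join would have kept it with empty translation fields.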

When the UniProtKB/Swiss-Prot ID attribute (or any other external reference mapped to translations, e.g. EMBL ID or HPA ID in human) is selected, the translation main table is used, since the attribute is a cross-reference associated with translations. The translation main table only contains data for the transcripts (and genes) that have translations. Consequently, non-coding genes that pass the filters are not shown in the results when such translation-level attributes are chosen. We believe it may be possible to change this behaviour and have requested such a change from the BioMart developers.
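
To observe the effect, one can compare two queries that differ only in whether a translation-level attribute is requested. The sketch below builds BioMart XML query documents with Python's standard library; the dataset name `hsapiens_gene_ensembl`, the attribute names `ensembl_gene_id`, `gene_biotype` and `uniprotswissprot`, and the filter `chromosome_name` are assumptions to check against the attribute list of the current release.

```python
import xml.etree.ElementTree as ET

def build_query(attributes, filters=None, dataset="hsapiens_gene_ensembl"):
    """Build a BioMart XML query string; dataset and field names are assumptions."""
    query = ET.Element("Query", virtualSchemaName="default", formatter="TSV",
                       header="1", uniqueRows="1")
    ds = ET.SubElement(query, "Dataset", name=dataset, interface="default")
    for name, value in (filters or {}).items():
        ET.SubElement(ds, "Filter", name=name, value=value)
    for name in attributes:
        ET.SubElement(ds, "Attribute", name=name)
    return ET.tostring(query, encoding="unicode")

# Gene-level attributes only: expected to be answered from the gene main table.
gene_level = build_query(["ensembl_gene_id", "gene_biotype"],
                         {"chromosome_name": "21"})
# Adding a translation-level cross-reference switches to the translation main table.
with_protein = build_query(["ensembl_gene_id", "gene_biotype", "uniprotswissprot"],
                           {"chromosome_name": "21"})
```

If both queries are submitted to the BioMart REST service (the `martservice` endpoint) and the returned gene IDs compared, the reasoning above suggests that genes without translations would appear only in the first result set.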


If you have any other questions about Ensembl, please do not hesitate to contact our HelpDesk. You may also like to subscribe to the developers' mailing list.