Gene Structure:  Family of genes:

 

A gene is the unit of heredity (DNA or, in some viruses, RNA) with a defined structure and function. Over the course of evolution, genes are duplicated, slightly modified and inherited, and the resulting copies make up a gene family. A gene may therefore exist in multiple copies with subtle variations; some copies are active, while others are inactive pseudogenes or cryptic genes. A gene family is a group of genes that share important characteristics. In many cases, genes in a family share a similar sequence of DNA building blocks (nucleotides). These genes provide instructions for making products (such as proteins) that have a similar structure or function. In other cases, dissimilar genes are grouped together in a family because proteins produced from these genes work together as a unit or participate in the same process.

 

Genes that exist as multiple alleles are also treated as a family of genes. Family members may be clustered in one region or dispersed throughout the genome. They may be expressed at different stages of development, and some may remain silent. Genes involved in a particular structure or function are often regarded as a family as well, e.g. the DNA polymerases and RNA polymerases involved in DNA replication and transcription. Gene families can thus be grouped into allelic families, functional families and protein families.

 

Basic structural features of Genes:

 

A gene is a segment of genetic material: DNA in most organisms, RNA in some viruses. It is associated with specific proteins and forms the unit of heredity. As a unit, it is a defined length of genetic material that replicates and performs a specific function by producing a specific RNA or protein. It undergoes mutations over time, which create variation. The number of genes varies from organism to organism.

 


 

A comparison between the number of genes in an organism and a naïve estimate based on the genome size divided by a constant factor of 1000 bp/gene, i.e. predicted number of genes = genome size/1000. This crude rule of thumb works surprisingly well for many bacteria and archaea but fails badly for multicellular organisms. (Source: http://book.bionumbers.org/)
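The rule of thumb above is simple enough to try out directly. The following is a minimal sketch, not part of the original source: the function names are hypothetical, and the genome sizes and gene counts are the figures quoted in Table 1 of this document (for human, the upper end of the 20,000-25,000 range is assumed).

```python
# Rule of thumb from the text: predicted genes = genome size (bp) / 1000.
BP_PER_GENE = 1000  # constant factor quoted in the text


def predicted_gene_count(genome_size_bp: int) -> int:
    """Naive gene-count estimate: one gene per 1000 bp of genome."""
    return genome_size_bp // BP_PER_GENE


def overestimate_factor(genome_size_bp: int, reported_genes: int) -> float:
    """How many times larger the naive estimate is than the reported count."""
    return predicted_gene_count(genome_size_bp) / reported_genes


# (genome size in bp, reported protein-coding genes) as quoted in Table 1.
# The human gene count is an assumption: the upper end of the quoted range.
EXAMPLES = {
    "Saccharomyces cerevisiae": (12_000_000, 6_000),
    "Drosophila melanogaster": (170_000_000, 14_000),
    "Homo sapiens": (2_900_000_000, 25_000),
}

for species, (size_bp, reported) in EXAMPLES.items():
    predicted = predicted_gene_count(size_bp)
    factor = overestimate_factor(size_bp, reported)
    print(f"{species}: predicted {predicted:,}, reported {reported:,}, "
          f"ratio {factor:.1f}x")
```

For yeast the estimate is off by a factor of two, whereas for the human genome it overshoots by roughly two orders of magnitude, which is the failure for multicellular organisms that the text describes.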

How Many Protein-Coding Genes Are in That Genome?

 

Interestingly, the same "remarkable lack of correspondence" can be noted when discussing the relationship between the number of protein-coding genes and organism complexity. Scientists estimate that the human genome, for example, has about 20,000 to 25,000 protein-coding genes. Before completion of the draft sequence of the Human Genome Project in 2001, scientists made bets as to how many genes were in the human genome. Most predictions were between about 30,000 and 100,000. Nobody expected a figure as low as 20,000, especially when compared to the number of protein-coding genes in an organism like Trichomonas vaginalis. T. vaginalis is a single-celled parasitic organism responsible for an estimated 180 million urogenital tract infections in humans every year. This tiny organism features the largest number of protein-coding genes of any eukaryotic genome sequenced to date: approximately 60,000.

 

In fact, compared to almost any other organism, humans' 25,000 protein-coding genes do not seem like many. The fruit fly Drosophila melanogaster, for example, has an estimated 13,000 protein-coding genes. Or consider the mustard plant Arabidopsis thaliana, the "fruit fly" of the plant world, which scientists use as a model organism for studying plant genetics. A. thaliana has just about the same number of protein-coding genes as humans—actually, it has slightly more, coming in at about 25,500. Moreover, A. thaliana has one of the smallest genomes in the plant world! It would seem obvious that humans would have more protein-coding genes than plants, but that is not the case. These observations suggest that there is more to the genome than protein-coding genes alone.

 

As shown in Table 1 (adapted from Van Straalen & Roelofs, 2006), there is no clear correspondence between genome size and number of protein-coding genes—another indication that the number of genes in a eukaryotic genome reveals little about organismal complexity. The number of protein-coding genes usually caps off at around 25,000 or so, even as genome size increases.

 

 

Table 1: Genome Size and Number of Protein-Coding Genes for a Select Handful of Species

Species and Common Name                                 Estimated Total Size of Genome (bp)*    Estimated Number of Protein-Encoding Genes*
Saccharomyces cerevisiae (unicellular budding yeast)    12 million                              6,000
Trichomonas vaginalis                                   160 million                             60,000
Plasmodium falciparum (unicellular malaria parasite)    23 million                              5,000
Caenorhabditis elegans (nematode)                       95.5 million                            18,000
Drosophila melanogaster (fruit fly)                     170 million                             14,000
Arabidopsis thaliana (mustard; thale cress)             125 million                             25,000
Oryza sativa (rice)                                     470 million                             51,000
Gallus gallus (chicken)                                 1 billion                               20,000-23,000
Canis familiaris (domestic dog)                         2.4 billion                             19,000
Mus musculus (laboratory mouse)                         2.5 billion                             30,000
Homo sapiens (human)                                    2.9 billion                             20,000-25,000

* There may be other estimates in the literature, but most estimates approximate those listed here.
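One way to see the lack of correspondence in Table 1 is to convert each row into a gene density (protein-coding genes per million base pairs). The sketch below is illustrative only: the helper name is hypothetical, and only rows of Table 1 that give a single gene count, rather than a range, are included.

```python
def genes_per_megabase(gene_count: int, genome_size_bp: int) -> float:
    """Protein-coding genes per million base pairs of genome."""
    return gene_count / (genome_size_bp / 1_000_000)


# (genome size in bp, protein-coding genes), copied from Table 1.
TABLE_1_SUBSET = {
    "Saccharomyces cerevisiae": (12_000_000, 6_000),
    "Trichomonas vaginalis": (160_000_000, 60_000),
    "Arabidopsis thaliana": (125_000_000, 25_000),
    "Canis familiaris": (2_400_000_000, 19_000),
    "Mus musculus": (2_500_000_000, 30_000),
}

densities = {
    species: genes_per_megabase(genes, size_bp)
    for species, (size_bp, genes) in TABLE_1_SUBSET.items()
}

# List species from the most gene-dense genome to the least gene-dense.
for species, density in sorted(densities.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{species:28s} {density:8.1f} genes per Mb")
```

On these figures the density falls from hundreds of genes per megabase in the unicellular organisms to around ten in the mammals, so a larger genome does not imply proportionally more genes.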

 

While the majority of emphasis has been placed on protein-coding genes in particular, scientists have continued to refine their definition of what exactly a gene is, partly in response to the realization that DNA encodes more than just proteins. For instance, in a study of the mouse genome, scientists found that more than 60% of this 2.5 billion bp genome is transcribed, but less than 2% is actually translated into functional protein products (FANTOM Consortium et al., 2005). Within this article, however, the discussion focuses on protein-coding genes, unless otherwise stated. https://www.nature.com/

 

 

Main components of Human genome:

Figure 2: The different sequence components making up the human genome. About 1.5% of the genome consists of the ≈20,000 protein-coding sequences, which are interspersed by the non-coding introns, making up about 26%. Transposable elements are the largest fraction (40-50%), including for example long interspersed nuclear elements (LINEs) and short interspersed nuclear elements (SINEs). Most transposable elements are genomic remnants, which are currently defunct. (BNID 110283, Adapted from T. R. Gregory Nat Rev Genet. 9:699-708, 2005, based on International Human Genome Sequencing Consortium. Initial sequencing and analysis of the human genome. Nature 409:860 2001.) http://book.bionumbers.org; https://www.slideshare.net
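To give the percentages in this caption a sense of scale, they can be converted into approximate base-pair spans. This is a rough sketch under stated assumptions: the genome size of 2.9 billion bp is the figure from Table 1, the names are hypothetical, and the percentages are the rounded values quoted in the caption.

```python
GENOME_SIZE_BP = 2_900_000_000  # human genome size as quoted in Table 1

# Fractions in parts per thousand (low, high), so the arithmetic stays in integers.
COMPONENTS_PER_MILLE = {
    "protein-coding sequence": (15, 15),   # about 1.5%
    "introns": (260, 260),                 # about 26%
    "transposable elements": (400, 500),   # 40-50%
}


def span_bp(per_mille: int, genome_size_bp: int = GENOME_SIZE_BP) -> int:
    """Base pairs covered by a component given in parts per thousand."""
    return genome_size_bp * per_mille // 1000


for name, (low, high) in COMPONENTS_PER_MILLE.items():
    if low == high:
        print(f"{name}: about {span_bp(low):,} bp")
    else:
        print(f"{name}: about {span_bp(low):,} to {span_bp(high):,} bp")
```

On these assumptions, protein-coding sequence amounts to a few tens of millions of base pairs, while transposable elements account for more than a billion.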

Over the course of evolution, spanning several billion years, genes have been duplicated and their sequences have changed to some degree, though not drastically. Copies that have lost their function are called pseudogenes. Some related genes form a kind of superfamily.

 


These diagrams represent just basic components of eukaryotic promoters, such as specific gene promoters and housekeeping gene promoters, transcribed by RNA pol II

http://slideplayer.com

 

 

Globin Genes:

 

Globin genes exist as two clusters, the alpha globin and beta globin gene families. Their members are expressed at different stages of the organism's development.

 

 

Alpha Globin:

 

The alpha globin genes are organized in a cluster of about 30 kbp, in the following sequence.

 

----chi-I-psi-alpha--chi—psi-alpha-1---alpha-2—alpha-1-alpha-1—

 

 

 

 

Alpha clusters:

 

Gene            Type            Expression
Chi             -               In embryo
Psi-alpha       Pseudo gene     -
Chi             -               -
Psi-alpha-1     Pseudo gene     -
Alpha-1         -               Fetus and adult
Alpha-2         -               Fetus and adult
Alpha-1         -               Fetus and adult

 

 

 

 

 

 

 

 

Beta clusters:

The beta cluster occupies a locus of about 50 kbp, with the genes arranged in the order listed in the table. Between the genes there are spacer regions that are not transcribed; this holds for both the alpha and the beta cluster. The table shows when each gene is expressed.

 

 

Gene            Type            Expressed in
Epsilon         -               Embryonic stage
G-gamma         -               Fetal stage
A-gamma         -               Fetal stage
Psi-beta-1      Pseudo gene     -
Delta           -               In adults
Beta            -               In adults

 

 

 

 

 

 

 

 

 

 

This line diagram depicts the organization of the alpha and beta globin gene families.

 

Embryonic means less than eight weeks of pregnancy; the expressed chains are chi-2, gamma-2, alpha-2 and epsilon-2.

Fetal means 3-9 months; the expressed chains are alpha-2 and gamma-2.

Adult means from birth onwards; the expressed chains are alpha-2 delta-2 and alpha-2 beta-2.

Chi (the embryonic chain usually called zeta in the literature) and alpha are alpha-like.

Epsilon, gamma, delta and beta are beta-like.

 

Histone gene family:

 

Histone genes exist not as single genes but as a family of genes and they are clustered together in certain loci of Human chromosomes.

 

 

Clustering of Heat shock (Hs) and Globin genes on chromosome 11; http://www.web-books.com/

 

Sea urchin:

The repeat length is 6300 bp and the number of repeats is 300 in sea urchins and 600 in newts.

 


Drosophila:

The repeat is 4800 bp long and there are 100 repeats.
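The repeat lengths and copy numbers quoted for sea urchin and Drosophila imply a total size for each histone gene array. The sketch below works this out, assuming the repeats lie head to tail with no additional sequence between them (an assumption made here, not a statement from the source); the names are hypothetical.

```python
# (repeat length in bp, number of repeats) as quoted in the text.
HISTONE_REPEATS = {
    "sea urchin": (6300, 300),
    "Drosophila": (4800, 100),
}


def total_repeat_bp(repeat_length_bp: int, copies: int) -> int:
    """Total span of a tandem array, assuming back-to-back repeats."""
    return repeat_length_bp * copies


for organism, (length_bp, copies) in HISTONE_REPEATS.items():
    total = total_repeat_bp(length_bp, copies)
    print(f"{organism}: {copies} x {length_bp} bp = {total:,} bp")
```

Under that assumption the sea urchin array spans close to 2 million bp, about four times the span of the Drosophila array.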

 

This diagram shows how histone genes are organized as clusters at loci on chromosomes. http://genome.cbs.dtu.dk/

 

Humans:  Human genome contains a family of Histone genes.

 

I--H1-->I--I--H3-->I---I<--H2b---I--I-->H2A-->I---I--H4->->

 

Yeasts: They contain two copies each of histone genes.  Birds contain 10 to 20 repeats of each cluster.  In mammals the number of repeats is 600 to 800.

 

Figure 7.1. Organization of the human genome.

Classification of human genome into structural and functional forms

rRNA gene family; http://www.rochester.edu/

 

 

 

 

Some of the gene families                 Number of genes
Actin                                     5-30
(family name missing)                     3-10
Keratin                                   >20
Myosin (heavy chain)                      5-10
Protein kinase?                           10-100
Human Ig variable                         >500
Chick ovalbumin                           3
Tubulin alpha and beta                    3-15
Visual pigment gene (human)               4
Chick vitellogenin                        5
Insect egg shell protein                  -
Silk fibroin (silk moth and fruit fly)    50
Transplantation human antigen             50-100
Globin genes                              -
Skin color                                -
Hair color                                -
Height                                    -
MHC class                                 MHC I, II and III; about 200 genes
Ig family                                 The Ig-like domains can be classified as IgV, IgC1, IgC2 or IgI
Homeobox                                  -
ABCA                                      -
DNA Pol                                   Prokaryotic, archaeal, eukaryotic
RNA Pol                                   Prokaryotic; eukaryotic RNAP I, II and III
HSPs                                      HSP 60, 70 and 90; HSP 90 has 17 or more genes in the human genome
Ribosomal protein family                  60-70
tRNA family                               22-30
rRNA family                               Prokaryotic and eukaryotic
snRNA family                              5-8
Sigma factor family                       -
T and B cell receptors                    -
T cell receptors                          -
Cytokines/lymphokines                     -

There are many protein families too:

Signal receptors
Motor proteins
Membrane transporters
Protein kinases and other kinases
Phosphatases
Structural proteins
Metallothionein
Histone

 

 

 

 

 

 

 

 

 

Distribution of Some Gene Families on the Chromosomes

Gene family             Gene count  Chromosomes           Additional information
Calmodulin              3           2, 14, 19             identical protein sequences, many other related proteins
Enolase                 3           1, 12, 17             -
Actinins                4           1, 11, 14, 19         -
Notch                   4           1, 6, 9, 19           also smaller related protein on chromosome 1
Amylase                 5           1                     cluster spans about 205 kb, also pseudogene
G β subunits            5           1, 7, 9, 12, 15       -
Actin                   6           1, 2, 7, 10, 15, 17   also highly similar ACTBL2 and many other related proteins
Polycomb PCGF           6           2, 4, 10 (3), 17      three genes on chromosome 10 not closely linked
Alcohol dehydrogenase   7           4                     cluster spans about 365 kb
Metallothionein         11          16                    cluster spans about 120 kb, also related genes / pseudogenes

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 


 

 

Some genes occur as duplicates, and the copies need not lie on the same chromosome; they can be located on different chromosomes.

 

 

The following families, defined by the HUGO Gene Nomenclature Committee, are included in Genetics Home Reference.