Write a note on the Human Genome Project. (5 MARKS QUESTION)
Kindly answer in the form that would fetch maximum marks in exams, i.e. should it be written in points or in a paragraph?
The Human Genome Project (HGP) was a joint venture of the US Department of Energy and the National Institutes of Health (NIH), later joined by the Wellcome Trust (UK). It was launched in 1990 and completed in 2003. The project aimed to determine the complete DNA sequence of humans. DNA is the storehouse of genetic information, and determining its sequence of base pairs can help solve many medical, agricultural, environmental, and evolutionary mysteries.
Goals of HGP:
Some important goals of the Human Genome Project were:
- Identify all the approximately 20,000-25,000 genes in humans.
- Determine the sequence of the 3 billion chemical base pairs that make up the human genome.
- Store this information in databases.
- Improve tools for data analysis.
- Address ethical, legal and social issues that may arise from the project.
Relationship of HGP with Bioinformatics
- Human genome (genome refers to the complete set of DNA, including all the genes, present in a human being) contains 3 × 10⁹ base pairs.
- Cost of sequencing 1 bp = US $ 3, so the cost of sequencing 3 × 10⁹ bp = US $ 9 billion.
- The sequence data so generated is enormous: typed out with 1000 letters per page in books of 1000 pages each, a single human genome would fill about 3300 books.
- Hence, for storing, retrieving, and analysing this enormous data, a new branch of biology has been developed known as bioinformatics.
- Genomes of many non-human models such as bacteria, yeast, Caenorhabditis elegans, Drosophila, plants (rice and Arabidopsis) have also been sequenced.
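The cost and storage figures above are simple arithmetic, and a quick check may help in remembering them. The sketch below uses the numbers quoted in this answer; the figure of 1000 letters per page is the usual textbook assumption, and the variable names are only illustrative.

```python
# Figures quoted in the answer above.
BASE_PAIRS = 3 * 10**9          # approximate size of the human genome
COST_PER_BP_USD = 3             # estimated cost of sequencing one base pair

total_cost_usd = BASE_PAIRS * COST_PER_BP_USD
print(f"Estimated cost: US $ {total_cost_usd:,}")

# Assumption: each page holds 1000 letters (one letter per base).
LETTERS_PER_PAGE = 1000
PAGES_PER_BOOK = 1000
BOOKS = 3300

letters_in_books = BOOKS * PAGES_PER_BOOK * LETTERS_PER_PAGE
print(f"Letters held by {BOOKS} books: {letters_in_books:,}")

# With the rounded figure of 3 x 10^9 bases, the count comes to 3000 books;
# the quoted 3300 corresponds to roughly 3.3 x 10^9 letters.
books_for_rounded_genome = BASE_PAIRS // (LETTERS_PER_PAGE * PAGES_PER_BOOK)
print(f"Books needed for 3 x 10^9 bases: {books_for_rounded_genome}")
```

This is only a back-of-the-envelope check, but it shows why storing and searching such data by hand was never practical.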
Methods to Identify Genes
- Two methods − identifying ESTs (Expressed Sequence Tags) and sequence annotation
- ESTs − As the name suggests, this refers to the part of DNA that is expressed, i.e. transcribed as mRNA and translated into proteins thereafter. This approach focuses on sequencing only the parts of the genome that denote genes.
- Annotation − In this approach, the entire genome (coding + non-coding) is sequenced and functions are later assigned to the different regions of the genome. The sequencing steps are listed below.
- DNA from the cells is isolated and is randomly broken into fragments of smaller sizes.
- These fragments are cloned into suitable host using vectors.
- The cloned fragments are amplified in the host. Amplification makes sequencing easier.
- Common vectors used − BAC (Bacterial artificial chromosomes) and YAC (Yeast artificial chromosomes)
- Common hosts − Bacteria and yeasts
- Automated sequencers are used to sequence these smaller fragments (Sanger sequencing).
- The sequences so obtained are arranged based on overlapping regions within them (alignment).
- Alignment of the sequences is also done automatically by computer programs.
- Then these sequences are annotated and assigned to each chromosome.
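The step of arranging fragments by their overlapping regions can be illustrated with a small sketch. This is not the software actually used in the HGP; it is a simplified greedy approach, assuming error-free fragments from a single strand, and the function names `overlap` and `greedy_assemble` are hypothetical.

```python
def overlap(a: str, b: str) -> int:
    """Length of the longest suffix of a that is also a prefix of b."""
    for k in range(min(len(a), len(b)), 0, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(fragments: list[str], min_overlap: int = 3) -> list[str]:
    """Repeatedly merge the pair of fragments with the largest overlap."""
    contigs = list(dict.fromkeys(fragments))  # drop exact duplicates
    while len(contigs) > 1:
        best_k, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k < min_overlap:
            break  # no overlap long enough to trust; stop merging
        merged = contigs[best_i] + contigs[best_j][best_k:]
        contigs = [c for idx, c in enumerate(contigs)
                   if idx not in (best_i, best_j)]
        contigs.append(merged)
    return contigs


# Three made-up fragments that overlap end to end.
reads = ["ATGCGT", "CGTACC", "ACCGGA"]
assembled = greedy_assemble(reads)
print(assembled)
```

Real assemblers also have to cope with sequencing errors, both DNA strands, and repeated regions, which is part of why the alignment had to be done by computer programs.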
Preparation of Genetic and Physical Maps of the Genome
- Two kinds of markers are used − restriction site polymorphism and microsatellites
- Restriction polymorphism − Enzymes called restriction endonucleases cut the genome at specific sites called restriction endonuclease recognition sites. Maps are prepared based on variation in these sites.
- Microsatellites − These are short repetitive DNA sequences.
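To see why restriction sites are useful as map markers, here is a minimal sketch. It assumes the enzyme EcoRI, which recognises GAATTC and cuts after the G; the two short "alleles" are made up for illustration, and the function name `digest` is hypothetical.

```python
def digest(seq: str, site: str = "GAATTC", cut_offset: int = 1) -> list[str]:
    """Cut seq at every occurrence of site, cut_offset bases into the site."""
    fragments = []
    start = 0
    pos = seq.find(site)
    while pos != -1:
        cut = pos + cut_offset
        fragments.append(seq[start:cut])
        start = cut
        pos = seq.find(site, pos + 1)
    fragments.append(seq[start:])  # the piece after the last cut
    return fragments


# Made-up sequences: allele_b differs by one base that destroys the second site.
allele_a = "AAGAATTCTTTTGAATTCGG"
allele_b = "AAGAATTCTTTTGAGTTCGG"

lengths_a = [len(f) for f in digest(allele_a)]
lengths_b = [len(f) for f in digest(allele_b)]
print(lengths_a)
print(lengths_b)
```

A single base change removes one recognition site, so the two alleles give different fragment lengths. Differences of this kind are what make such sites usable as landmarks when preparing maps.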
Observations from HGP
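- Human genome contains about 3 × 10⁹ (3164.7 million) nucleotide bases.
- An average gene consists of 3000 bases. However, the size of genes varies. The largest known gene is dystrophin (2.4 million bases).
- Total number of genes in human genome − estimated at about 30,000 at the time; later analysis lowered this to the 20,000-25,000 mentioned in the goals above.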
- Over 50% of the discovered genes have unknown functions.
- Less than 2% of genome is coding.
- Large portion of genome consists of repeating sequences.
- Repetitive sequences are thought to have no direct coding function. They are repeated hundreds to thousands of times. They may have a role in evolution, chromosome structure, and dynamics.
- Chromosome with most genes − Chromosome 1 (2968)
- Chromosome with fewest genes − Chromosome Y (231)
- SNPs (single nucleotide polymorphisms) occur at about 1.4 million locations in human DNA. They are believed to be significant in explaining diseases and the evolutionary history of human beings.
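A couple of these observations can be combined into rough numbers that are easy to recall in an exam. The sketch below uses only the figures quoted above; the results are approximations, not values reported by the project.

```python
# Figures quoted in the observations above.
GENOME_BASES = 3164.7e6      # nucleotide bases in the human genome
SNP_SITES = 1.4e6            # locations where SNPs occur
CODING_FRACTION_MAX = 0.02   # "less than 2%" of the genome is coding

# Average spacing between SNP locations.
bases_per_snp = GENOME_BASES / SNP_SITES
print(f"Roughly one SNP every {bases_per_snp:.0f} bases")

# Upper bound on the number of coding bases.
max_coding_bases = GENOME_BASES * CODING_FRACTION_MAX
print(f"Fewer than {max_coding_bases / 1e6:.1f} million coding bases")
```

So the quoted figures imply about one SNP location per 2000 or so bases, and well under 100 million coding bases in a genome of over 3 billion.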