- PyMOL for visualizing protein structures.
Lecture 7 Phylogenetic analysis (2/24)
A.) Protein domains and Protein families:
- Pfam at InterPro and the entry for IL6
- Prosite entry for IL6
Search the BLAST hit and the motif search hit against
the motif database in Prosite
B.) Phylogenetic analysis
- Obtain sequences in FASTA format (example).
- Load the sequences into Clustal Omega and perform a global alignment; then draw a neighbor-joining (N-J) tree.
- Example of the output: click the "Phylogenetic Tree" tab to view the tree and download the tree file to your GMS6014/phylogenetics/ folder
- Observe and manipulate the tree with Phylodendron
- Examples of trees: 1 and 2
- Standalone Clustal Omega for download
- Comprehensive protein analysis tool: Jalview
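A neighbor-joining tree is built from pairwise distances between the aligned sequences. The following is a minimal sketch of that input, assuming a simple p-distance (the fraction of ungapped alignment columns that differ); the distance measure used by the Clustal Omega web service may be different. The sequence names and the toy alignment are hypothetical.

```python
from itertools import combinations


def p_distance(seq_a, seq_b):
    """Fraction of differing columns, ignoring columns with a gap in either sequence."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have the same length")
    pairs = [(x, y) for x, y in zip(seq_a, seq_b) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no ungapped columns to compare")
    return sum(x != y for x, y in pairs) / len(pairs)


def pairwise_distances(alignment):
    """Return {(name1, name2): distance} for every pair of aligned sequences."""
    return {
        (a, b): p_distance(alignment[a], alignment[b])
        for a, b in combinations(alignment, 2)
    }


# Hypothetical toy alignment (already aligned, gaps shown as "-").
toy_alignment = {
    "seqA": "ACGT-A",
    "seqB": "ACGTTA",
    "seqC": "ACCTTA",
}

distances = pairwise_distances(toy_alignment)
for pair, dist in distances.items():
    print(pair, round(dist, 3))
```

Sequences separated by small distances are joined first, which is why closely related IL6 sequences end up as neighbors in the tree.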
Lecture 6 - Protein Domains and Motifs (2/22)
A.) More on scoring matrices
- The original Henikoff and Henikoff paper that served as the foundation for using BLOSUM62 as the default for BLAST searches
- Comparison of PAM vs BLOSUM matrices
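Entries in substitution matrices of this kind are log-odds scores: they compare how often two residues are observed aligned with how often they would pair by chance. A minimal sketch follows, assuming scores in half-bit units (2 x log2, rounded), as BLOSUM62 is usually presented. The function name and the frequencies are made up for illustration and are not taken from any real matrix.

```python
import math


def log_odds_score(observed_pair_freq, freq_a, freq_b, scale=2.0):
    """Rounded log-odds score: scale * log2(observed / expected by chance).

    observed_pair_freq: probability of seeing residue a aligned with residue b
    freq_a, freq_b:     background frequencies of the two residues
    scale:              2.0 gives half-bit units (an assumption of this sketch)
    """
    expected = freq_a * freq_b
    return round(scale * math.log2(observed_pair_freq / expected))


# Hypothetical numbers: a pair seen 4x more often than chance scores positive,
# a pair seen 4x less often than chance scores negative.
print(log_odds_score(0.04, 0.1, 0.1))
print(log_odds_score(0.0025, 0.1, 0.1))
```

A positive score means the substitution is seen more often than expected by chance in the conserved blocks; a negative score means it is seen less often.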
B.) Limitations of generic scoring matrices
- Search for binary patterns
Extra: Search for binary patterns with Bagua in dfile.
- Identifying motifs shared by a group of proteins: load 8IL6.txt into
MEME.
Leave your email address to access the results.
--> Shared motifs identified by MEME.
- An example of a scoring matrix generated by MEME
- The motifs identified by MEME can then be used to search a database of sequences:
BLAST output
vs. motif search using MAST or using BG_0.5
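To make the idea of searching with a motif concrete, here is a minimal sketch of scanning a sequence with a position-specific scoring matrix: every window of the motif's width is scored column by column, and the best window is reported. This is not the MAST algorithm; the matrix values, the default score for unlisted residues, and the test sequence are all hypothetical.

```python
def scan_with_matrix(sequence, matrix, default=-2.0):
    """Score every window of the sequence against a position-specific matrix.

    matrix is a list with one dict per motif position, mapping residue -> score.
    Residues missing from a column get the default (penalty) score.
    Returns a list of (start_index, window, score) tuples.
    """
    width = len(matrix)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        score = sum(col.get(res, default) for col, res in zip(matrix, window))
        hits.append((start, window, score))
    return hits


# Hypothetical 3-position motif: C, then A (preferred) or S, then C.
toy_matrix = [
    {"C": 2.0},
    {"A": 1.5, "S": 1.0},
    {"C": 2.0},
]

hits = scan_with_matrix("MKCACLLCSC", toy_matrix)
best = max(hits, key=lambda hit: hit[2])
print(best)
```

Unlike a generic matrix such as BLOSUM62, the scores here depend on the position within the motif, which is what lets a motif search find hits that a BLAST search can miss.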
C.) Protein Profile (protein family) Databases:
- Pfam at InterPro and the entry for IL6
- Prosite entry for IL6
Search the BLAST hit and the motif search hit against
Pfam or Prosite
Lecture 5 (2/20) Alignment and sequence similarity
1.) How did BLAST identify the hit(s) for us?
- BLAST results from standalone BLAST: IL6_Dm6.44Genes and IL6_Dm6.32.cDNAs.
- What is BLAST? Basic Local Alignment Search Tool (NCBI site; Nature Education page)
The basis of quantifying sequence similarity
- Blocks Substitution Matrices (BLOSUM):
BLOSUM62
matrix, BLOSUM matrices.
- What is a block?
"Many known proteins can be grouped into families according to functional and sequence similarities. The similarity of the proteins across the sequences in each family is far from uniform. While some regions are clearly conserved, others display little sequence similarity. Often the conserved regions are crucial to the protein's function, for example enzymatic catalytic sites. Such conserved regions can be used to probe an uncharacterized sequence to indicate its function. " -- Pietrokovski, Henikoff, Henikoff 1996 Link
2.) How the scoring matrix and gap penalties affect the outcome of local alignment
Using the
two test sequences, perform local alignment with the following parameters.
Local alignment web services: EMBOSS_Water or
LALIGN.
- Different matrices: try local alignment with either BLOSUM62 or
BLOSUM35 and observe the difference.
- Different gap penalties: with the matrix set to BLOSUM62, try:
1.) alpha
(gap opening penalty)=15, beta (gap extension penalty)=3; and
2.) alpha=5,
beta=1.
Observe the results.
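The effect of alpha and beta can also be explored offline. The sketch below computes a local alignment score with affine gap penalties (Smith-Waterman with Gotoh-style gap handling). It rests on two loud assumptions: a simple match/mismatch score stands in for BLOSUM62, and a gap of length k costs alpha + (k - 1) * beta, which is one common convention and may not match the web services exactly. It returns only the best score, not the alignment, and the test sequences are made up.

```python
def local_alignment_score(a, b, alpha, beta, match=2, mismatch=-1):
    """Best local alignment score with affine gaps.

    alpha: gap opening penalty (cost of the first gap position)
    beta:  gap extension penalty (cost of each further gap position)
    match/mismatch: toy substitution scores, NOT a real matrix such as BLOSUM62
    """
    neg_inf = float("-inf")
    n, m = len(a), len(b)
    # H: best score ending at (i, j); E/F: best score ending in a gap.
    H = [[0] * (m + 1) for _ in range(n + 1)]
    E = [[neg_inf] * (m + 1) for _ in range(n + 1)]
    F = [[neg_inf] * (m + 1) for _ in range(n + 1)]
    best = 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Open a new gap or extend an existing one.
            E[i][j] = max(H[i][j - 1] - alpha, E[i][j - 1] - beta)
            F[i][j] = max(H[i - 1][j] - alpha, F[i - 1][j] - beta)
            s = match if a[i - 1] == b[j - 1] else mismatch
            # The 0 lets a local alignment restart anywhere.
            H[i][j] = max(0, H[i - 1][j - 1] + s, E[i][j], F[i][j])
            best = max(best, H[i][j])
    return best


# Hypothetical sequences that differ by one inserted residue.
seq1, seq2 = "ACGTACGT", "ACGTTACGT"
print(local_alignment_score(seq1, seq2, alpha=5, beta=1))
print(local_alignment_score(seq1, seq2, alpha=15, beta=3))
```

With the cheaper penalties the best alignment spans the insertion with a gap; with the expensive penalties a shorter ungapped alignment scores higher, which is the kind of difference to look for in the web service output.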
Lecture 4 (2/17) Standalone Applications
1.) Download and install the standalone NCBI BLAST: manual at NCBI
- Before installation, read the instructions for Windows or
Mac.
- Download the .win64.exe file for Windows or the .dmg for Mac from the NCBI FTP server. Change the default installation path to YourHomeFolder/GMS6014/
- Open a Command Prompt (Windows) or Terminal (Mac), navigate to the blast folder, list the subfolders, then make new subfolders "dbs", "query", and "out".
- Download a data set for the BLAST search. Genomic data sets can be downloaded at Ensembl. A previously downloaded data set: all genes in the D. melanogaster genome in FASTA format
- Save the data set in blast/dbs.
- Download three IL6 protein sequences from UniProt and save them as "3IL6.fasta" in the blast/query/ folder.
2.) Running BLAST
- Open a Command Prompt (Windows) or Terminal (Mac), navigate to the blast folder, list the directory, then make new subfolders "dbs", "query", and "out".
- Download a data set for the BLAST search. Genomic data sets can be downloaded at Ensembl. A previously downloaded data set: all genes in the D. melanogaster genome in FASTA format. Name this file Dm6.44.AllGenes and save it in blast/dbs.
- Search for and download three IL6 protein sequences in FASTA format from UniProt and save them as "3IL6.txt" in the blast/query/ folder (example).
- Format the data set for searching by running "makeblastdb -in dbs/Dm -dbtype nucl". Check that it worked by looking for the new files generated in dbs/
- Run tblastn to search for orthologs of IL6 in Dm6.44.AllGenes
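The steps above can be sketched as a small script that creates the three subfolders and assembles the two command lines. The file names Dm6.44.AllGenes and 3IL6.txt come from the steps above; the output file name and the helper function names are hypothetical. The script only prints the commands; whether they run depends on BLAST+ being installed and the files being in place.

```python
from pathlib import Path


def make_subfolders(blast_dir):
    """Create the dbs, query and out subfolders inside the blast folder."""
    for name in ("dbs", "query", "out"):
        (Path(blast_dir) / name).mkdir(parents=True, exist_ok=True)


def blast_commands(blast_dir,
                   db_name="Dm6.44.AllGenes",
                   query_name="3IL6.txt",
                   out_name="IL6_vs_Dm6.44.txt"):  # hypothetical output name
    """Build the makeblastdb and tblastn command lines as argument lists."""
    root = Path(blast_dir)
    db = str(root / "dbs" / db_name)
    query = str(root / "query" / query_name)
    out = str(root / "out" / out_name)
    # Format the nucleotide FASTA file as a BLAST database.
    make_db = ["makeblastdb", "-in", db, "-dbtype", "nucl"]
    # Search protein queries against the translated nucleotide database.
    search = ["tblastn", "-query", query, "-db", db, "-out", out]
    return make_db, search


make_db, search = blast_commands("blast")
print(" ".join(make_db))
print(" ".join(search))
# To actually run them (requires BLAST+ on the PATH and the input files):
#   import subprocess
#   subprocess.run(make_db, check=True)
#   subprocess.run(search, check=True)
```

tblastn is the appropriate program here because the queries are protein sequences and the database is nucleotide, which is also why the database is built with -dbtype nucl.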
Lecture 3 (2/15) HiPerGator
1.) Presentation slides for class.
2.) Tutorials:
- HiPerGator tutorial and introduction;
- Linux Command line tutorial.
Lecture 2 (2/13)
A.) List of public resources.
B.) Navigate the web of information on your favorite gene (or IL6)
- Search for the gene in the Gene vs. the Protein database
- Observe the difference between an all-text search and an advanced search
- Pay attention to the multitude of links associated with the Gene entry.
- Compare the human "IL6" entry in the NCBI Gene database vs. the EBI UniProt database.
Pathway and interaction information on human IL-6 at
Reactome
C.) Web-based tools
- Picking qPCR primers for your gene with Primer3
D.) Linux Environment
- Log into HiPerGator using your GatorLink credentials and DHO, then navigate to the course folder
- Make a directory with your first name in the /blue/gms6014/share/ folder. This folder will allow me to view your progress and help with troubleshooting.
- Make a soft link from your home directory "~" to your course folder: "ln -s /blue/gms6014/share/yourfirstname/ gms6014"
- Make a shell script file and run it.
Lecture 1 (2/10)
A.) Retrieve and save sequence file:
- Try different views (formats) of the same entry by selecting
different "display settings".
- Download the FASTA files of the nucleotide and protein sequences to your local
computer*.
*: Make a
folder such as "GMS6014" in your home folder. Save all
course-related files in this folder. Avoid spaces in folder and file names.
- Open the saved file using a text editor such as Notepad in Windows or TextEdit in macOS.
* Consider using a dedicated text editor for bioinformatics projects, such as Notepad++ (Windows only) or
Emacs (all major OSes).
B.) Local storage of sequence files:
- Use .doc or .rtf files for formatting and annotation of
sequences. example (Right click the
link and save in the "C:\temp\GMS6014" folder)
- Use .seq or .fasta to store raw sequences in fasta format for downstream
analysis. example (Right click the
link and save in the "C:\temp\GMS6014" folder)
- Change the folder options to show file extensions. Change the file association to always open .seq or .txt files in your designated text editor.
- Try loading MySequence.txt into Webcutter or NEBcutter to identify
restriction sites; then try loading the MySequence.doc file.
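Downstream tools expect plain text, which is why raw sequences belong in .seq, .fasta, or .txt files and not in .doc files. As a minimal sketch of what a tool does when it reads FASTA text: a line starting with ">" begins a new record, and the lines that follow are joined into one sequence. The record names and sequences below are made up, and real parsers handle more edge cases.

```python
def parse_fasta(text):
    """Parse FASTA-formatted text into a {header: sequence} dictionary."""
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            header = line[1:]  # drop the leading ">"
            records[header] = []
        elif header is not None:
            records[header].append(line)
    # Join the wrapped sequence lines of each record into one string.
    return {name: "".join(parts) for name, parts in records.items()}


# Hypothetical two-record FASTA text.
example_text = """>seq1 first toy record
ACGTAC
GTTA
>seq2 second toy record
GGCC
"""

sequences = parse_fasta(example_text)
for name, seq in sequences.items():
    print(name, len(seq), seq)
```

A .doc file contains formatting data in addition to the letters of the sequence, so a reader like this one would not find clean header and sequence lines in it.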
C.) Familiarize yourself with the HiPerGator system if you have not used it.