Old Links Used in Class - Spring 2025

Lecture 11 RNA-Seq / GO analysis (3/12)

A. Identify differentially expressed genes (DEGs)

Observe the output of Cufflinks.

Compile the assembly files for Whole Gut

Identify DEGs with the Cuffdiff job file.

A pipeline that streamlines the whole process. Example

B. Functional interpretation of HTS (RNA-Seq) data

Our Cuffdiff analysis of gene expression in old vs. young whole gut produced many output files.

Download the "/share/test/rnaseq/Wholegut_aging" folder with UF OOD.
Open the gene_exp.diff file in Excel and sort it to extract the list of DEGs based on the p/q value or the significance call.
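The same sort-and-filter step can also be scripted. Below is a minimal Python sketch, assuming the standard tab-delimited gene_exp.diff column names (status, q_value, significant). The toy rows, the gene names, and the helper name load_degs are invented for illustration and are not part of the class data.

```python
import csv
import io

# Toy excerpt in the tab-delimited layout of a Cuffdiff gene_exp.diff file.
# Gene names and all numbers are invented for illustration.
HEADER = ("test_id\tgene_id\tgene\tlocus\tsample_1\tsample_2\tstatus\t"
          "value_1\tvalue_2\tlog2(fold_change)\ttest_stat\tp_value\tq_value\tsignificant\n")
ROWS = [
    "G1\tG1\tgeneA\tchr2L:1-100\tyoung\told\tOK\t10\t40\t2\t4.1\t0.00005\t0.004\tyes",
    "G2\tG2\tgeneB\tchr2L:200-300\tyoung\told\tOK\t30\t10\t-1.5\t-3.2\t0.0002\t0.01\tyes",
    "G3\tG3\tgeneC\tchr2R:1-100\tyoung\told\tOK\t20\t21\t0.1\t0.3\t0.6\t0.8\tno",
    "G4\tG4\tgeneD\tchr3L:1-100\tyoung\told\tNOTEST\t0\t0\t0\t0\t1\t1\tno",
    "G5\tG5\tgeneE\tchr3R:1-100\tyoung\told\tOK\t5\t40\t3\t5.0\t0.00001\t0.001\tyes",
]
SAMPLE = HEADER + "\n".join(ROWS) + "\n"


def load_degs(handle, max_q=0.05):
    """Return rows that were tested, called significant, and pass max_q,
    sorted by ascending q_value (hypothetical helper)."""
    reader = csv.DictReader(handle, delimiter="\t")
    degs = [
        row for row in reader
        if row["status"] == "OK"
        and row["significant"] == "yes"
        and float(row["q_value"]) <= max_q
    ]
    degs.sort(key=lambda row: float(row["q_value"]))
    return degs


degs = load_degs(io.StringIO(SAMPLE))
for row in degs:
    print(row["gene"], row["log2(fold_change)"], row["q_value"])
```

For a real file, replace io.StringIO(SAMPLE) with an open file handle on the downloaded gene_exp.diff.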

Search for a biological process of interest to you in Gene Ontology

Enrichment analysis. Try the g:Profiler analysis with the list of significantly increased genes (p < 0.0001) that we generated with Cuffdiff.
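To see what an enrichment p-value measures, here is a minimal Python sketch of an over-representation test based on the hypergeometric distribution. Tools of this kind commonly use such a test, but the exact statistic and the multiple-testing correction used by any particular tool should be checked in its documentation. The function name and the numbers are illustrative assumptions.

```python
from math import comb


def hypergeom_enrichment_p(background, annotated, selected, overlap):
    """P(X >= overlap) when `selected` genes are drawn without replacement
    from `background` genes, of which `annotated` carry the GO term."""
    upper = min(annotated, selected)
    total = comb(background, selected)
    tail = sum(
        comb(annotated, i) * comb(background - annotated, selected - i)
        for i in range(overlap, upper + 1)
    )
    return tail / total


# Hypothetical numbers: 10,000 background genes, 200 annotated with the term,
# 150 genes in our increased list, 12 of them annotated.
p = hypergeom_enrichment_p(10_000, 200, 150, 12)
print(f"enrichment p-value: {p:.3g}")
```

A small p-value means the term appears in the gene list more often than expected by chance; with many terms tested, the p-values still need correction for multiple testing.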

Lecture 10 RNA-Seq (3/10)

A. RNA-Seq data analysis overview.

Review papers 1 and 2.

Examples of protocols: 1.) TopHat --> Cufflinks --> Cuffdiff; 2.) HISAT2 --> StringTie. You could also build your own from available components, based on the needs of your project.

B. Obtaining counts for genes

Perform transcript counting with the Cufflinks job file.

Lecture 9 High Throughput Sequencing (HTS) data analysis (3/7)

A.) Mapping to reference genome

Observe - STAR aligner explained.

Check that the file sizes are right; run FASTQ QC if necessary.

Make the genome index files using the genome sequence file, the GTF file, and the script file.

Map to the reference genome using the script/job file StarMap. Edit the script/job file with your text editor and upload it to your RNA-Seq directory.

Submit the job file:

If you just got your HiPerGator account through the class, simply run "sbatch SCRIPT_FILE".
If you already have a primary account with your group, you can use either your group account, which is the default, or the class account by specifying "sbatch --account=gms6014 --qos=gms6014 SCRIPT_FILE".

Lecture 8 Protein structure (3/05)

A.) Predicting protein structure

AlphaFold2 paper by DeepMind.

Introduction to the AlphaFold project on the DeepMind website.

PDB - the Protein Data Bank (wwPDB) has played a fundamental role in improving protein structure analysis.

PyMol for viewing and annotating 3D structure

B.) Running AF2 on HiPerGator

Obtain a FASTA format file of the protein whose structure you want to predict. Save it and upload it to your folder in /gms6014/share/YourName/AF2. (Protein file we used before.)
Edit the Script file on your local computer or using nano on HiPerGator.

Change the email address on the #SBATCH --mail-user line.
Change the output folder name.
Change the name of the FASTA format file.

Open a terminal and connect to HiPerGator. Navigate to your own AF2 folder and run the command "sbatch --account=gms6014 --qos=gms6014 AlphaFold2.sh".

Lecture 7 Phylogenetic analysis and HTS data (3/3)

A.) Phylogenetic analysis

Practice:

Obtain sequences in FASTA format. example
Load the sequences into Clustal Omega and perform a global alignment; then draw the N-J tree.
For more options, install the standalone Clustalo.
Example of the output; click the "Phylogenetic Tree" tab to view the tree and download the tree file to your GMS6014/phylogenetics/ folder.
Observe and manipulate the tree with the Simple Phylogeny link or with Phylodendron.

Examples of trees: 1 and 2.
Standalone Clustal Omega for download.
Comprehensive protein analysis tool JalView. Download and install it to try out.

B.) Obtaining HTS dataset from GEO

Example: GSE62580; link to the SRA dataset: SRP049144. Use the Run Selector to get the accession list.

Using fastq-dump, you may either download the 4 whole gut samples one at a time into a /wg folder within your working directory,

or, for batch download, use a .sbatch job file: download the example, change the email address to your own, upload it to your folder, and submit the job with "sbatch --account=gms6014 --qos=gms6014 [Filename]".

Lecture 6 - Protein Domains and Motifs (2/28)

A - Search for binary patterns

Make a /pattern project folder in your /GMS6014 folder

Download both the Bagua program and the data file (dfile) to the same folder. ** Only works on a PC **

Search for binary patterns such as "CXXC" or "[EDQN]X[^RKH]D[AST]"
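The same kind of pattern can be tried on any platform with regular expressions. A minimal Python sketch, assuming that "X" outside square brackets means "any residue" and that bracketed classes follow regular-expression syntax; the Bagua program's exact pattern syntax may differ, and the function names and test sequences here are illustrative.

```python
import re


def pattern_to_regex(pattern):
    """Translate a motif pattern into a compiled regular expression.
    'X' outside square brackets becomes '.', bracket classes are kept as-is."""
    parts = []
    in_class = False
    for ch in pattern:
        if ch == "[":
            in_class = True
        elif ch == "]":
            in_class = False
        parts.append("." if ch == "X" and not in_class else ch)
    return re.compile("".join(parts))


def find_pattern(pattern, sequence):
    """Return (1-based start, matched text) for each non-overlapping match."""
    regex = pattern_to_regex(pattern)
    return [(m.start() + 1, m.group()) for m in regex.finditer(sequence.upper())]


# Invented toy sequences, not real proteins.
print(find_pattern("CXXC", "MACGHCKK"))
print(find_pattern("[EDQN]X[^RKH]D[AST]", "GGEALDSGG"))
```

Note that finditer reports non-overlapping matches only; overlapping occurrences would need a lookahead-based search.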

B - Identifying motifs shared by a group of proteins

Load the 8IL6.txt file into MEME. Leave your email address for accessing the results.

Shared motifs identified by MEME.

An example of a scoring matrix generated by MEME.

The motifs identified by MEME can then be used to search a database of sequences.

BLAST output vs. motif search using MAST or using BGHMM
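How a scoring matrix is used to search sequences can be sketched as a sliding window: each window is scored by summing the matrix entry for the residue at each motif position. The Python below is only an illustration of that principle. The toy matrix, the default score of -2 for unlisted residues, and the function name are assumptions; MAST's actual scoring and p-value calculation are more involved.

```python
def scan_with_matrix(sequence, matrix, threshold, default=-2):
    """Slide a position-specific scoring matrix along a sequence.

    matrix: list of dicts, one per motif position, mapping residue -> score.
    Residues missing from a column get `default` (a hypothetical value).
    Returns (1-based start, window, score) for windows scoring >= threshold.
    """
    width = len(matrix)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        score = sum(col.get(res, default) for col, res in zip(matrix, window))
        if score >= threshold:
            hits.append((start + 1, window, score))
    return hits


# Invented 3-position matrix favoring C, then P (or A), then C.
TOY_MATRIX = [
    {"C": 5},
    {"P": 3, "A": 1},
    {"C": 5},
]

print(scan_with_matrix("MKCPCAA", TOY_MATRIX, threshold=10))
```

Unlike the exact pattern search in part A, a matrix gives every window a graded score, so near-matches can still be reported by lowering the threshold.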

** MEME, MAST, etc. can be run as a standalone program following installation . More options will be available to tailor the analysis. **

C - Protein profile (protein family) databases:

Pfam at InterPro and the entry for IL6.

Prosite entry for IL6.

Search the BLAST hit and the motif search hit against Pfam or Prosite.

Lecture 5 (2/26) Alignment and sequence similarity

1.) How did BLAST identify the hit(s) for us?

BLAST results from standalone BLAST: IL6_Dm6.44Genes and IL6_Dm6.32.cDNAs.

What is BLAST? - Basic Local Alignment Search Tool (NCBI site; Nature Education page)

The basis of quantifying sequence similarity

- Blocks Substitution Matrices (BLOSUM)

Blosum 62 matrix, Blosum matrices.

What is a block?

"Many known proteins can be grouped into families according to functional and sequence similarities. The similarity of the proteins across the sequences in each family is far from uniform. While some regions are clearly conserved, others display little sequence similarity. Often the conserved regions are crucial to the protein's function, for example enzymatic catalytic sites. Such conserved regions can be used to probe an uncharacterized sequence to indicate its function. " -- Pietrokovski, Henikoff, Henikoff 1996 Link
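The entries of a BLOSUM matrix are log-odds scores: how often two residues are seen aligned in conserved blocks, relative to how often they would pair by chance. A minimal Python sketch of that calculation, assuming half-bit units (the scaling used for BLOSUM62); the frequencies below are invented round numbers, not values from the real matrix.

```python
from math import log2


def log_odds_score(pair_freq, freq_a, freq_b, scale=2):
    """Rounded log-odds score: scale * log2(observed / expected).

    pair_freq: observed frequency of the aligned pair in the blocks.
    freq_a, freq_b: background frequencies of the two residues.
    scale=2 gives half-bit units.
    """
    expected = freq_a * freq_b
    return round(scale * log2(pair_freq / expected))


# Hypothetical frequencies for illustration only.
print(log_odds_score(0.04, 0.1, 0.1))     # pair seen 4x more often than chance
print(log_odds_score(0.0025, 0.1, 0.1))   # pair seen 4x less often than chance
```

Positive scores mark substitutions that conserved blocks tolerate more often than chance predicts; negative scores mark substitutions that are avoided.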

2.) How the scoring matrix and gap penalties affect the outcome of local alignment

Using the two test sequences , perform local alignment with the following parameters.

Local alignment web service: EMBOSS_Water (or, as a backup, LALIGN).

Different matrices - try local alignment with either Blosum62 or Blosum35 and observe the difference.

Different gap penalties - with the matrix set to Blosum62, try:

1.) alpha (gap opening penalty)=15, beta (extension penalty)=3; and

2.) alpha=5, beta=1.

Observe the results.
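Why the gap penalties change the result can be seen in a small local-alignment scorer. The Python below is a sketch of Smith-Waterman scoring with affine gaps, with two stated simplifications: it uses a flat match/mismatch score instead of a BLOSUM matrix, and it assumes a gap of length k costs alpha + (k - 1) * beta. The web service's exact gap convention may differ. The toy sequences are invented.

```python
def local_alignment_score(a, b, match=5, mismatch=-4, alpha=10, beta=1):
    """Best local alignment score with affine gap penalties.

    alpha: gap opening penalty (charged for the first gap position).
    beta: gap extension penalty (charged for each further gap position).
    """
    neg = float("-inf")
    cols = len(b) + 1
    h_prev = [0] * cols      # best scores for alignments ending in the previous row
    f_prev = [neg] * cols    # previous row, alignment ending with a gap in b
    best = 0
    for i in range(1, len(a) + 1):
        h_cur = [0] * cols
        f_cur = [neg] * cols
        e = neg              # current row, alignment ending with a gap in a
        for j in range(1, cols):
            e = max(h_cur[j - 1] - alpha, e - beta)
            f_cur[j] = max(h_prev[j] - alpha, f_prev[j] - beta)
            sub = match if a[i - 1] == b[j - 1] else mismatch
            h_cur[j] = max(0, h_prev[j - 1] + sub, e, f_cur[j])
            best = max(best, h_cur[j])
        h_prev, f_prev = h_cur, f_cur
    return best


# The second sequence carries a two-residue insertion ("CC").
seq1, seq2 = "AAAATTTT", "AAAACCTTTT"
print(local_alignment_score(seq1, seq2, alpha=5, beta=1))
print(local_alignment_score(seq1, seq2, alpha=15, beta=3))
```

With the cheap penalties the best alignment opens a gap to span the insertion; with the expensive ones the gapped alignment no longer scores better than an ungapped one, which is the kind of difference to look for in the web service output.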

Lecture 4 (2/24) Standalone Applications

1.) Download and install the standalone NCBI-Blast: manual at NCBI

Before installation, read the instructions for Windows or Mac.

Download the .win64.exe file for Windows or the .dmg for Mac from the NCBI ftp server. Pay attention to where the program is to be installed. You may follow the default installation path, but make a note of it as you may need it later.

Open a PowerShell (Windows) or Terminal (Mac) window and type "makeblastdb" to verify the installation.

2.) Running BLAST

With File Explorer (Windows) or Finder (Mac), navigate to your GMS6014 folder and make a new /blast folder.
Open a PowerShell (Windows) or Terminal (Mac) window in the blast folder, list the directory, then make new subfolders "dbs", "query", "out".
Download a dataset for the BLAST search. Genomic datasets can be downloaded from Ensembl. A previously downloaded dataset: all genes in the D. mel genome in FASTA format. Name this file Dm6.44.AllGenes and save it in blast/dbs.
Search for and download 3 IL6 protein sequences in FASTA format from UniProt and save them as "3IL6.txt" in the blast/query/ folder. example
Format the dataset for searching by running "makeblastdb -in dbs/Dm -dbtype nucl". Check that it worked by looking for the new files generated in dbs/.
Run tblastn to search for orthologs of IL6 in Dm6.44.AllGenes.
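The two BLAST steps above can be assembled from a script. The Python sketch below only builds and prints the command lines, so it runs without BLAST installed. The flags used (-in, -dbtype, -query, -db, -out, -evalue) are standard BLAST+ options, but the output file name and the E-value cutoff are hypothetical choices, and the paths assume the folder layout described above.

```python
import shlex


def blast_commands(db_fasta="dbs/Dm6.44.AllGenes",
                   query="query/3IL6.txt",
                   out="out/IL6_vs_Dm6.44.txt",   # hypothetical output name
                   evalue="1e-5"):                # hypothetical cutoff
    """Return the makeblastdb and tblastn commands as argument lists."""
    make_db = ["makeblastdb", "-in", db_fasta, "-dbtype", "nucl"]
    search = ["tblastn", "-query", query, "-db", db_fasta,
              "-out", out, "-evalue", evalue]
    return make_db, search


for cmd in blast_commands():
    print(shlex.join(cmd))
```

To actually run them from Python in the blast folder, each list can be passed to subprocess.run(cmd, check=True), provided the BLAST programs are on your PATH.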

Lecture 3 (2/21)

1.) Presentation recording for the class.

2.) Tutorials:

HiPerGator tutorial and introduction;

Linux Command line tutorial.

Lecture 2 (2/19)

A.) List of public resources.

EBI list of services,
NCBI Site Map.
Kyoto Encyclopedia of Genes and Genomes - KEGG.
Repository of genomes - Ensembl.
Examples of species-specific databases: FlyBase, WormBase.

B.) Navigate the web of information on your favorite gene (or IL6)

Search for the gene in the Gene vs. Protein databases.

Observe the difference between an all-text search and an advanced search.

Pay attention to the multitude of links associated with the Gene entry.

Compare the human "IL6" entry in the NCBI Gene database vs. the EBI UniProt database.

Information on human IL-6 at Reactome ; the example of an interaction involving IL6.

C.) Web-based tools

Pick qPCR primers for your gene with Primer3. You may use a sequence of interest to you or the human IL6.

Lecture 1 (2/17)

A.) Retrieve and save sequence file:

Try different views (formats) of the same entry by selecting different "display settings".
Download the FASTA files of the nucleotide and protein sequences to your local computer*.

*: make a folder such as "GMS6014" in your home folder. Save all course-related files in this folder. Avoid spaces in folder and file names.

Open the saved file using a text editor such as Notepad in Windows or TextEdit in macOS.

* Consider using a dedicated text editor for bioinformatics projects, such as Notepad++ (Windows only) or Emacs (all major OS).

B.) Local storage of sequence files:

Install a text editor following the links above.
Use .doc or .rtf files for formatting and annotation of sequences. example (Right click the link and save in the "/GMS6014/test" folder)
Use .seq or .fasta to store raw sequences in fasta format for downstream analysis. example (Right click the link and save in the "/GMS6014/test" folder)
Change the folder options to show file extensions. Change the file association to always open .seq or .txt files in your designated text editor.
Try to load the MySequence.txt to Webcutter or NEBCutter for identifying restriction sites; then try to load the MySequence.doc file.
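Plain-text FASTA files are easy for programs to read, which is why raw sequences are kept in .seq, .fasta, or .txt rather than .doc. A minimal Python sketch of a FASTA reader follows; the record names and sequences are invented, and read_fasta is a hypothetical helper, not a library function.

```python
def read_fasta(lines):
    """Parse FASTA text into a dict mapping header (without '>') to sequence.
    Sequence lines belonging to one record are joined into a single string."""
    records = {}
    name = None
    for line in lines:
        line = line.strip()
        if not line:
            continue                      # skip blank lines
        if line.startswith(">"):
            name = line[1:].strip()       # header line starts a new record
            records[name] = []
        elif name is not None:
            records[name].append(line)    # sequence line of the current record
    return {n: "".join(parts) for n, parts in records.items()}


# Invented two-record example.
EXAMPLE = """>seq1 toy nucleotide record
ATGGCC
TTAA
>seq2 toy protein record
MKLV
"""

for header, seq in read_fasta(EXAMPLE.splitlines()).items():
    print(header, len(seq))
```

For a saved file, pass an open file handle instead of EXAMPLE.splitlines(). A .doc file would fail here because it stores formatting data along with the text.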