Old Links Used in Class - Spring 2023

Lecture 13 Genome browser and genomic sequence analysis (3/10)


A.) Displaying genomic datasets with the UCSC browser

  • The UCSC Genome Browser is a popular server for viewing genomic data; be sure to select the correct genome release. Signing up for the service lets you save analysis sessions and share the results with others. Example
  • Download and install the IGB genome browser, which you can run on your own computer
  • An example of a saved session, which persists so that you can send the link to collaborators

B.) How to obtain genomic sequence surrounding your gene?

    * From genome browser - example
    * For genomes not available in a public genome browser - use Fastacmd or your own Python script
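
For the "your own Python script" route, the idea can be sketched as below. This is a minimal example under assumptions: the function names, the 1-based inclusive gene coordinates, and the toy sequence are all illustrative and do not come from the class materials.

```python
def read_fasta(lines):
    """Parse FASTA text lines into a dict of {sequence name: sequence}."""
    records = {}
    name = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]   # keep the ID up to the first space
            records[name] = []
        elif name is not None:
            records[name].append(line.upper())
    return {key: "".join(parts) for key, parts in records.items()}


def flanking_region(sequence, start, end, flank):
    """Return the sequence from (start - flank) to (end + flank).

    start and end are 1-based and inclusive; the window is clipped
    at the ends of the sequence.
    """
    left = max(start - flank, 1)
    right = min(end + flank, len(sequence))
    return sequence[left - 1:right]


# Toy input (hypothetical data, not a real genome)
toy = [">chrToy demo", "AACCGGTTAC", "GTACGTTTGA"]
genome = read_fasta(toy)
region = flanking_region(genome["chrToy"], start=8, end=12, flank=3)
print(region)  # GGTTACGTACG
```

To use it on a real file, pass an open file handle to read_fasta instead of the toy list. The whole genome is held in memory, which is fine for small genomes but may not suit very large ones.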

    C.) Identifying TF and epigenetic regulator binding sites in the genome

  • Searching for TF binding sites in the DNA sequence using TFsiteScan or PROMO  
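
To see what a binding-site search does at its simplest, the sketch below scans both strands of a sequence for an IUPAC consensus. It is a simplified stand-in and not how TFsiteScan or PROMO work internally (those tools use curated site collections and weight matrices). The function name, the WGATAR consensus, and the toy sequence are assumptions chosen for illustration.

```python
import re

# IUPAC nucleotide codes mapped to regular-expression character classes
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def find_sites(sequence, consensus):
    """Return (1-based start, strand, matched text) for each consensus match."""
    sequence = sequence.upper()
    body = "".join(IUPAC[code] for code in consensus.upper())
    # A lookahead lets overlapping matches all be reported
    pattern = re.compile("(?=(" + body + "))")
    hits = [(m.start() + 1, "+", m.group(1)) for m in pattern.finditer(sequence)]
    reverse = sequence.translate(COMPLEMENT)[::-1]
    for m in pattern.finditer(reverse):
        # Convert the reverse-strand position back to forward-strand coordinates
        start = len(sequence) - (m.start() + len(m.group(1))) + 1
        hits.append((start, "-", m.group(1)))
    return sorted(hits)


# Toy sequence (hypothetical) with one site on each strand
hits = find_sites("CCAGATAAGGTTATCTCC", "WGATAR")
for start, strand, text in hits:
    print(start, strand, text)
```

A palindromic consensus is reported once per strand at the same position, so deduplicate if that matters for your analysis.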

Lecture 12 Intro to R and machine learning (3/8)

  • The R Project and RStudio download sites.
  • Demo R code: General; for tidyverse; and machine learning.



Lecture 11 RNA-Seq (3/6)

    A.) Identify differentially expressed genes (DEGs)

  • Observe the output of Cufflinks.
  • Compile the assembly files for Whole Gut
  • Identify DEGs with a job file samples

  • A pipeline that streamlines the whole process. Example


    B.) Functional interpretation of HTS (RNA-Seq) data

    Our analysis of gene expression changes in old vs. young tissues produced the CuffDiff Output (save the file and view it in Excel; sort to extract the list of DEGs based on the p/q value or the significance call).

  • Gene Ontology
  • Enrichment analysis. Try the g:Profiler analysis with the list of significantly (p less than 0.0001) increased genes we generated with Cuffdiff.
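
As an alternative to sorting in Excel, the gene list for enrichment analysis can be pulled out with a short script. The sketch below assumes the standard Cuffdiff gene_exp.diff column names and assumes that sample_1 is young and sample_2 is old, so that a positive log2 fold change means "increased in old". The function name and the toy rows are hypothetical.

```python
import csv
import io


def increased_genes(handle, p_cutoff=0.0001):
    """Return gene names with p_value < p_cutoff and a positive log2 fold change."""
    genes = []
    for row in csv.DictReader(handle, delimiter="\t"):
        if row["status"] != "OK":          # skip tests Cuffdiff could not run
            continue
        p_value = float(row["p_value"])
        log2_fc = float(row["log2(fold_change)"])
        if p_value < p_cutoff and log2_fc > 0:
            genes.append(row["gene"])
    return genes


# Toy table (made-up values) with the gene_exp.diff column layout
COLUMNS = ["test_id", "gene_id", "gene", "locus", "sample_1", "sample_2", "status",
           "value_1", "value_2", "log2(fold_change)", "test_stat", "p_value",
           "q_value", "significant"]
toy_rows = [
    ["X1", "X1", "geneA", "chrToy:1-900", "young", "old", "OK",
     "5.0", "40.0", "3.0", "4.1", "0.00001", "0.001", "yes"],
    ["X2", "X2", "geneB", "chrToy:950-1800", "young", "old", "OK",
     "30.0", "3.0", "-3.32", "-4.0", "0.00002", "0.002", "yes"],
    ["X3", "X3", "geneC", "chrToy:2000-2500", "young", "old", "OK",
     "10.0", "11.0", "0.14", "0.3", "0.62", "0.8", "no"],
    ["X4", "X4", "geneD", "chrToy:3000-3300", "young", "old", "NOTEST",
     "0", "0", "0", "0", "1", "1", "no"],
]
toy_text = "\n".join("\t".join(row) for row in [COLUMNS] + toy_rows) + "\n"

up = increased_genes(io.StringIO(toy_text))
print("\n".join(up))  # one gene name per line
```

For a real run, replace io.StringIO(toy_text) with open("gene_exp.diff", newline="") and paste the printed list into the enrichment tool. Check the sample order in your own file before interpreting the sign of the fold change.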

Lecture 10 RNA-Seq (3/3)


    A.) RNA-Seq data analysis overview

  • Review papers 1 and 2.
  • Examples of protocols: 1.) TopHat-->Cufflinks-->Cuffdiff; 2.) HISAT2-->StringTie. You can also build your own from the available components, based on the needs of your project.

    B.) Mapping to reference genome

    1. Check that the file sizes are right; run FASTQ QC if necessary.
    2. Map to the reference genome using the script/job file StarMap. Edit the script/job file with your text editor and upload it to your RNA-Seq directory.
    3. Submit the job file.
    4. Perform transcript counting with the Cufflinks job file.
  • Intro to the STAR aligner
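
For step 1, a quick structural check can catch a truncated download before mapping. The sketch below is one possible sanity check, not a replacement for a full QC tool; the function name and the toy reads are hypothetical.

```python
import io


def count_fastq_records(handle):
    """Count reads in FASTQ text, raising ValueError on a malformed record."""
    count = 0
    while True:
        header = handle.readline()
        if not header:
            break                          # clean end of file
        if not header.strip():
            continue                       # tolerate blank lines between records
        seq = handle.readline().rstrip("\n")
        plus = handle.readline()
        qual = handle.readline().rstrip("\n")
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"bad record layout near read {count + 1}")
        if not seq or len(seq) != len(qual):
            raise ValueError(f"sequence/quality length mismatch at read {count + 1}")
        count += 1
    return count


# Toy data (hypothetical reads)
good = "@read1\nACGT\n+\nIIII\n@read2\nGGCC\n+\nIIII\n"
n_reads = count_fastq_records(io.StringIO(good))
print(n_reads)  # 2
```

For compressed files, open them with gzip.open(path, "rt") and pass the handle to the same function. A ValueError on the last read usually means the file was cut off during download.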


Lecture 9 High throughput sequencing (HTS) data analysis (3/1)


    A.) Obtaining HTS dataset from GEO

  • Example - GSE62580; link for the SRA dataset - SRP049144 - use the Run Selector to get the accession list
  • Using fastq-dump, you can download the 4 whole gut samples into a /wg folder within your working directory
  • Alternatively, for batch download, use a .sbatch job file - download the example and change the email address to your own. Upload it to your folder and submit the job with "sbatch --account=gms6014 --qos=gms6014 [Filename]"
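
The accession list from the Run Selector can be turned into fastq-dump command lines with a few lines of Python. This is a sketch under assumptions: the function name is hypothetical, the accessions shown are placeholders (not the runs of SRP049144), and it is not the content of the class job file. It uses only the --gzip and --outdir options of fastq-dump.

```python
import shlex


def fastq_dump_commands(accessions, outdir="wg"):
    """Build one fastq-dump command (as an argument list) per run accession."""
    commands = []
    for accession in accessions:
        accession = accession.strip()
        if not accession:
            continue                       # skip blank lines in the list
        commands.append(["fastq-dump", "--gzip", "--outdir", outdir, accession])
    return commands


# Placeholder accessions; replace with the lines of your accession list file
runs = ["SRR0000001", "SRR0000002", ""]
commands = fastq_dump_commands(runs)
for command in commands:
    print(shlex.join(command))
```

The printed lines can be pasted into a job file, or each argument list can be run with subprocess.run(command, check=True) on a machine where the SRA Toolkit is available.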

Lecture 8 Protein structure, AlphaFold, and HiPerGator (2/27)


    A.) Predicting protein structure

  • AlphaFold2 paper by DeepMind.
  • Introduction to the AlphaFold project on the DeepMind website.
  • PDB - Protein Data Bank (wwPDB), which played a fundamental role in improving protein structure analysis.
    B.) Running AF2 on HiPerGator

    1. Obtain a FASTA-format file of the protein whose structure you want to predict. Save it and upload it to your folder in /gms6014/share/YourName/AF2. (Protein file we used before.)
    2. Edit the Script file on your local computer:
      1. Change the email address on the #SBATCH --mail-user line.
      2. Change the output folder name.
      3. Change the name of the FASTA-format file.
      Upload it to the same AF2 folder as above.
    3. Open a terminal and connect to HiPerGator. Navigate to your own AF2 folder and run the command "sbatch --account=gms6014 --qos=gms6014 AlphaFold2.sh".
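
If you prefer to script the email edit in step 2 rather than do it by hand, something like the sketch below could work. It only touches the #SBATCH --mail-user line; the output folder and FASTA file name still need to be changed in the script itself, because how they are written there is not shown here. The function name, the toy script text, and the email address are hypothetical.

```python
def set_mail_user(script_text, email):
    """Replace the address on every '#SBATCH --mail-user' line of a job script."""
    lines = []
    for line in script_text.splitlines():
        if line.strip().startswith("#SBATCH --mail-user"):
            line = "#SBATCH --mail-user=" + email
        lines.append(line)
    return "\n".join(lines) + "\n"


# Toy job script (hypothetical, not the class AlphaFold2.sh)
template = (
    "#!/bin/bash\n"
    "#SBATCH --job-name=af2_demo\n"
    "#SBATCH --mail-user=someone@example.com\n"
    "echo placeholder\n"
)
edited = set_mail_user(template, "you@example.com")
print(edited)
```

To apply it to a real file, read the script with open(path).read(), pass the text through set_mail_user, and write the result back before uploading.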