Working with Non-Public Data
Step 1: Introduction
This tutorial covers two scenarios for working with private data in Genome Workbench:
- You have created your own sequence and want to work with it in Genome Workbench
- You want to view your own data/annotation on a publicly available sequence
We will demonstrate some of the Genome Workbench tools on data not found in the NCBI databases.
We recommend completing the Basic Operation tutorial first.
Here is a link to the sample data you will need to complete this tutorial - BX530088_BX572102.
Step 2: Getting Started
For the first exercise we are going to do the following:
- Load a user-generated AGP file (download sample)
- SPLIGN some mRNAs on that AGP sequence
- Create a FASTA file from the AGP
- BLAST that FASTA sequence to see what is related to it
- WindowMask that FASTA sequence (or part of it) to look for repetitive regions
Genome Workbench starts up and displays the main screen. Choose File=>Open from the main menu, select File on the left side of the dialog, and click the ... button on the right to point to the file location. Genome Workbench understands many different file formats; for this step choose BX530088_BX572102.comp.agp from the downloaded data files. Click Next and then Next again to accept the defaults. Then click Finish to add the data file to a new project.
Now that your data is loaded, you can view it by selecting the data in the project tree, right clicking and choosing Open New View. Then choose Graphical View. The initial view shows little detail, but you can zoom in to see the sequence.
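To get a feel for what an AGP file contains before loading it, the following minimal Python sketch parses AGP lines outside of Genome Workbench. It assumes the standard tab-delimited AGP layout (object, start, end, part number, component type, then component or gap columns). The object and component names in the sample are made up for illustration and are not taken from BX530088_BX572102.comp.agp.

```python
from typing import NamedTuple, Optional


class AgpLine(NamedTuple):
    obj: str                      # name of the assembled object
    obj_beg: int                  # 1-based start on the object
    obj_end: int                  # 1-based inclusive end on the object
    part_number: int
    component_type: str           # e.g. F (finished), N or U (gap)
    component_id: Optional[str]   # None for gap lines
    gap_length: Optional[int]     # None for component lines


def parse_agp(text):
    """Parse AGP text into AgpLine records, skipping comments and blanks."""
    records = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        obj, beg, end, part = cols[0], int(cols[1]), int(cols[2]), int(cols[3])
        ctype = cols[4]
        if ctype in ("N", "U"):
            # Gap line: column 6 holds the gap length
            records.append(AgpLine(obj, beg, end, part, ctype, None, int(cols[5])))
        else:
            # Component line: column 6 holds the component identifier
            records.append(AgpLine(obj, beg, end, part, ctype, cols[5], None))
    return records


# Hypothetical sample data (not the tutorial's file)
SAMPLE = (
    "# made-up example\n"
    "example_obj\t1\t1000\t1\tF\tCOMP_A.1\t1\t1000\t+\n"
    "example_obj\t1001\t1100\t2\tN\t100\tscaffold\tyes\tpaired-ends\n"
    "example_obj\t1101\t1600\t3\tF\tCOMP_B.1\t1\t500\t-\n"
)

records = parse_agp(SAMPLE)
total_length = max(r.obj_end for r in records)
component_ids = [r.component_id for r in records if r.component_id]
```

To inspect the tutorial's own file, read its contents into a string and pass it to parse_agp in place of SAMPLE.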
Step 3: Apply the tool to private data
Now let us align an mRNA to our sequence. We will use the SPLIGN tool. SPLIGN (or SPLiced Aligner) is a global alignment tool used in NCBI's annotation pipeline. Open the NM_020137.3 accession from the Data from GenBank database (File=>Open) and add it to the project.
Click Next and Finish. Both entries are now shown in the data folder.
Select both entries (SHIFT+left click in both MS Windows and Mac OS). With both entries selected, click Tools=>Run Tool to open the Tools dialog, choose SPLIGN, and click Next. On some systems you will be taken to the next screen without having to click Next.
Select BX530088... for the Genomic Sequence and NM_020137.3 for the Transcript Sequence. If you do not see both sections of the dialog you need to drag down the lower border of the dialog box.
Click Next.
Add the results to the existing project and click Finish.
Your private data alignment will be displayed.
Step 4: Export a FASTA file
Select the data file in the Project Tree View we loaded previously. Right click (control click in the Mac OS) on the selected data and choose Export. Select FASTA as the format, select a location, and give the file a name.
Click Finish.
Now open the FASTA file you have just created. Choose File=>Open. Select the file and click Next. Accept the default settings and click Next again. Choose to create a new project and click Finish.
Select the FASTA data in the Project Tree View and double click it. From the Open View menu choose Graphical View.
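If you want to sanity-check the exported file outside of Genome Workbench, a short script can report the length of each record. The following is a minimal sketch in plain Python; the sequence and header below are made up, and any file name you pass in is your own choice from the Export step.

```python
import io


def read_fasta(handle):
    """Yield (header, sequence) pairs from a FASTA-formatted text handle."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def summarize(handle):
    """Return (header, length, count of N bases) for each record."""
    return [(h, len(s), s.upper().count("N")) for h, s in read_fasta(handle)]


# Hypothetical in-memory example standing in for an exported file
demo = io.StringIO(">example_seq made-up record\nACGTN\nNACG\n")
summary = summarize(demo)
```

For a real file, open it in text mode and pass the handle to summarize. A length that matches the end coordinate of the AGP object suggests the export covered the whole sequence.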
Step 5: Alignment
From the Graphical View of the FASTA sequence use region selection to select the entire sequence. Click and drag in the number line at the top of the view to begin the selection.
Once you have a region selected, click on the edges and stretch it to the boundaries of the view.
With the entire region selected, choose Run Tool (Tools=>Run Tool from the main menu, or Right Click (control-click on the Mac OS)). From the Run Tool dialog choose BLAST Search.
Click Next.
In the BLAST Search dialog, ensure you have selected the Nucleotide option, Nucleotide-Nucleotide (MegaBLAST) from the Program menu, and nt (All GenBank+EMBL+DDBJ+PDB sequences) from the Database menu. Enter the search string biomol mrna[prop] in the Entrez Query field.
Click Next.
In the next dialog, accept the general parameters, check the Filter low complexity regions option, and select Human from the Species specific repeats for: menu.
Then click Next. In the next screen choose to add the results to the existing project (New Project (1)) and click Finish.
It can take some time for the analysis to return and present the results.
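For readers who want to see roughly how the dialog choices above map onto a scripted search, the following sketch builds (but does not send) a request body using parameter names from the NCBI BLAST Common URL API as I understand it. Treat the mapping as an assumption, not a description of what Genome Workbench sends internally. The query sequence is a made-up placeholder, and the species-specific repeat filter is deliberately left out because its parameter value is not confirmed here.

```python
from urllib.parse import urlencode, parse_qs

# Public BLAST endpoint; nothing is sent to it in this sketch
BLAST_URL = "https://blast.ncbi.nlm.nih.gov/Blast.cgi"


def build_megablast_request(sequence, entrez_query):
    """Return a URL-encoded request body mirroring the dialog settings."""
    params = {
        "CMD": "Put",                  # submit a new search
        "PROGRAM": "blastn",           # nucleotide-nucleotide
        "MEGABLAST": "on",             # MegaBLAST variant
        "DATABASE": "nt",
        "ENTREZ_QUERY": entrez_query,  # restricts the database subset searched
        "FILTER": "L",                 # low-complexity filtering
        "QUERY": sequence,
    }
    return urlencode(params)


# Placeholder sequence, not the tutorial's data
body = build_megablast_request("ACGTACGTACGT", "biomol mrna[prop]")
decoded = parse_qs(body)
```

Submitting the body would be a POST to BLAST_URL, followed by polling for results with the returned request ID. That part is omitted because it requires network access and is subject to NCBI usage guidelines.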
Step 6: WindowMasker
In this step we will use WindowMasker on the FASTA sequence to look for repetitive regions. First, download the mask data. Select Tools=>WindowMasker Data. In the dialog that appears, choose the location (path) where the mask will be downloaded and choose human.tar.gz as the mask.
Click OK. The mask data will be downloaded to the selected location.
The FASTA file should still be available in the project tree view. Select it, double click and open a graphical view. Select the region by clicking in the number line and dragging a selection around a region.
Choose Tools=>Run Tool from the main menu.
Select Search/Find Repetitive Sequences with WindowMasker and click Next (in some systems you might have to only click the tool without having to click Next).
Ensure that our sequence is selected (BX530088...), then select 9606 Homo sapiens from the Mask using parameters for menu.
Click Next. Choose a project to add the results to and click Finish. It can take some time for the job to complete.
The result is a histogram showing regions of repeats. You can scroll and zoom just like you would any other view.
If the histogram does not appear automatically, select the content menu at the bottom of the graphical view and choose Repeat Region.
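To illustrate what a repeat histogram summarizes, the sketch below bins a set of masked intervals into fixed-width windows and counts the masked bases in each. This is only a conceptual illustration, not the method Genome Workbench or WindowMasker uses to draw the view. The intervals, sequence length, and bin size are made up, and the function assumes non-overlapping, 0-based, end-exclusive intervals that lie within the sequence.

```python
def repeat_histogram(intervals, seq_length, bin_size):
    """Return, for each bin, the number of bases covered by masked intervals."""
    n_bins = (seq_length + bin_size - 1) // bin_size  # ceiling division
    counts = [0] * n_bins
    for start, end in intervals:
        pos = start
        while pos < end:
            b = pos // bin_size
            # Stop at the end of the current bin or of the interval
            bin_end = min((b + 1) * bin_size, end)
            counts[b] += bin_end - pos
            pos = bin_end
    return counts


# Hypothetical masked regions on a 300-base sequence
intervals = [(10, 40), (95, 130), (250, 260)]
hist = repeat_histogram(intervals, seq_length=300, bin_size=100)
```

Taller bins correspond to windows with more masked bases, which is the kind of signal the histogram in the graphical view lets you scan for visually.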
Step 7: Conclusion
There are many ways to use Genome Workbench, and this tutorial shows only a few simple examples. It should give you enough background to start exploring your own data, keeping that data private while still drawing on public data where needed.
Current Version is 2.12.10 (released August 20, 2018)