How do I import a sequence file into GCG?

Say you have a sequence file on your desktop machine, created in WP, SimpleText, or Microsoft Word, or produced by an automatic sequencer. It's unlikely to be in GCG format, but you want to run some GCG programs on this sequence. There are a few different ways to convert it into the right format:

Cut-and-paste into Seqed

Log into helix from your desktop computer and start up GCG. Type 'seqed' and enter screen mode. In a separate window, open the file containing your sequence, select and copy the sequence, and paste it into the seqed window. Then exit seqed, saving the file, which will now be in GCG format. This is a quick-and-dirty method that mostly works, but check that the entire sequence was copied, pasted, and saved. It is not practical for very long sequences or if you have many sequences.

GCG's Reformat program

Save the file as an ASCII text file. If the sequence has word-processor codes in it, no sequence conversion program will accept it.
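
One rough way to confirm that a saved file really is plain ASCII before transferring it is to scan it for bytes outside the printable range. The Python sketch below does that; the function name find_suspect_bytes and the set of allowed control characters (tab, line feed, carriage return) are assumptions made for illustration, and none of this is part of GCG or Readseq.

```python
from typing import List, Tuple


def find_suspect_bytes(data: bytes) -> List[Tuple[int, int]]:
    """Return (offset, byte value) pairs for bytes that are not plain ASCII text.

    Assumption: printable ASCII (32-126) plus tab, line feed, and carriage
    return count as acceptable; anything else is treated as a possible
    word-processor code.
    """
    allowed_controls = {9, 10, 13}  # tab, line feed, carriage return
    suspects = []
    for offset, value in enumerate(data):
        is_printable = 32 <= value <= 126
        if not is_printable and value not in allowed_controls:
            suspects.append((offset, value))
    return suspects
```

To use it, read the file in binary mode (for example, open(filename, "rb").read()) and pass the bytes in. An empty list suggests the file is plain text; any entries point to the offsets worth inspecting.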

Bring the file to helix using some file transfer program like Fetch or FTP.

Start up GCG (type 'gcg' at the helix prompt) and then run Reformat (type 'reformat filename' at the helix prompt). Reformat should give you a GCG-formatted sequence in the same file. If something doesn't work right, see 'Reformat gave me an empty file' or 'Reformat put the header into the sequence', which may help you troubleshoot.

Readseq

To use Readseq, the data must be in one of the following formats:

   *IG/Stanford, used by Intelligenetics and others
   *GenBank/GB, GenBank flatfile format
   *NBRF format
   *EMBL, EMBL flatfile format
   *GCG, single sequence format of GCG software
   *DNAStrider, for common Mac program
   *Fitch format, limited use
   *Pearson/Fasta, common format used by FastA program and others
   *Zuker format, limited use. Input only.
   *Olsen, format printed by Olsen VMS sequence editor. Input only.
   *Phylip3.2, sequential format for Phylip programs
   *Phylip, interleaved format for Phylip programs (v3.3, v3.4)
   *Plain/Raw, sequence data only (no name, document, numbering)
   *MSF, multi sequence format used by GCG software
   *PAUP's multiple sequence (NEXUS) format
   *PIR/CODATA format used by PIR

If your data is 'raw' (i.e. it has only sequence data, with no headers, dividers, etc.), be aware that Readseq may not read it correctly. Your simplest option is to convert it into Fasta format by adding a header line like this one to the top of the file:
>test input sequence
Type 'man readseq' on helix for more information.
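
If you have several raw files, adding the header by hand gets tedious. The Python sketch below shows one way to prefix raw sequence text with a Fasta header line, that is, a line starting with '>'. The function name add_fasta_header and its default sequence name are hypothetical choices for this example, not anything Readseq provides.

```python
def add_fasta_header(raw_text: str, name: str = "test input sequence") -> str:
    """Prefix raw sequence text with a Fasta header line.

    Leading blank lines are dropped. If the text already starts with '>',
    it is assumed to have a header and is returned without a second one.
    """
    body = raw_text.lstrip()
    if body.startswith(">"):
        return body
    return ">" + name + "\n" + body
```

Read the raw file as text, pass its contents through the function, and write the result to a new file rather than overwriting the original, so that you can still compare the two afterwards.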

It is always worth checking the output file after Readseq! You don't need to examine the whole sequence, just check the beginning, end and length of the sequence. Readseq sometimes doesn't recognize headers properly and includes them in the sequence -- an easy error to notice.
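
The check described above (beginning, end, and length) can be sketched in Python as follows. This assumes you pass in only the sequence portion of each file, without header lines, and that residues are letters; whitespace and position numbers are ignored. The function name sequence_summary is hypothetical.

```python
def sequence_summary(sequence_text: str, edge: int = 10):
    """Return (first residues, last residues, length) for a block of sequence text.

    Assumption: every letter is a residue; spaces, line breaks, and digits
    (such as position numbers) are skipped. Case is normalised to upper.
    """
    residues = "".join(ch for ch in sequence_text if ch.isalpha()).upper()
    return residues[:edge], residues[-edge:], len(residues)
```

Run it on the sequence before and after conversion and compare the two results. If a header was pulled into the sequence, the length will be larger than expected and the first residues will not match.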

GCG-Lite

GCG-Lite has a web-based format conversion tool that converts between the formats that Readseq uses. Paste your sequence into the input box, choose an output format, and click on 'Submit Request'. The reformatted sequence will appear in your web browser, where you can save it into a file on your own machine. Transfer this file to helix using a file transfer program such as Fetch or FTP.