
GCG doesn't accept my sequence data format

Your sequence data is not in GCG format. To convert it into GCG format on helix, you can cut and paste it into Seqed, or run Readseq or GCG's Reformat. Alternatively, use GCG-Lite's format conversion tool. These methods are described further below.

  • Cut-and-paste into Seqed
  • Log into helix from your desktop computer and start up GCG. Type 'seqed', and enter screen mode. In a separate window, open the file containing your sequence, select it, and paste it into the seqed window. Then exit seqed, saving the file, which will now be in GCG format. Check that the entire sequence was pasted and saved. This is a quick-and-dirty method, but it is not practical for very long sequences or for many sequences.
  • Readseq
  • To use Readseq, the data must be in one of the following formats:
       *IG/Stanford, used by Intelligenetics and others
       *GenBank/GB, genbank flatfile format
       *NBRF format
       *EMBL, EMBL flatfile format
       *GCG, single sequence format of GCG software
       *DNAStrider, for common Mac program
       *Fitch format, limited use
       *Pearson/Fasta, common format used by FastA program and others
       *Zuker format, limited use. Input only.
       *Olsen, format printed by Olsen VMS sequence editor. Input only.
       *Phylip3.2, sequential format for Phylip programs
       *Phylip, interleaved format for Phylip programs (v3.3, v3.4)
       *Plain/Raw, sequence data only (no name, document, numbering)
       *MSF, multi sequence format used by GCG software
       *PAUP's multiple sequence (NEXUS) format
       *PIR/CODATA format used by PIR
    
    If your data is 'raw' (i.e. it contains only sequence data, with no headers, dividers etc.), be aware that Readseq may not read it correctly. Your simplest option is to convert it into Fasta format by adding a header line such as this one to the top of the file:
    >test input sequence
    Type 'man readseq' on helix for more information.
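
If you have many raw files, adding the header by hand gets tedious. The following is a minimal Python sketch of the same step, assuming the usual Fasta convention of a header line beginning with '>' followed by the sequence. The function name `raw_to_fasta` and the 60-character line width are illustrative choices, not part of Readseq or GCG.

```python
def raw_to_fasta(raw_text, name="test input sequence", width=60):
    """Wrap raw sequence text in a minimal Fasta record (hypothetical helper)."""
    # Remove all whitespace so stray spaces and line breaks in the raw
    # input do not end up inside the sequence.
    residues = "".join(raw_text.split())
    # Assumption: a Fasta record is one '>' header line followed by
    # the sequence, here wrapped at a fixed width.
    lines = [">" + name]
    for start in range(0, len(residues), width):
        lines.append(residues[start:start + width])
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(raw_to_fasta("ACGTAC GTAC\nGTACGT\n", width=8), end="")
```

The resulting text can be saved to a file and passed to Readseq like any other Fasta file.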

    It is always worth checking the output file after Readseq! You don't need to examine the whole sequence, just check the beginning, end and length of the sequence. Readseq sometimes doesn't recognize headers properly and includes them in the sequence -- an easy error to notice.
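
The check described above can be sketched in Python as follows. This is only an illustration, and it assumes that in a GCG-format file the sequence follows the first line ending in '..' and that digits and whitespace in the sequence body are position numbers and layout rather than residues. The helper names `read_gcg_sequence` and `summarize_sequence` are hypothetical.

```python
def read_gcg_sequence(text):
    """Return the residues that follow the dividing line of a GCG-style file."""
    lines = text.splitlines()
    for index, line in enumerate(lines):
        # Assumption: the dividing line is the first line ending in '..'.
        if line.rstrip().endswith(".."):
            body = lines[index + 1:]
            break
    else:
        raise ValueError("no dividing line ending in '..' was found")
    # Drop position numbers and whitespace; keep everything else.
    return "".join(
        ch for line in body for ch in line
        if not ch.isdigit() and not ch.isspace()
    )


def summarize_sequence(residues, edge=10):
    """Report the length plus the first and last few residues."""
    return {
        "length": len(residues),
        "start": residues[:edge],
        "end": residues[-edge:],
    }
```

Compare the reported length and ends with your original data. Header text that was wrongly absorbed into the sequence usually shows up as an unexpected start or a length that is too large.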

  • Reformat
  • To use Reformat on sequence files, the files must contain a heading, a dividing line, and a sequence. Type 'genhelp reformat' for more details on the input sequence format. Make a copy of your input sequence before running Reformat, because it overwrites the original file. To run the program, type 'reformat filename'. If all goes well, the file will now contain a GCG-formatted sequence. If something doesn't work, see 'Reformat gave me an empty file' or 'Reformat put the header into the sequence', which may help you to troubleshoot.
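
Because Reformat overwrites its input, the backup step is worth automating if you convert files regularly. Below is a minimal Python sketch of that step alone. The function name `backup_before_reformat` and the '.orig' suffix are hypothetical choices, and the sketch does not run Reformat itself.

```python
import shutil
from pathlib import Path


def backup_before_reformat(path, suffix=".orig"):
    """Copy a sequence file next to itself before it is overwritten."""
    source = Path(path)
    backup = source.with_name(source.name + suffix)
    # Refuse to clobber an earlier backup, which may be the only
    # remaining copy of the untouched input.
    if backup.exists():
        raise FileExistsError(f"backup already exists: {backup}")
    shutil.copy2(source, backup)
    return backup
```

After the copy exists, run 'reformat filename' as described above. If the result is wrong, the '.orig' file still holds your original data.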

  • GCG-Lite
  • GCG-Lite has a web-based format conversion tool that converts between the formats that Readseq uses. Paste your sequence into the input box, choose an output format, and click on 'Submit Request'. The reformatted sequence will appear in your web browser, where you can save it into a file.