How can I find bases 3134678 to 3136537 of the E. coli genome?

Unfortunately there is no really convenient way to do exactly what you want. There are a couple of possible methods, but both require more effort than one would like.

Method 1: GCG's Lookup and Fetch programs
  • The E. coli genome is divided into 400 sections, which are saved as 400 separate entries in the GenBank database. You can find the list of entries by using the GCG program 'lookup'. Select 'Genbank' as the database to search, type 'Coli & genome' next to 'All Text', and save the results. The results will show that entry GB_BA1:ECAE000111 contains section 1 of 400 of the complete genome, and GB_BA1:ECAE000510 contains the last section, 400 of 400.

    This does not tell you which bases are contained in each entry. If you know from some other source (e.g. a paper or the author) one entry that contains part of your region of interest, you can get the upstream and downstream bases by getting the preceding and following entries. For example, in this case entry ECAE000381 contains most of the region of interest. If you want a few hundred bases on either side of this entry, you would want entries ECAE000380 and ECAE000382. Note that this requires prior knowledge of at least one entry number. The entries themselves give no indication of which bases they include; they just say 'section 271 of 400' etc.

    In the lookup program, you could also search for 'U00096', which is the accession number for the entire genome. You can do this search through GCG-Lite+ on the web.

    You can obtain the actual entries by typing 'fetch ECAE000381' for each accession number at the helix prompt.
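
    The entry numbering described under Method 1 can be sketched in Python. This is only an illustration: it assumes the 400 entries are numbered consecutively from ECAE000111 (section 1) to ECAE000510 (section 400), which matches the lookup results quoted above but is not guaranteed by them. The function names are hypothetical and are not part of GCG.

```python
FIRST_ACCESSION_NUMBER = 111  # ECAE000111 holds section 1 of 400
TOTAL_SECTIONS = 400


def section_to_accession(section):
    """Return the entry name for a section, assuming consecutive numbering."""
    if not 1 <= section <= TOTAL_SECTIONS:
        raise ValueError("section must be between 1 and %d" % TOTAL_SECTIONS)
    return "ECAE%06d" % (FIRST_ACCESSION_NUMBER + section - 1)


def neighbouring_accessions(section):
    """Return the entries for the preceding, given, and following sections."""
    wanted = (section - 1, section, section + 1)
    return [section_to_accession(s) for s in wanted if 1 <= s <= TOTAL_SECTIONS]


# Section 271 is the entry that holds most of the region of interest.
for accession in neighbouring_accessions(271):
    print("fetch " + accession)
```

    Each printed line is a command of the form shown above, to be typed at the helix prompt.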

Method 2: NCBI's Genome Query
  • An alternative way to obtain the sequence is by pointing your web browser at NCBI's Genome Query. Click on 'E. Coli' and you will get a basic circular figure showing the genome. Put your cursor as close to 3100K as possible, and click. You will get a figure showing 3100950..3150949, or something close to that. Use the forward and back arrows at the bottom of the figure to obtain a window that includes your region of interest. Then click on the 'TextView' button in the left frame. The main frame should now show an actual GenBank entry for bases 3100950..3150949.

    Choose 'Save Frame As...' in your browser menu under File. Choose 'Text' as the format, and save the entry into a file on your desktop computer. This entry is in GenBank format and will need to be converted to GCG format before any GCG program can use it. You can use the helix program 'fmtseq' to convert it. Type 'fmtseq' at the helix prompt and follow the instructions.
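
    If you want to check the saved file before converting it, the bases can be pulled out of a GenBank-format entry with a few lines of Python. This is a minimal sketch and not a replacement for fmtseq: it relies only on the standard GenBank layout, in which the sequence follows a line beginning with ORIGIN and ends at a line beginning with '//'. The function name and the toy entry are made up for illustration.

```python
def read_genbank_sequence(lines):
    """Collect the bases listed between the ORIGIN line and the closing '//'."""
    bases = []
    in_origin = False
    for line in lines:
        if line.startswith("ORIGIN"):
            in_origin = True
        elif line.startswith("//"):
            in_origin = False
        elif in_origin:
            # Sequence lines look like '       21 acgtacgtac ggttaaccgg';
            # keep the letters and drop the position number and spaces.
            bases.append("".join(ch for ch in line if ch.isalpha()))
    return "".join(bases)


# Toy entry with an invented sequence, not real genome data.
toy_entry = [
    "LOCUS       TOYENTRY",
    "ORIGIN",
    "        1 acgtacgtac ggttaaccgg",
    "       21 ttgacatgca",
    "//",
]
sequence = read_genbank_sequence(toy_entry)
print(len(sequence), sequence)
```

    For a saved file, the same function can be given an open file object in place of the list, since it only iterates over lines.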

  • However the sequence is obtained, you will need to delete the extra bases on each end within seqed, and you will have to keep track of the base numbering yourself. Type 'seqed filename' to start up seqed. If you used Method 1, you will have two files to enter. Position your cursor at the end of the first sequence, hit Ctrl-D to get to the command prompt, and type 'Include filename2' to add the second sequence at the end of the first.
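
    Keeping track of the base numbering comes down to simple arithmetic. The sketch below works it out for the Method 2 example, assuming the saved entry covers exactly 3100950..3150949 and that coordinates are 1-based and inclusive. The function names are hypothetical, and the numbers will differ if your window differs.

```python
def trim_counts(window_start, window_end, target_start, target_end):
    """Return (leading, kept, trailing) base counts for inclusive coordinates."""
    if not window_start <= target_start <= target_end <= window_end:
        raise ValueError("target region must lie inside the fetched window")
    leading = target_start - window_start      # bases to delete at the front
    kept = target_end - target_start + 1       # bases in the region of interest
    trailing = window_end - target_end         # bases to delete at the end
    return leading, kept, trailing


def extract_region(sequence, window_start, target_start, target_end):
    """Slice the target region out of a sequence whose first base is window_start."""
    offset = target_start - window_start
    return sequence[offset:offset + (target_end - target_start + 1)]


leading, kept, trailing = trim_counts(3100950, 3150949, 3134678, 3136537)
print(leading, kept, trailing)  # 33728 1860 14412
```

    In this example you would delete the first 33728 bases and the last 14412 bases, leaving the 1860 bases of the region of interest.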