OmniMapFree - Design - blast data

Updated: 14 July 2011.

blast data

The blast data file is used to display genes or regions of a chromosome hit by a blast search.

BLAST against Gene or Protein Sequences

OmniMapFree can display BLASTN, BLASTP, BLASTX, TBLASTN and TBLASTX hits for searches against gene or protein sequences for your organism. The gene or protein ID is used to identify the chromosome coordinates of the gene displayed in OmniMapFree. The query ID and e-value are displayed in the Gene List Dialog box.

This is part of the data file for phi_e-100_hits.blast:

# clGreen
# e-100
# chrom	start	end	strand	name	hit	evalue

1	68133	69338	+	fg00006	PHI:257	3e-60	PHI:256	3e-14
1	69906	71427	-	fg00007	PHI:438	1e-06
1	83812	85367	-	fg00012	PHI:438	6e-30
1	123256	124136	-	fg00028	PHI:479	1e-66
1	142736	144998	-	fg12021	PHI:169	3e-06
1	145394	150432	-	fg00036	PHI:96	0.0
1	151490	152771	+	fg12022	PHI:438	3e-22

This data file was generated from the results of a BLASTP search of PHI-base protein sequences against a database of Fusarium graminearum proteins downloaded from MIPS.

The first #-line contains the colour (clGreen) used to display the genes or regions.

The second #-line gives the e-value cutoff - any data with e-values higher (less significant) than this are not displayed.

The third #-line names the data fields. The blank line after it is not essential but separates the #-lines from the data lines.

The following lines contain the data for each gene displayed. The first five data fields, chromosome id (chrom), start position (start), end position (end), strand and gene id (name), are the same as in posn data files. They are followed by fields for the query id (hit) and e-value (evalue).

The start and end positions are those for the gene encoding the protein that was hit by the query protein sequence.

If the e-value in field 7 of the data line is less than the e-value cut-off (in the second #-line), OmniMapFree displays this feature; otherwise it ignores the data line.

This means that only one gene in the data fragment above will be displayed:

1	145394	150432	-	fg00036	PHI:96	0.0
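The display rule above can be sketched in a few lines of Python. This is only an illustration of the rule as described here, not OmniMapFree's own code: the function names parse_evalue and displayed_lines are hypothetical, the fields are assumed to be tab-separated, and a cut-off written as "e-100" is assumed to mean 1e-100.

```python
def parse_evalue(text):
    """Convert an e-value string to a float; 'e-100' is read as 1e-100 (assumption)."""
    text = text.strip()
    if text.lower().startswith("e"):
        text = "1" + text
    return float(text)


def displayed_lines(file_text):
    """Return the data lines (as field lists) whose field 7 is below the cut-off."""
    lines = [ln for ln in file_text.splitlines() if ln.strip()]
    hash_lines = [ln.lstrip("#").strip() for ln in lines if ln.startswith("#")]
    cutoff = parse_evalue(hash_lines[1])  # the second #-line holds the cut-off
    data = [ln.split("\t") for ln in lines if not ln.startswith("#")]
    # Only the first e-value (field 7, index 6) is compared with the cut-off.
    return [fields for fields in data if parse_evalue(fields[6]) < cutoff]


SAMPLE = "\n".join([
    "# clGreen",
    "# e-100",
    "# chrom\tstart\tend\tstrand\tname\thit\tevalue",
    "",
    "1\t123256\t124136\t-\tfg00028\tPHI:479\t1e-66",
    "1\t145394\t150432\t-\tfg00036\tPHI:96\t0.0",
])

for fields in displayed_lines(SAMPLE):
    print(fields[4])
```

With the two sample lines, only fg00036 passes, because 1e-66 is larger than 1e-100.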

You can change the e-value cut-off in the second #-line of the blast data file to change how similar the sequences must be for them to be displayed.

There must be at least one query id + e-value pair, and there can be as many pairs as you want. However, they should be arranged in order of lowest to highest e-value so that OmniMapFree only has to check the first e-value (field 7) to decide whether to display a gene or feature.

This data line shows that fg00006 is hit by two different PHI-base proteins.

1	68133	69338	+	fg00006	PHI:257	3e-60	PHI:256	3e-14
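If the pairs come out of your own processing in arbitrary order, a small helper can put them in the required order before the line is written. The sketch below is a hypothetical helper (split_pairs and order_pairs are not part of OmniMapFree). It assumes tab-separated fields and e-values written with a leading digit, as in the sample data.

```python
def split_pairs(fields):
    """Split fields 6 onward into (query id, e-value) string pairs."""
    extra = fields[5:]
    if len(extra) < 2 or len(extra) % 2 != 0:
        raise ValueError("expected one or more query id + e-value pairs")
    return [(extra[i], extra[i + 1]) for i in range(0, len(extra), 2)]


def order_pairs(line):
    """Return the data line with its pairs sorted from lowest to highest e-value."""
    fields = line.rstrip("\n").split("\t")
    pairs = sorted(split_pairs(fields), key=lambda pair: float(pair[1]))
    flat = [item for pair in pairs for item in pair]
    # The first five fields are kept unchanged; e-value strings are not reformatted.
    return "\t".join(fields[:5] + flat)


unordered = "1\t68133\t69338\t+\tfg00006\tPHI:256\t3e-14\tPHI:257\t3e-60"
print(order_pairs(unordered))
```

After sorting, PHI:257 (3e-60) comes first, so field 7 holds the lowest e-value for the gene.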

We routinely do BLAST searches saving all results with e-values less than or equal to e-5 and process the results so that the OmniMapFree blast data file contains all of them. We can then change the e-value cut-off in the data file to identify hits that are as similar as we want to the query sequence. We normally use e-value cut-offs of e-100, e-40 and e-5.
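Changing the cut-off only means rewriting the second #-line, so the variants can be produced from one master file. The following is a minimal sketch; with_cutoff is a hypothetical helper name, and it works on the file text rather than on files on disk.

```python
def with_cutoff(file_text, new_cutoff):
    """Return the file text with the second #-line replaced by the new cut-off."""
    lines = file_text.split("\n")
    seen = 0
    for i, line in enumerate(lines):
        if line.startswith("#"):
            seen += 1
            if seen == 2:
                lines[i] = "# " + new_cutoff
                break
    else:
        # The loop finished without finding a second #-line.
        raise ValueError("no second #-line found")
    return "\n".join(lines)


master = "\n".join([
    "# clGreen",
    "# e-5",
    "# chrom\tstart\tend\tstrand\tname\thit\tevalue",
    "",
    "1\t145394\t150432\t-\tfg00036\tPHI:96\t0.0",
])

# One text per cut-off; the data lines are identical in all three.
variants = {cutoff: with_cutoff(master, cutoff) for cutoff in ("e-100", "e-40", "e-5")}
print(variants["e-40"].split("\n")[1])
```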

Here is the map drawn by the phi_e-100_hits.blast data file:

BLAST against Chromosome Sequences

If you do a BLASTN search for DNA sequences against the chromosome sequences, there are no gene IDs (name), so you can insert any value you want here, e.g. "seq". The data line therefore represents a region of the chromosome rather than a gene. The start and end positions are those of the matching sequence within the chromosome.
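As an illustration, a region data line could be built from one row of BLAST tabular output. This sketch rests on several assumptions that are not stated in this document: the search was saved in the standard 12-column tabular format (qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore), the subject ID matches the chromosome id used by OmniMapFree, and start is written before end with the direction recorded in the strand field. The function name region_line is hypothetical.

```python
def region_line(tabular_row, name="seq"):
    """Build a blast data line for a chromosome region from one tabular BLAST row."""
    cols = tabular_row.rstrip("\n").split("\t")
    query_id, chrom = cols[0], cols[1]
    sstart, send = int(cols[8]), int(cols[9])
    evalue = cols[10]
    # In tabular BLASTN output a minus-strand match has sstart greater than send.
    strand = "+" if sstart <= send else "-"
    start, end = min(sstart, send), max(sstart, send)
    return "\t".join([chrom, str(start), str(end), strand, name, query_id, evalue])


# Made-up row for illustration only.
row = "query1\t1\t98.5\t200\t3\t0\t1\t200\t5200\t5001\t2e-95\t350"
print(region_line(row))
```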