HUMAN DEVELOPMENTAL BIOLOGY, University of Manchester, UK.

This folder contains files produced by mapping sequence data to the human genome (version: hg19).

# SAMPLES --------------------------------------
Filenames begin with the name of the sample.
e.g. Liver_2topHat_hg19_.unique.bed12.gz is from sample Liver_2

# ANALYSES ---------------------------------
The analysis (e.g. mapping) pipeline is denoted by the central section of the filename.
e.g. *topHat_hg19_.unique* is a dataset resulting from mapping to the human genome (hg19) using TopHat and retaining uniquely mapped reads.
# TODO list pipelines in another directory.

# FILE-TYPES -------------------------------------
The different filename endings denote different types of file:

*.bw          A UCSC BigWig file. This binary file is accessed remotely when using the UCSC track hub to view data.
*.FIRST.bw    A BigWig file showing only reads mapped to the chromosomal first strand ('+' / Watson). Shown in green in the track hub.
*.SECOND.bw   A BigWig file showing only reads mapped to the chromosomal second strand ('-' / Crick). Shown in blue in the track hub.
*.bb          A binary BigBed file (see https://genome.ucsc.edu/FAQ/FAQformat.html#format1.5) giving the location of features on the genome. Like *.bw files, BigBed files are hosted here to be viewed through the UCSC track hub.
              e.g. multiTissue.novel.transcripts.2015.6251.bb   6251 novel transcripts
*.bed12.gz    A 'bed' format plain-text file (gzip compressed) listing the position of mapped reads (one line per mapping).
              The 'bed12' layout includes extra columns (blockCount, blockSizes, blockStarts) detailing how each read is divided into blocks, which is how spliced reads are represented.

example unspliced reads:-
chr1 13442 13543 D3YGT8Q1:262:C5T4HACXX:5:1213:12286:32070/1 50 - 13442 13543 255,0,0 1 101 0
chr1 13442 13543 D3YGT8Q1:263:C5R1AACXX:8:1107:20580:76414/1 50 - 13442 13543 255,0,0 1 101 0
chr1 14460 14561 D3YGT8Q1:263:C5R1AACXX:8:2302:9789:73113/1 50 - 14460 14561 255,0,0 1 101 0
chr1 14466 14567 D3YGT8Q1:263:C5R1AACXX:7:2307:9874:91923/1 50 - 14466 14567 255,0,0 1 101 0
chr1 14471 14572 D3YGT8Q1:262:C5T4HACXX:5:1212:14921:89361/1 50 - 14471 14572 255,0,0 1 101 0
chr1 14488 14589 D3YGT8Q1:262:C5T4HACXX:5:1315:6889:30695/1 50 - 14488 14589 255,0,0 1 101 0

columns are:
 1  chr
 2  start
 3  end
 4  sequence name (from sequencing machine)
 5  topHat mapping quality (always 50 for uniquely mapped reads)
 6  strand
 7  thickStart (here the same as start)
 8  thickEnd (here the same as end)
 9  colour
10  blockCount
11  blockSizes
12  blockStarts
(see https://genome.ucsc.edu/FAQ/FAQformat.html#format1)

For RNA-seq data, read pairs can be reconstructed from the sequence names: the names of the two mates end in /1 and /2, respectively.

example spliced reads:-
chr1 787408 788069 D3YGT8Q1:263:C5R1AACXX:7:1213:3733:45416/2 50 + 787408 788069 255,0,0 2 82,19 0,642
chr1 787412 788073 D3YGT8Q1:262:C5T4HACXX:5:1109:13655:86089/2 50 + 787412 788073 255,0,0 2 78,23 0,638
chr1 787412 788073 D3YGT8Q1:262:C5T4HACXX:5:2103:14380:95672/2 50 + 787412 788073 255,0,0 2 78,23 0,638
chr1 787412 788073 D3YGT8Q1:263:C5R1AACXX:7:1209:10060:75058/2 50 + 787412 788073 255,0,0 2 78,23 0,638
chr1 787412 788073 D3YGT8Q1:263:C5R1AACXX:8:1212:9382:61309/2 50 + 787412 788073 255,0,0 2 78,23 0,638
chr1 787413 788074 D3YGT8Q1:262:C5T4HACXX:5:1215:11377:24343/2 50 + 787413 788074 255,0,0 2 77,24 0,637
chr1 787449 788110 D3YGT8Q1:262:C5T4HACXX:5:2113:4547:20426/2 50 + 787449 788110 255,0,0 2 41,60 0,601
chr1 787449 788110 D3YGT8Q1:263:C5R1AACXX:7:2210:11746:69521/2 50 + 787449 788110 255,0,0 2 41,60 0,601

Note that the last three columns give the precise locations of splicing events, with block starts measured relative to the read start (column 2) (see
https://genome.ucsc.edu/FAQ/FAQformat.html#format1).

# DOWNLOADS -------------------------------------
Be sure to check that the files have downloaded correctly by computing their md5sum values on your own machine and comparing them with ours.
Our md5sum values are in ./bed12.gz.md5sum

For further project and sample information, contact Prof. Neil Hanley (neil.hanley@manchester.ac.uk)
For data queries, contact Dr Dave Gerrard (david.gerrard@manchester.ac.uk)
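
The block columns described under FILE-TYPES (blockCount, blockSizes, blockStarts) can be turned into genomic coordinates for each aligned segment of a read. The following is a minimal Python sketch, not part of the mapping pipeline: the function name bed12_blocks is illustrative, and it assumes whitespace-separated fields as in the examples above.

```python
def bed12_blocks(line):
    """Return (chrom, strand, [(block_start, block_end), ...]) for one bed12 line.

    Illustrative helper, not part of the original pipeline.
    """
    fields = line.split()
    if len(fields) < 12:
        raise ValueError("expected 12 bed12 columns, got %d" % len(fields))
    chrom = fields[0]
    chrom_start = int(fields[1])  # column 2: start of the read on the chromosome
    strand = fields[5]
    block_count = int(fields[9])
    # blockSizes and blockStarts are comma-separated lists; a trailing comma is tolerated.
    sizes = [int(x) for x in fields[10].split(",") if x]
    starts = [int(x) for x in fields[11].split(",") if x]
    if len(sizes) != block_count or len(starts) != block_count:
        raise ValueError("blockCount does not match blockSizes/blockStarts")
    # Each block start is an offset from chrom_start (column 2).
    blocks = [(chrom_start + s, chrom_start + s + n) for s, n in zip(starts, sizes)]
    return chrom, strand, blocks


# First spliced example read from this README.
spliced = ("chr1 787408 788069 D3YGT8Q1:263:C5R1AACXX:7:1213:3733:45416/2 "
           "50 + 787408 788069 255,0,0 2 82,19 0,642")
chrom, strand, blocks = bed12_blocks(spliced)
print(chrom, strand, blocks)
```

For the spliced read shown, the two blocks are 787408-787490 and 788050-788069, so the gap between them (787490-788050) is the inferred splice. An unspliced read yields a single block spanning start to end.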
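
The download check described under DOWNLOADS could be scripted roughly as follows. This is a sketch under assumptions: it assumes ./bed12.gz.md5sum uses the usual md5sum output layout (checksum, whitespace, filename on each line), which has not been confirmed here, and the function names are illustrative only.

```python
import hashlib
from pathlib import Path


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def parse_md5sum_lines(text):
    """Parse 'checksum  filename' lines (assumed layout) into {filename: checksum}."""
    expected = {}
    for line in text.splitlines():
        parts = line.split(None, 1)
        if len(parts) == 2:
            checksum, name = parts
            # md5sum marks binary mode with a leading '*' on the filename.
            expected[name.strip().lstrip("*")] = checksum.lower()
    return expected


def check_downloads(md5sum_file, directory="."):
    """Return {filename: True/False}; False means missing or checksum mismatch."""
    expected = parse_md5sum_lines(Path(md5sum_file).read_text())
    results = {}
    for name, checksum in expected.items():
        target = Path(directory) / name
        results[name] = target.is_file() and md5_of_file(target) == checksum
    return results
```

Calling check_downloads("./bed12.gz.md5sum") from the download directory would then return a dictionary in which any file marked False should be downloaded again. Where the md5sum command-line tool is available, its own check mode does the same job.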