The tutorial provides a short introduction to Fast5 files used to store raw data output of Oxford Nanopore Technologies' sequencing devices. The tutorial aims to provide background information for why users may have cause to interact with Fast5 files and show how to perform common manipulations.
Methods used in this tutorial include:
ont_fast5_api for manipulating read information within Fast5 files.
The computational requirements for this tutorial are:
⚠️ Warning: This notebook has been saved with its outputs for demonstration purposes. It is recommended to select Edit > Clear all outputs before using the notebook to analyse your own data.
This tutorial aims to elucidate the information stored within a Fast5 file, and how such files can be read, or parsed, within the Python programming language and on the command line.
The goals from this tutorial include:
ont_fast5_api,
The tutorial includes a sample Fast5 dataset from a metagenomic sample.
Before anything else we will create and set a working directory:
from epi2melabs import ping
tutorial_name = "fast5_tutorial"
pinger = ping.Pingu()
pinger.send_notebook_ping('start', tutorial_name)
# create a work directory and move into it
working_dir = '/epi2melabs/{}/'.format(tutorial_name)
!mkdir -p "$working_dir"
%cd "$working_dir"
/epi2melabs/fast5_tutorial
This tutorial uses the ont_fast5_api software; this is not installed in the default EPI2ME Labs environment. We will install this now in an isolated manner so as to not interfere with the existing environment.
Please note that the software installed is not persistent and this step will need to be re-run if you stop and restart the EPI2ME Labs server.
# create a conda environment and install ont_fast5_api into it
!conda remove -y --name ont_fast5_api --all
!conda create -q -y -n ont_fast5_api python==3.6 pip 2>/dev/null
!. /opt/conda/etc/profile.d/conda.sh \
&& conda activate ont_fast5_api \
&& which pip \
&& pip install "ont_fast5_api>=3.1.6"
In order to provide a concrete example of handling Fast5 files, this tutorial is provided with an example dataset sampled from a MinION sequencing run. The dataset is not a full MinION run, in order to reduce the download size.
To download the sample file we run the Linux command wget. To execute the command click on the cell and then press Command/Ctrl-Enter, or click the Play symbol to the left-hand side.
bucket = "ont-exd-int-s3-euwst1-epi2me-labs"
domain = "s3-eu-west-1.amazonaws.com"
site = "https://{}.{}".format(bucket, domain)
site = "https://ont-exd-int-s3-euwst1-epi2me-labs.s3-eu-west-1.amazonaws.com"
!rm -rf sample_fast5
!wget -O sample_fast5.tar $site/fast5_tutorial/sample_fast5.tar
!tar -xvf sample_fast5.tar
!wget -O fast5_sample.bam $site/fast5_tutorial/fast5_sample.bam
!wget -O fast5_sample.bam.bai $site/fast5_tutorial/fast5_sample.bam.bai
Having downloaded the sample data we need to provide the filepaths as input to the notebook.
The form can be used to enter the filenames of your inputs.
input_folder = None
output_folder = None
def process_form(inputs):
    global input_folder
    global output_folder
    input_folder = inputs.input_folder
    output_folder = inputs.output_folder
    # create the output folder, then check the input folder and count its Fast5 files
    !cecho ok "Making output folder"
    !mkdir -p "$output_folder"
    !test -d "$input_folder" \
        && cecho success "Found input folder." \
        || cecho error "Input folder does not exist."
    !echo " - Found "$(find "$input_folder" -name "*.fast5" | wc -l)" fast5 files"
from epi2melabs.notebook import InputForm, InputSpec
input_form = InputForm(
InputSpec('input_folder', 'Input folder', '/epi2melabs/fast5_tutorial/sample_fast5'),
InputSpec('output_folder', 'Output folder', 'analysis'))
input_form.add_process_button(process_form)
input_form.display()
Executing the above form will have checked the input folder and attempted to find the Fast5 files located within it.
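The same check can be made from Python. The following is a minimal sketch using only the standard library; the function name count_fast5_files is ours, chosen for illustration, and is not part of EPI2ME Labs or ont_fast5_api.

```python
from pathlib import Path


def count_fast5_files(folder):
    """Count regular files ending in .fast5 below `folder`.

    Subdirectories are searched too, much like
    `find "$input_folder" -name "*.fast5" | wc -l`.
    Returns 0 if the folder does not exist.
    """
    root = Path(folder)
    if not root.is_dir():
        return 0
    return sum(1 for path in root.rglob("*.fast5") if path.is_file())
```

Calling count_fast5_files(input_folder) after the form has been run should then report the same number as the form itself.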
Fast5 files are used by the MinKNOW instrument software and the Guppy basecalling software to store the primary sequencing data from Oxford Nanopore Technologies' sequencing devices and the results of primary and secondary analyses such as basecalling information and modified-base detection.
Before discussing how to read and manipulate Fast5 files in Python we will first review their internal structure.
Files output by the MinKNOW instrument software and the Guppy basecalling software with the .fast5 file extension are container files using the HDF5 format. As such they are self-describing, carrying all the information necessary to correctly interpret the data they contain.
A Fast5 file differs from a generic HDF5 file in containing only a fixed, defined structure of data. This structure is set out in the ont_h5_validator repository on GitHub, specifically in the file multi_read_fast5.yaml.
Users are referred to the YAML schemas to gain an understanding of all the data contained in Fast5 files. Users are encouraged to raise Issues on the ont_h5_validator project if the schemas are unclear. The rest of this tutorial will be mostly practical in nature.
The schema file describes how the internal structure of a Fast5 file is laid out. There are three core concepts to understand:
An appreciation of these concepts is required for using the data contained within Fast5 files, though as we will see for common manipulations of Fast5 files users need only an awareness of these ideas.
Historically there have been two flavours of Fast5 files:
The internal layout, in terms of groups and datasets, of these two flavours of Fast5 is very similar. In essence a multi-read file embeds the group hierarchy of multiple single-read files within one HDF5 container.
Single-read files are deprecated and no longer used by MinKNOW or Guppy. We recommend that any single-read files are converted to multi-read files before further use or storage; how to do this is demonstrated later in this tutorial.
As noted above the ont_h5_validator project contains a full description of the expected contents of a Fast5 file. Here we will briefly highlight the key groups and datasets stored within a Fast5 file.
Using the dataset provided above, let's enumerate the contents of the first file using the h5ls program:
# i) find and list all .fast5 files
# ii) take the first file
# iii) use `h5ls` to list the file's contents
# iv) truncate the output to the first 19 lines
!find "$input_folder" -name "*.fast5" \
| head -n 1 \
| xargs h5ls -r \
| head -n 19
The Fast5 files from a MinION run can become fairly sizeable, up to a few hundred gigabytes. Efficient and performant compression and indexing are therefore required.
For the most part the self-describing and indexed nature of the HDF5 format ensures that data within a file can be retrieved quickly. However, a MinION run produces multiple Fast5 files, each containing a subset of the sequencing reads. Finding the information for a read of known ID therefore requires a supplementary index cross-referencing reads with the files that contain them; the alternative is to open all the files in turn and enquire about their contents. The sequencing_summary.txt file produced by both MinKNOW and Guppy provides an index of the reads contained within each Fast5 file. This index can be reconstructed if required (as in the case of nanopolish index), though we recommend always storing the sequencing summary with the Fast5 data files.
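To illustrate how such an index is used, here is a minimal sketch that loads a tab-separated summary into a Python dictionary mapping each read ID to its file. The column names filename and read_id are assumptions: they can differ between software versions, so check the header line of your own sequencing_summary.txt and pass the appropriate names. The example rows are made up.

```python
import csv
import io


def load_read_index(handle, filename_column="filename", read_id_column="read_id"):
    """Map read_id -> Fast5 filename from a tab-separated summary file.

    The column names are assumptions; adjust them to match the header
    of the summary file being read.
    """
    reader = csv.DictReader(handle, delimiter="\t")
    index = {}
    for row in reader:
        index[row[read_id_column]] = row[filename_column]
    return index


# Tiny made-up table standing in for a real sequencing summary.
example = io.StringIO(
    "filename\tread_id\n"
    "batch_0.fast5\tread-a\n"
    "batch_0.fast5\tread-b\n"
    "batch_1.fast5\tread-c\n"
)
index = load_read_index(example)
print(index["read-c"])  # batch_1.fast5
```

With a real file the handle would come from open("sequencing_summary.txt"); a single dictionary lookup then replaces opening every Fast5 file in turn.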
Due to the large volume of data created by nanopore sequencing devices, Oxford Nanopore Technologies has developed a bespoke compression scheme for ionic current trace data known as VBZ. VBZ is a combination of two open compression algorithms and is itself open and freely available from the GitHub release page. Ordinarily it will not be necessary to install the VBZ compression library and HDF5 plugin simply to use MinKNOW and Guppy, as these applications include their own copy of VBZ. However, if you wish to read Fast5 files using third-party applications (such as h5py) you will need to install the VBZ plugin.
The section above has outlined the data contained within a Fast5 file and how the file is arranged. For a fuller description of the contents of the files, users are again directed to the ont_h5_validator project. In this section we will highlight several methods for manipulating the data contained within Fast5 files.
Oxford Nanopore Technologies provides a Python-based software package for accessing data stored within a set of Fast5 files: ont_fast5_api. For the most part this set of tools hides from the user the need to understand anything about the nature of Fast5 files. Here we will show how to perform some common tasks that might be required when dealing with Fast5 files. For a guide to using ont_fast5_api programmatically please see the documentation.
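As a brief taste of programmatic use, the sketch below lists each read in a file together with the number of raw signal samples it holds. It relies on get_fast5_file, get_reads, read_id and get_raw_data from ont_fast5_api; the helper names summarise_reads and summarise_fast5 are ours. The package is imported inside the function, so the code must be run in a Python environment where ont_fast5_api is importable, such as the conda environment created earlier.

```python
def summarise_reads(reads):
    """Return (read_id, number_of_raw_samples) for each read.

    `reads` is any iterable of objects exposing `read_id` and
    `get_raw_data()`, as the reads yielded by ont_fast5_api do.
    """
    return [(read.read_id, len(read.get_raw_data())) for read in reads]


def summarise_fast5(path):
    """Open a Fast5 file with ont_fast5_api and summarise its reads."""
    # Imported here so that summarise_reads can be used without the package.
    from ont_fast5_api.fast5_interface import get_fast5_file

    with get_fast5_file(path, mode="r") as f5:
        return summarise_reads(f5.get_reads())
```

Keeping the file handling separate from the per-read logic means the latter can be exercised without any Fast5 data to hand.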
Since some older programs have not been updated to use multi-read files it can sometimes be necessary to convert such files to the deprecated single-read flavour. To do this run:
!rm -rf $output_folder/single-reads
!run multi_to_single_fast5 \
--input_path $input_folder --save_path $output_folder/single-reads \
--recursive
The output of the above command is a set of folders each containing a subset of the sequencing reads, one read per file. The filename of each read corresponds to the read's unique identifier.
!ls $output_folder/single-reads/0 2>/dev/null | head -n 5
00058fe1-e555-4a64-a41b-7f58fb7d6d6b.fast5
000dd482-c0d5-4520-aa86-8ee8bb61fd58.fast5
00158d74-4b7f-445a-b0ac-e1606f6c09b7.fast5
004a0bd2-edcf-4c2c-89bc-009a232cdb6a.fast5
0057b9d1-e566-4518-8b81-f69b30c6da99.fast5
A similar program exists to convert single-read files to multi-read files. We recommend that all datasets are updated to multi-read files for longer term storage. Here we will convert the single-reads created above back to multi-read files:
!rm -rf $output_folder/multi-reads
!run single_to_multi_fast5 \
--input_path $output_folder/single-reads --save_path $output_folder/multi-reads \
--filename_base prefix --batch_size 8000 --recursive
| 3 of 3|####################################################|100% Time: 0:00:55
The output of this command is a single directory containing all multi-read files. The filenames are prefixed with prefix as taken by the --filename_base argument of the program. The --batch_size argument here controls the number of reads per file:
!ls $output_folder/multi-reads
filename_mapping.txt prefix_0.fast5 prefix_1.fast5 prefix_2.fast5
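The relationship between the read count, --batch_size and the number of output files can be sketched as a ceiling division. This assumes each output file is filled to the batch size before the next is started, and that files are named as in the listing above; the function name and the read count of 20000 are purely illustrative.

```python
import math


def expected_multi_read_files(n_reads, batch_size, filename_base="prefix"):
    """List the multi-read filenames expected for `n_reads` reads.

    Assumes every file except possibly the last holds exactly
    `batch_size` reads.
    """
    n_files = math.ceil(n_reads / batch_size)
    return ["{}_{}.fast5".format(filename_base, i) for i in range(n_files)]


# 20000 is an illustrative read count, not the size of the sample dataset.
print(expected_multi_read_files(20000, 8000))
# ['prefix_0.fast5', 'prefix_1.fast5', 'prefix_2.fast5']
```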
The filename_mapping.txt cross-references the data from the input files with the output files.
!head $output_folder/multi-reads/filename_mapping.txt
26cb0f7d-8db2-4e2d-aa4e-9d273ccf1d66.fast5 analysis/multi-reads/prefix_0.fast5
b4441e24-a5d3-4357-bc24-4a169520d096.fast5 analysis/multi-reads/prefix_0.fast5
5d63b4ae-e9c7-43cb-b73c-7b3bc7facd57.fast5 analysis/multi-reads/prefix_0.fast5
5880c8b8-5c67-45cd-9082-2be09a7fc1d4.fast5 analysis/multi-reads/prefix_0.fast5
77d557c6-2154-4792-ad2d-49c9ca5f4bdd.fast5 analysis/multi-reads/prefix_0.fast5
afa10699-8648-4e7a-8bec-86118f202e8d.fast5 analysis/multi-reads/prefix_0.fast5
fb15566d-370c-478e-a190-d4221407e500.fast5 analysis/multi-reads/prefix_0.fast5
34465bd4-2335-4390-8675-daef5390ea79.fast5 analysis/multi-reads/prefix_0.fast5
67b3c07c-c4db-40e9-a18b-c10c8eeb70f5.fast5 analysis/multi-reads/prefix_0.fast5
133ac0a7-54d4-4681-8653-49b174fe6e7c.fast5 analysis/multi-reads/prefix_0.fast5
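The mapping file can be summarised in Python, for example to confirm how many reads were placed in each output file. This sketch assumes two whitespace-separated columns per line (input file, then output file) and no spaces within paths, as the listing above suggests; the function name is ours.

```python
from collections import Counter


def reads_per_output_file(lines):
    """Count how many input files were written to each output file."""
    counts = Counter()
    for line in lines:
        fields = line.split()
        if len(fields) != 2:
            continue  # skip blank or malformed lines
        _input_name, output_name = fields
        counts[output_name] += 1
    return counts


# Two lines copied from the listing above.
example = [
    "26cb0f7d-8db2-4e2d-aa4e-9d273ccf1d66.fast5 analysis/multi-reads/prefix_0.fast5",
    "b4441e24-a5d3-4357-bc24-4a169520d096.fast5 analysis/multi-reads/prefix_0.fast5",
]
print(reads_per_output_file(example)["analysis/multi-reads/prefix_0.fast5"])  # 2
```

Passing an open file handle for filename_mapping.txt in place of the example list gives the counts for the whole conversion.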
As mentioned in the discussion above it can be useful to have an index of which reads are contained within which multi-read files. Usually this indexing is provided by the sequencing_summary.txt file output by MinKNOW and Guppy. However if it is lost, here's a way to recover the information:
# build a script that will do the work
with open("build_read_index.sh", 'w') as fh:
    fh.write(
'''
echo -e "filename\tread_id"
find $1 -name "*.fast5" \\
| parallel --tag h5ls -f -r \\
| grep "read_.\{8\}-.\{4\}-.\{4\}-.\{4\}-.\{12\} Group" \\
| sed "s# Group##" | sed "s#/read_##"
''')
# run the script
!bash build_read_index.sh $input_folder > read_index.txt
The read_index.txt output file contains the simple index we desire:
!head read_index.txt
filename read_id
/epi2melabs/fast5-tutorial/sample_fast5/workspace/FAK42335_2bf4f211a2e2d04662e50f27448cfd99dafbd7ee_400.fast5 00085dbe-217a-40f2-90c0-3bb15669f32c
/epi2melabs/fast5-tutorial/sample_fast5/workspace/FAK42335_2bf4f211a2e2d04662e50f27448cfd99dafbd7ee_400.fast5 00237911-92b3-49b4-9d13-2ea6a2ded996
/epi2melabs/fast5-tutorial/sample_fast5/workspace/FAK42335_2bf4f211a2e2d04662e50f27448cfd99dafbd7ee_400.fast5 0025338c-3ea8-4168-b999-fe7f7fd597ee
/epi2melabs/fast5-tutorial/sample_fast5/workspace/FAK42335_2bf4f211a2e2d04662e50f27448cfd99dafbd7ee_400.fast5 00408494-e245-401e-8c9a-575ee491971b
/epi2melabs/fast5-tutorial/sample_fast5/workspace/FAK42335_2bf4f211a2e2d04662e50f27448cfd99dafbd7ee_400.fast5 00485ea4-a2fc-4b75-9969-9f1b1ab997da
/epi2melabs/fast5-tutorial/sample_fast5/workspace/FAK42335_2bf4f211a2e2d04662e50f27448cfd99dafbd7ee_400.fast5 004fbd46-3565-4505-8ade-bfa5bffa499b
/epi2melabs/fast5-tutorial/sample_fast5/workspace/FAK42335_2bf4f211a2e2d04662e50f27448cfd99dafbd7ee_400.fast5 0067fb48-9e65-415a-966a-fbf25c62e730
/epi2melabs/fast5-tutorial/sample_fast5/workspace/FAK42335_2bf4f211a2e2d04662e50f27448cfd99dafbd7ee_400.fast5 0091aa27-0f2f-4e79-bb6e-6bfa1629326b
/epi2melabs/fast5-tutorial/sample_fast5/workspace/FAK42335_2bf4f211a2e2d04662e50f27448cfd99dafbd7ee_400.fast5 00a52e30-a584-4ed8-97cf-074c601b0403
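The grep step in the script keeps only the top-level groups named read_ followed by a read's UUID. A Python equivalent of that pattern is sketched below. It is slightly stricter than the original, accepting only lowercase hexadecimal characters where the grep pattern accepts any character, and the format of the example h5ls lines is an assumption based on the pattern rather than captured output.

```python
import re

# read_<8>-<4>-<4>-<4>-<12 hex characters>, followed by the object type "Group"
READ_GROUP = re.compile(
    r"read_([0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12})\s+Group"
)


def extract_read_id(line):
    """Return the read ID if `line` names a top-level read group, else None."""
    match = READ_GROUP.search(line)
    return match.group(1) if match else None


# Assumed shape of h5ls output lines, for illustration only.
print(extract_read_id("/read_00085dbe-217a-40f2-90c0-3bb15669f32c Group"))
# 00085dbe-217a-40f2-90c0-3bb15669f32c
print(extract_read_id("/read_00085dbe-217a-40f2-90c0-3bb15669f32c/Raw Group"))
# None
```

Sub-groups such as Raw are rejected because the UUID must be followed directly by whitespace and the word Group.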
The program fast5_subset within ont_fast5_api can be used to create a new file set containing only a subset of reads.
The sample data contains data from a microbial mock community. Using the accompanying BAM alignment file, let's find the reads which align to a single reference sequence:
!rm -rf read_list.txt
!echo "read_id" > read_list.txt
!samtools view fast5_sample.bam lfermentum \
| awk '{print $1}' \
| tee -a read_list.txt \
| echo "Found" $(wc -l) "reads"
Found 1100 reads
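An alignment file can hold more than one record for the same read (for example secondary or supplementary alignments), in which case a read name would appear several times in the list. A minimal Python sketch that builds a de-duplicated read list from SAM-formatted text follows; the function names and the example records are ours, for illustration only.

```python
def read_ids_from_sam(lines):
    """Return unique read names, in first-seen order, from SAM text lines."""
    seen = set()
    ordered = []
    for line in lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and SAM header lines
        read_id = line.split("\t", 1)[0]  # QNAME is the first column
        if read_id not in seen:
            seen.add(read_id)
            ordered.append(read_id)
    return ordered


def write_read_list(read_ids, path):
    """Write a read list file with the `read_id` header line used above."""
    with open(path, "w") as fh:
        fh.write("read_id\n")
        for read_id in read_ids:
            fh.write(read_id + "\n")


# Made-up records: read-a appears twice, as it would with a supplementary alignment.
example = [
    "@HD\tVN:1.6",
    "read-a\t0\tlfermentum\t10\t60\t5M\t*\t0\t0\tACGTA\t*",
    "read-b\t16\tlfermentum\t50\t60\t5M\t*\t0\t0\tACGTA\t*",
    "read-a\t2048\tlfermentum\t900\t60\t5M\t*\t0\t0\tACGTA\t*",
]
print(read_ids_from_sam(example))  # ['read-a', 'read-b']
```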
We can now use this file with the subsetting program:
!echo $input_folder
!rm -rf $output_folder/lfermentum
!run fast5_subset --input $input_folder --save_path $output_folder/lfermentum \
--read_id_list read_list.txt --batch_size 8000 --recursive
/epi2melabs/fast5_tutorial/sample_fast5
| 1105 of 1105|##############################################|100% Time: 0:00:02
INFO:Fast5Filter:1100 reads extracted
It can be desirable to remove the Analyses groups from multi-read files, for example if live basecalling was performed during a run but the results are not wanted when the data is archived.
To accomplish this task we will use the compress_fast5 program with the --sanitize option:
!rm -rf $output_folder/sanitize
!run compress_fast5 --input_path $input_folder --save_path $output_folder/sanitize \
--compression vbz --recursive --threads 8 --sanitize
| 5 of 5|####################################################|100% Time: 0:00:12
This achieves an approximate 3.5X reduction in filesize:
!du -sh $input_folder $output_folder/sanitize
2.4G /epi2melabs/fast5_tutorial/sample_fast5
682M analysis/sanitize
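The reduction factor can be checked from the two sizes reported by du. The sketch below assumes the suffixes denote powers of 1024, as is the case for du -h; the helper name is ours. Because du rounds the sizes it prints, the result is only approximate.

```python
UNITS = {"K": 1024, "M": 1024 ** 2, "G": 1024 ** 3, "T": 1024 ** 4}


def parse_du_size(text):
    """Convert a `du -h` style size such as '2.4G' into a number of bytes."""
    text = text.strip()
    if text and text[-1].upper() in UNITS:
        return float(text[:-1]) * UNITS[text[-1].upper()]
    return float(text)  # no suffix: already a plain number of bytes


ratio = parse_du_size("2.4G") / parse_du_size("682M")
print("{:.1f}x".format(ratio))  # 3.6x
```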
In this notebook we have introduced the Fast5 file format used to store the raw data output of Oxford Nanopore Technologies' sequencing devices. We have outlined the contents of such files and how they can be manipulated with ont_fast5_api and on the command line.
The code tools presented here can be run on any dataset from an Oxford Nanopore Technologies' device. The code will run within the EPI2ME Labs notebook server environment.