Automatic Drift Correction C-CINA setup

General Description

This wiki page describes the on-the-fly drift-correction running on a computer next to the console of the Titan/K2-Summit in our basement.

Pipeline architecture

The general motion-correction setup looks as follows:

Images are acquired on the K2 Summit and then stored on one of the SSD-RAID systems in the K2 computer. The drift-correction runs on a dedicated Linux workstation termed 'drift-corrector'. The on-the-fly processing is set up such that adding a new raw stack to a particular folder on the SSD-RAIDs triggers the motion-correction. Once a stack is corrected, the drift-correction GUI stores the aligned average and the aligned stack on the network-attached storage server called cina-qnap01. Storing the aligned stacks is optional. Raw data is either manually copied or deleted after each session on the microscope. Note that storing raw data is only a good idea if you plan to do method development where raw stacks are required. In most cases storing the aligned stacks is sufficient.

Proposed Workflow

In general the idea is that you store the raw stacks as .dm4 files in a folder on the very fast local SSD-RAID drives, called X and Y drive, of the K2 computer. The automation running on the adjacent Linux box monitors this directory on the K2 computer. When a new stack arrives in that directory, the data is copied via a fast connection (an optical cable) to the drift-corrector. All output folders of the drift-correction process, i.e. aligned average and aligned stack, should be on the stand-alone hard-drive system called cina-qnap01. This way you don't have to move data manually. The idea is that you optimize the drift-correction on the fly during your session, so storing raw data (huge stacks) is in general not required. Note that due to the limited storage space of the extremely fast X and Y drives you have to periodically delete raw stacks there. Make sure that you delete all data on these buffer drives after your session, as the next user needs this space.
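
The monitoring step described above can be pictured as a simple polling loop. The following Python sketch is only an illustration under stated assumptions and is not the code of the actual automation: the function names, the polling interval and the handler callback are hypothetical, and a real watcher would additionally have to wait until a stack is completely written before processing it.

```python
import time
from pathlib import Path


def find_new_stacks(watch_dir, seen, suffix=".dm4"):
    """Return the names of stacks in watch_dir that are not yet in `seen`."""
    current = {
        p.name
        for p in Path(watch_dir).iterdir()
        if p.is_file() and p.suffix.lower() == suffix
    }
    return sorted(current - set(seen))


def watch(watch_dir, handle_stack, poll_seconds=5.0, max_polls=None):
    """Poll watch_dir and call handle_stack(path) once per new stack.

    max_polls=None polls forever; a number limits the loop (useful for tests).
    """
    seen = set()
    polls = 0
    while max_polls is None or polls < max_polls:
        for name in find_new_stacks(watch_dir, seen):
            handle_stack(Path(watch_dir) / name)
            seen.add(name)
        polls += 1
        if max_polls is None or polls < max_polls:
            time.sleep(poll_seconds)
```

A call such as watch("/path/to/raw_cache", print) would print the path of every stack that shows up in the folder; the path here is a placeholder.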

Some points to keep in mind before starting:

You have to make sure that the drift-correction worked fine during your session (see below). Revisiting motion-correction later during the processing is painful and slow.
Only keep good images. It is not worth keeping bad data.
Store raw data only if your project involves "movie-mode" algorithmic development, e.g. movie-mode unbending for 2D crystals.
Only store the aligned stacks if your processing features "movie-mode" processing or you want to work later on averages featuring a lower cumulated dose; otherwise disable the storing of aligned stacks.
Avoid spaces or other unusual symbols in filenames.
Always remove your raw stacks from the K2 computer's X and Y drives!
If somebody did not delete his/her stuff on the X and Y drives, feel free to do so!
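
The filename advice can be checked automatically before a session. The sketch below is a suggestion only: the set of allowed characters (letters, digits, dot, underscore, hyphen) is an assumption, as the text above does not define which symbols count as problematic.

```python
import re

# Assumed whitelist: letters, digits, dot, underscore and hyphen.
SAFE_NAME = re.compile(r"[A-Za-z0-9._-]+")


def is_safe_filename(name):
    """True if the name contains only characters from the assumed whitelist."""
    return SAFE_NAME.fullmatch(name) is not None


def unsafe_filenames(names):
    """Return the names that contain spaces or other unusual symbols."""
    return [n for n in names if not is_safe_filename(n)]
```

For example, unsafe_filenames(["stack_001.dm4", "my stack.dm4"]) reports only the second name.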

Detailed description of the automated workflow

Once the processing of a newly added stack is triggered, the raw stack is copied to the drift-corrector over a 10GbE connection onto an exceptionally fast caching disk (PCIe SSD). The raw stack on the SSD of the K2 computer is not touched at all.
The drift-corrector does some data format conversion steps to ensure that the later alignment program understands the input data.
Super-resolution only: Input images are binned (done by Fourier-cropping).
GPU-accelerated drift correction of the stack by means of Yifan Cheng's drift-correction program.
Uploading of the aligned average and the aligned stack to cina-qnap01.
Generation of diagnostic charts used for quality evaluation.

Proposed imaging conditions

The imaging conditions strongly depend on the sample. Super-resolution mode should be used for all studies that target publishing a high-resolution map in the end. For screening or similar tasks the regular 4k counted mode is good enough.

For high-resolution studies it is worth using a rather high electron dose, i.e. 30-50 e-/Å^2. For the processed data we provide a script that extracts averages featuring a lower cumulated dose, i.e. averages that contain only the first N frames. Make sure that you store the aligned stack if you plan to calculate lower-dose maps later.
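
To get a feeling for how many frames a lower-dose average would contain, the arithmetic can be sketched as follows. This is not the extraction script mentioned above; the function names are hypothetical and the sketch assumes that the dose is spread evenly over all frames of a movie.

```python
import math


def dose_per_frame(total_dose, n_frames):
    """Dose carried by a single frame (e-/A^2), assuming an even distribution."""
    if n_frames <= 0 or total_dose <= 0:
        raise ValueError("total_dose and n_frames must be positive")
    return total_dose / n_frames


def frames_for_dose(target_dose, total_dose, n_frames):
    """Largest N such that the first N frames carry at most target_dose."""
    per_frame = dose_per_frame(total_dose, n_frames)
    # Small epsilon guards against floating-point round-off at exact multiples.
    n = math.floor(target_dose / per_frame + 1e-9)
    return max(0, min(n_frames, n))
```

With a 40-frame movie recorded at 40 e-/Å^2 in total, frames_for_dose(20, 40, 40) gives 20, i.e. the first 20 frames carry 20 e-/Å^2.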

The count rate (counts/pixel/sec) should always stay below 4-6 for the physical 5-micron pixels of the K2. Note that the value reported by DM4 when operating the K2 in 8k mode has to be multiplied by four to get the counts on the physical pixels, because each physical pixel corresponds to 2x2 super-resolution pixels.
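
The conversion amounts to a single multiplication. A minimal Python sketch with hypothetical helper names; the limit is left as a parameter because the text only gives a range of 4-6, and the default of 4.0 is merely the conservative end of that range.

```python
# Each physical pixel corresponds to 2x2 = 4 pixels in 8k (super-resolution) mode.
SUPER_RES_FACTOR = 4


def physical_count_rate(reported_rate, super_resolution):
    """Convert the reported counts/pixel/sec to counts per physical pixel."""
    if super_resolution:
        return reported_rate * SUPER_RES_FACTOR
    return reported_rate


def is_rate_ok(reported_rate, super_resolution, limit=4.0):
    """True if the rate on the physical pixels stays below the chosen limit."""
    return physical_count_rate(reported_rate, super_resolution) < limit
```

A value of 1.5 reported in 8k mode therefore corresponds to 6 counts/pixel/sec on the physical pixels, which is already at the upper end of the recommended range.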

Don't forget to toggle between 8k and 4k when changing from Exposure to View and back. Otherwise you will end up with a super-resolution search image.

Acknowledgments, Papers to cite

The following papers should be cited in work produced with this setup:

Our paper describing the automation setup running in our basement:

Scherer, S., et al. 2dx_automator: Implementation of a semiautomatic high-throughput high-resolution cryo-electron crystallography pipeline. J. Struct. Biol. (2014)

The paper from Yifan Cheng's lab describing the motion-correction method around which we built the automation:

Li, X., et al., 2013. Electron counting and beam-induced motion correction enable near-atomic-resolution single-particle cryo-EM. Nat. Methods 10, 584–590.

Automation Setup

Proposed project structure

Raw data should temporarily be stored on the fast X or Y drives on the K2 computer, so you should create a folder on one of these RAID systems. Don't forget to remove this folder after your session. Additionally we suggest having one folder per session on cina-qnap01. We propose that you have the following subfolders in your session folder on the QNAP:

aligned_ave storing the corrected sum of frames.
aligned_stack (optional) Dedicated folder where the automation keeps the aligned stacks. This option is important in case you want to produce lower dose averages later during the processing.
raw_data (very optional, not recommended) In cases where keeping raw data, i.e. unaligned stacks, makes sense you should move these large files to this location after your session and/or when the caching drive is full (see below). In general you should not keep raw data.
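
The proposed layout can be created in one go. The following Python sketch is a convenience suggestion and not part of the automation; the function name and its flags are hypothetical, only the subfolder names are taken from the list above.

```python
from pathlib import Path


def create_session_folders(session_dir, keep_stacks=True, keep_raw=False):
    """Create the session folder with the proposed subfolders and return them."""
    names = ["aligned_ave"]
    if keep_stacks:
        names.append("aligned_stack")  # optional: needed for lower-dose averages
    if keep_raw:
        names.append("raw_data")  # very optional, not recommended
    created = []
    for name in names:
        folder = Path(session_dir) / name
        folder.mkdir(parents=True, exist_ok=True)
        created.append(folder)
    return created
```

Calling create_session_folders with a session folder below /mnt/cina-qnap01 would create aligned_ave and aligned_stack there; existing folders are left untouched.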

The K2 computer is not the place to store data!

Starting the automation

Create all folders, i.e. raw-data cache, session folder including subfolders on the QNAP.
Open a terminal and type MC_automator
Select the input directory for the automation. Note that you have to navigate into the directory and then click on OK. Only selecting the directory from the top level directory does not work. You find the X caching-drive on the K2 computer under /run/user/1001/gvfs/ftp:host=192.168.201.1/, respectively the Y-drive under /run/user/1001/gvfs/ftp:host=192.168.201.1.port=8021/.
Select the output directory storing the aligned averages. Usually this directory is in your session folder on the QNAP, which is mounted under /mnt/cina-qnap01.
Select the directory on the QNAP storing the aligned stacks.
Modify the processing parameters as described below.
Select input type.
Launch the automation process and confirm that you really want to launch it.

Drift-correction GUI

1. Input (storing the raw stacks) and output folder (storing the averages)

Text block showing the input and output directories selected while opening the GUI. Please double-check the folders before launching the process.

2. Automation control buttons

Buttons used to start and stop the periodic checking of the input folder.

3. Processing parameter section

Starting frame number: Due to the 'shutter opening' artefact you should remove the first two frames. The optimal (and default) setting for this value is 2.
Ending frame number: Similar to "Starting frame number" but defines the last frame used for alignment and averaging. Zero means that all frames are used for the alignment. We propose to set this option to zero as we can later produce lower-dose averages from your aligned stacks.
Offset: Internal algorithmic parameter of the motion-correction. Optimally this is one fourth of the total number of frames used for the alignment. If you record movies with 40 frames, set it to 10-12. Note that a too high value (larger than (N/2)-1, where N is the number of frames) will crash the automation.
BFactor: Internal algorithmic parameter for the motion-correction. For 4k images set it to 220 and for super-resolution images the default value of 150 is good.
Input Type: Defines the input type of the stacks. This option has to match the K2 settings. Note that in order to change this value you have to stop the automation, change the setting and relaunch the process.
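
The rule for the offset parameter can be checked before launching the automation. The following Python sketch is only an illustration: the function names are hypothetical and not part of MC_automator, and N is taken to be the number of frames used for the alignment.

```python
def suggested_offset(n_frames):
    """Roughly one fourth of the frames used for the alignment (at least 1)."""
    return max(1, n_frames // 4)


def max_safe_offset(n_frames):
    """Largest integer offset that does not exceed (N/2)-1."""
    return n_frames // 2 - 1


def is_offset_safe(offset, n_frames):
    """True if the offset stays within the limit given in the text."""
    return 1 <= offset <= max_safe_offset(n_frames)
```

For a 40-frame movie this gives a suggested offset of 10 and an upper limit of 19, which is consistent with the recommended range of 10-12.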

4. Export and caching location configuration

Export location: You can select a folder to which the aligned average is copied once you click on export. This feature is particularly useful for screening 2D crystals. In general you don't need this feature.
Store stack option: Here you select whether and where the aligned stacks are stored. Uncheck the checkbox if you only need the aligned averages.
Caching location: As the speed of the entire automation is limited by the disk speed we use advanced storage technology in the drift-corrector. By default the automation selects this fast card automatically on our setup. In case you mess this option up, the proper location is /mnt/cache.

5. Troubleshoot button

In rare cases the automation can stall. Clicking on this button helps in most cases. Before using this option make sure that the automation is really stuck and that you are not just waiting for an extremely large image. If troubleshooting does not fix the automation, stopping and relaunching the process (button 2) might help. In the worst case you have to reopen the GUI.

6. Image overview

List of all images in your current project. The diagnostic images on the right are always shown for the selected image.

7. Reprocess image button

In case you want to reprocess an image with changed parameters click on this button. Note that you have to reset the processing parameters after reprocessing, as only one set of parameters can be active for the entire project.

If the raw stack has been deleted or moved away, the button is disabled and the stack cannot be reprocessed.

8. Drift profile

The line shows the drift applied to all frames of the stack. The first frame is always set to (0,0).

9. Operate image buttons

Open image: Will open the aligned average with e2display. Note that the automation GUI is blocked while an instance of e2display is running.
Delete image: Bad images should be deleted via this button, which removes all data related to this stack, i.e. raw stack, aligned stack, aligned average. There is no option to restore a deleted image.

10. Raw Fourier transform

Fourier transform of the raw averaged image. Usually the Thon rings are not visible in all directions.

11. Corrected Fourier transform

Fourier transform of the aligned average. You should be able to see Thon rings out to 3-4 Å.

12. Motion-corrected average (massively binned)

Preview of the drift-corrected average. Note that due to binning this image has artificially more contrast than the real image.

13. Rotationally averaged power spectrum

Comparison of the rotationally averaged power spectra (green line from initial data, blue line from corrected data).

Tips and Tricks

Analysing the Drift-profile

It is important that you optimise the drift-correction during your session. If a drift-profile looks physically plausible then we believe the found corrections.

If the drift-profile curve shows large (>15 pixels) movements between the first frames you should rerun the automation for this particular image with a higher starting frame number.
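
The check for large early movements can be written down explicitly. The sketch below assumes that the drift profile is available as a list of cumulative (x, y) shifts in pixels, with the first frame at (0, 0); the function names and the number of frames counted as "first frames" are hypothetical choices, only the 15-pixel threshold is taken from the text.

```python
import math


def frame_steps(profile):
    """Distance (in pixels) moved between consecutive frames of the profile."""
    return [
        math.hypot(x1 - x0, y1 - y0)
        for (x0, y0), (x1, y1) in zip(profile, profile[1:])
    ]


def has_large_early_jump(profile, n_early=3, threshold=15.0):
    """True if any of the first n_early frame-to-frame steps exceeds threshold."""
    return any(step > threshold for step in frame_steps(profile)[:n_early])
```

If has_large_early_jump returns True for an image, reprocessing it with a higher starting frame number is the first thing to try.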

The left image below shows an example for a reliably drift-corrected image, whereas for the right image there is no hope to get the correction fixed. Such an image should be deleted.

Data Storage (optional)

For the rare case that you need to store raw stacks, we installed a copy utility called RichCopy on the K2 computer. Follow this protocol to copy raw stacks:

Open RichCopy on the K2 computer (Subfigure A).
The GUI shown in Subfigure B will pop up.
Select the input directory for the copy process by clicking on Source (Subfigure C). This is usually the directory on the X or Y drive where you stored the raw stacks.
Select the output directory on the QNAP by clicking on Destination (Subfigure D).
Launch the process by clicking on the green play button.

Note that the raw stacks will be moved to the destination folder, so reprocessing these stacks is no longer possible afterwards. The copy process will take a long time, as you are moving a lot of data over the regular house network.
