pdb_extract

 
Contact us for help

Please email deposit@deposit.rcsb.org about any problem with pdb_extract, and include the session ID that appears at the top of the page when you start a pdb_extract run.

Please do not email us about problems encountered during PDB OneDep deposition and validation unless you believe the problem comes from the file that pdb_extract prepared. For OneDep deposition issues unrelated to pdb_extract, please send a message through the communication page within your OneDep session; otherwise your question will not be answered.

What can pdb_extract do?

  • Convert a model coordinates file from PDB format to mmCIF format
  • Create a re-usable common metadata file from scratch
  • Extract a common metadata file from an existing mmCIF file
  • Incorporate the common metadata into multiple model coordinates files for efficient deposition preparation
  • Allow users to review the polymer sequences in the model coordinates and provide the complete sample sequences
  • For X-Ray structures, extract diffraction data statistics from the log files of certain data processing software packages
  • For 3DEM structures, output a map-only metadata file that can facilitate map-only depositions of related maps
  • Check the final output file and report its readiness for OneDep deposition

Instructions for using the mmCIF editor to prepare a common metadata file

How to access the mmCIF editor and what can it do?

The mmCIF editor is integrated directly into pdb_extract as Step 1 (Optional): Prepare Metadata File. From the home page, click Option A: Prepare Metadata File to open the editor page. The editor is powered by the PDBj mmCIF editor and supports the following ways to load a metadata file:

  • Create from a method-specific file — click the X-Ray, NMR, or EM button to load a pre-configured template for that experimental method.
  • Load a previously prepared metadata file — click Local Metadata File to open a file from your local disk.
  • Extract from an mmCIF file — select a local mmCIF file (plain or gzip-compressed .gz) and click Extract From mmCIF File to import metadata categories from it.
  • Fetch by PDB or EMDB ID — enter a PDB ID (e.g. 2HYV) or EMDB ID (e.g. EMD-1234) and click Fetch Metadata to retrieve metadata from the PDBj database.

When importing or fetching metadata, a popup appears that lets you select which mmCIF categories to import and whether to merge them into the method-specific template or load the entry data as-is.
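
Before importing, it can help to see which categories a local mmCIF file actually contains. The following is a minimal Python sketch for that purpose. It is not the editor's own import logic: the function names `open_text` and `list_categories` are hypothetical, and the parsing is deliberately simplified (it only looks for `_category.item` names and skips multi-line text values delimited by `;`).

```python
import gzip
from pathlib import Path


def open_text(path):
    """Open a plain or gzip-compressed (.gz) text file for reading."""
    path = Path(path)
    if path.suffix == ".gz":
        return gzip.open(path, "rt", encoding="utf-8")
    return open(path, "r", encoding="utf-8")


def list_categories(lines):
    """Return mmCIF category names in order of first appearance.

    Simplified: a line counts if its first token looks like
    `_category.item`. Lines inside ';'-delimited text values are skipped.
    """
    categories = []
    in_text_block = False
    for line in lines:
        if line.startswith(";"):
            # A ';' in column 1 opens or closes a multi-line text value.
            in_text_block = not in_text_block
            continue
        if in_text_block:
            continue
        stripped = line.strip()
        if stripped.startswith("_") and "." in stripped.split()[0]:
            name = stripped.split()[0][1:].split(".", 1)[0]
            if name not in categories:
                categories.append(name)
    return categories
```

For a file on disk, the two helpers combine as `with open_text("metadata.cif.gz") as fh: print(list_categories(fh))`, where the file name is only a placeholder.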

General instructions on mmCIF editor buttons

Once a file is loaded, the editor displays all mmCIF categories and their data items for review and editing. General editing instructions (adding/removing categories and items, editing values) can be found by clicking the Help button in the editor toolbar, which opens the PDBj CIF editor wiki.

The following action buttons appear at the top of the editor once a file is loaded:

  • Save and Finish — saves the current metadata file internally in your browser (IndexedDB) and redirects you to Step 2 to upload your structure file. The saved file will be automatically offered for use at Step 3.
  • Save to disk — downloads the current metadata file to your local machine for long-term storage and re-use. Use this option to build a re-usable file library, or for 3DEM map-only depositions where the metadata file is submitted directly to OneDep.
  • Clear contents — resets the editor to the originally loaded template, discarding all unsaved edits.
  • Go back — returns to the editor start page, discarding all unsaved edits.
  • Help — opens the PDBj CIF editor wiki in a new tab.

What metadata need to be provided for OneDep deposition?

Data items marked with * or ! are required for a OneDep deposition. Filling them in the metadata file saves you from retyping their values in the OneDep deposition interface for each structure. If a data item, such as entry_id, is pre-filled with "TO BE ASSIGNED, DO NOT CHANGE", please do not change it. Other ID data items are usually numbered sequentially (1, 2, 3, ...); when you add a new row, fill in the next ordinal number. The exception is citation.id, which should be "primary".

Save the re-usable metadata file to your local machine

After editing, click the Save to disk button in the editor toolbar to download the file to your local machine. To re-use the metadata file in a future pdb_extract session, go to Step 1, click Local Metadata File to load it back into the editor, make any updates, then click Save and Finish to proceed. Alternatively, at Step 3 you can choose Upload another metadata mmCIF file and upload the saved metadata file directly.

pdb_extract use cases

General cases for all methods

Simple conversion from PDB to mmCIF format

At the pdb_extract home page, click Option B: Upload Structure Model Coordinates File. Select the experimental method, select PDB file format, upload the file, and press Run. pdb_extract will convert the file and bring you to Step 3, which displays a summary of the uploaded structure at the top. Verify the summary, then scroll to the bottom and press Run again without filling in any additional fields. pdb_extract will finish the conversion and bring you to Step 4, where your mmCIF model coordinates file download link is at the top of the page.

Use a metadata file for multiple related structures

If you have previously prepared a method-specific metadata file (X-Ray, NMR, or EM) and saved it to your local machine, you can re-use it for related structures. Start pdb_extract on each structure by clicking Option B on the home page and uploading your structure file. At Step 3, under Incorporate Information from a Common Metadata File, choose Upload another metadata mmCIF file and upload your saved metadata file. Proceed to press Run, and pdb_extract will incorporate the metadata from the metadata file into your model coordinates file. The combined file is available for download at Step 4.

Alternatively, you can start with Option A on the home page to open the integrated editor, load your metadata file, review and update it as needed, then click Save and Finish. At Step 3, choose Use metadata file prepared in Step 1 to use the version you just reviewed.

Incorporate a metadata file into an existing mmCIF format model coordinates file

As with a PDB format file, upload your mmCIF format model coordinates file at Step 2 (via Option B on the home page), provide the common metadata file at Step 3, and proceed to finish the pdb_extract run.

Provide complete sample sequences

There are two ways to provide the complete sample sequences. You can either provide them in the common metadata file under the entity_poly category (via the Step 1 editor), or provide the sequences in the webform for each entity at the bottom of Step 3. If you provide sequences in both places, the sequences entered in the webform will be taken as the final sequences.

Easy metadata -- extract common metadata from an existing mmCIF file

If you have an mmCIF file of a related structure that contains common metadata (either a previously released public mmCIF file or one from a previously submitted OneDep deposition), you can extract metadata from it to create a metadata file. Choose Option A from the home page, select a local mmCIF file under the Extract from an mmCIF file option, and click the Extract From mmCIF File button. On the next page, you can review the metadata and save it to disk.

Cases specific to X-Ray

Extract diffraction data statistics

For an X-Ray structure, after you upload a model coordinates file at Step 2, at Step 3 you can specify the reflection data processing software and upload the log files for each processing step (indexing, scaling, molecular replacement, phasing, refinement). pdb_extract will parse statistics from these log files into the final output file.

Extract from a common metadata file and log files together

For an X-Ray structure, at Step 3, you can provide both a common metadata file and reflection data processing log files. Metadata from all sources will be extracted and incorporated into the final output file. You can also provide sequence information in the webform at the bottom of Step 3 if the sequences are not provided in the common metadata file.

Cases specific to 3DEM

Use pdb_extract for map-only map deposition

If you have a set of 3DEM structures, some with model coordinates and some without, you will need to make two types of OneDep depositions: a PDB deposition for model coordinates, and a map-only deposition. pdb_extract can help you deposit those map-only entries. Run pdb_extract on the structures with model coordinates. If metadata were provided either in the uploaded model coordinates file (such as an mmCIF file received after PDB annotation), or through a 3DEM common metadata file, Step 4 will display an additional Download Map-Only Metadata File button. Download this file and upload it for your map-only OneDep deposition; some OneDep pages will be automatically filled with the metadata it contains.

Process the structure of a composite map

For a composite map, process your model coordinates with pdb_extract first. If metadata were provided either in the uploaded model coordinates file or through a 3DEM common metadata file, Step 4 will display an additional Download Map-Only Metadata File button. Download this file and upload it for the map-only OneDep deposition of maps related to the composite structure.

Cases specific to NMR

Convert PDB format file without a chain ID

It is not uncommon for a PDB format NMR structure file to lack a chain ID. Upload the file at Step 2 and pdb_extract will attempt to add a chain ID. Review the sequence information at Step 3 and confirm that the chain ID assignment is correct before pressing Run.

Frequently asked questions

How to cite pdb_extract?

Reference: Huanwang Yang, Vladimir Guranovic, Shuchismita Dutta, Zukang Feng, Helen M. Berman and John D. Westbrook (2004), Acta Cryst. D60, 1833-1839.

How to process or convert a structure factor file?

Please use SF-tool. Note that MTZ files can be deposited directly at OneDep.

What is a common metadata file and how to create one?

A set of related structures usually shares a subset of metadata, such as contact author, citation, biological entity information, and experimental instrument information, which are collectively called common metadata. These metadata can be stored in a file that you can re-use for multiple structures. To create a metadata file, go to the home page and click Option A: Prepare Metadata File to open the integrated mmCIF editor. Click the X-Ray, NMR, or EM button to load the appropriate method-specific starting template, fill in your common metadata, and click Save to disk to save the metadata file to your local machine for future re-use.

What is the easiest way to create a common metadata file?

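See "Easy metadata -- extract common metadata from an existing mmCIF file" under General cases for all methods.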

What to do with the OneDep deposition readiness report?

The checks review whether the output model coordinates file contains all the metadata that are mandatory for a OneDep PDB deposition, and whether the metadata comply with the PDB mmCIF dictionary. Incomplete or improper metadata are reported under each check. You can correct them by re-running pdb_extract with updated information in the model coordinates file, the metadata file, the log files, or the webform. Alternatively, you can leave the file as is and make the corrections in the OneDep deposition user interface.

How to provide polymer sequences and their source information?

You can provide polymer sequences either in the common metadata file (via the Step 1 editor), or in the sequence webform at the bottom of Step 3, or choose to provide the sequence and source information directly at the OneDep deposition user interface.

What is the difference between modeled sequence and sample sequence?

The modeled sequence includes only the residues with atomic coordinates. The sample sequence is the complete sequence of all residues used in the experiment, including expression tags, linkers, mutations, and residues unobserved due to disorder. In other words, the sample sequence covers both the modeled sequence and the unmodeled residues in the gaps or at the termini.
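
The relationship between the two can be illustrated with a short Python sketch that places modeled fragments onto a sample sequence. This is only an illustration, not how pdb_extract or OneDep aligns sequences: the function name `locate_fragments` and the example sequence are made up, and a plain left-to-right substring search is assumed to be good enough (it can pick the wrong position when a fragment repeats within the sample sequence).

```python
def locate_fragments(sample, fragments):
    """Return 1-based (start, end) positions of each modeled fragment.

    `sample` is the complete sample sequence in one-letter codes.
    `fragments` are the modeled stretches, in N- to C-terminal order.
    Each fragment is searched for after the end of the previous one.
    """
    positions = []
    cursor = 0
    for frag in fragments:
        start = sample.find(frag, cursor)
        if start < 0:
            raise ValueError(f"fragment {frag!r} not found in sample sequence")
        end = start + len(frag)
        positions.append((start + 1, end))  # convert to 1-based, inclusive
        cursor = end
    return positions


# Hypothetical example: a GSHM tag and a disordered DEE loop are unmodeled.
sample_sequence = "GSHMAKLVDEEGQT"
modeled = ["AKLV", "GQT"]
print(locate_fragments(sample_sequence, modeled))
```

Any sample-sequence position not covered by a returned range corresponds to an unmodeled residue, such as a tag at a terminus or a disordered loop in a gap.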

How to record non-standard residues in the sequence?

While a sample sequence is recorded using standard one-letter codes, non-standard residues should be entered using the three-letter code in parentheses, e.g. (MSE).
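
To check that a sequence string follows this convention before pasting it into the webform or the metadata file, a small Python sketch such as the one below can split it into residues. The function name `split_sequence` is hypothetical, and the sketch assumes that a parenthesized code is one to three letters or digits; it does not verify that the code is a valid chemical component.

```python
import re

# Either a parenthesized code such as (MSE) or a single one-letter code.
_TOKEN = re.compile(r"\(([A-Za-z0-9]{1,3})\)|([A-Za-z])")


def split_sequence(seq):
    """Split a sequence string into a list of residue codes.

    Whitespace and line breaks are ignored. Raises ValueError on any
    character that fits neither form, e.g. an unclosed parenthesis.
    """
    compact = "".join(seq.split())
    residues = []
    pos = 0
    while pos < len(compact):
        match = _TOKEN.match(compact, pos)
        if match is None:
            raise ValueError(
                f"unexpected character {compact[pos]!r} at position {pos + 1}"
            )
        residues.append((match.group(1) or match.group(2)).upper())
        pos = match.end()
    return residues


print(split_sequence("MK(MSE)L"))
```

The length of the returned list is the number of residues in the sample sequence, which is easier to compare against expectations than the raw string length.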

How is mmCIF format processed by pdb_extract?

If the input model coordinates file is already in mmCIF format, pdb_extract takes it as is, runs an mmCIF format check, and incorporates common metadata, polymer sequences if provided in the webform, and statistics parsed from X-Ray reflection data processing log files, if applicable.

How to handle a Unicode error in a PDB input file?

A Unicode error usually means that the model coordinates file header contains non-ASCII text, most often in the first line, where refinement software writes path information. Try removing the first line and re-uploading the file. If the problem persists, you can either run an ASCII check yourself or contact us.
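
One way to run such a check yourself is sketched below in Python. The function name `find_non_ascii` is hypothetical; the sketch simply reports every byte above 127 together with its line and column, so you can see which lines need editing.

```python
def find_non_ascii(data):
    """Return (line_number, column, byte_value) for each non-ASCII byte.

    `data` is the raw file content as bytes. Line numbers and columns
    are 1-based; columns count bytes, not characters.
    """
    hits = []
    for line_no, line in enumerate(data.splitlines(), start=1):
        for col, byte in enumerate(line, start=1):
            if byte > 127:
                hits.append((line_no, col, byte))
    return hits


# Hypothetical header line with an accented character in a path.
example = b"REMARK path /home/caf\xc3\xa9/run\nATOM      1\n"
for line_no, col, byte in find_non_ascii(example):
    print(f"line {line_no}, column {col}: byte 0x{byte:02x}")
```

For a real file, read it in binary mode first, for example `find_non_ascii(open("model.pdb", "rb").read())`, where `model.pdb` is a placeholder name.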

How to add a PDB chain ID?

You can add a chain ID with certain refinement software or by manual editing, or simply run pdb_extract on the file as is, in which case pdb_extract will attempt to add a chain ID. Please review and confirm the chain ID assignment at Step 3 before pressing Run.
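
If you prefer to script the manual edit, the Python sketch below fills in blank chain IDs. It relies on the PDB format placing the chain ID in column 22 of coordinate records. It is not pdb_extract's own logic: the function name `add_chain_id` is hypothetical, and the sketch assumes the whole file is a single chain, so it is unsuitable for multi-chain files.

```python
# Record types that carry a chain ID in column 22.
_RECORDS = ("ATOM  ", "HETATM", "ANISOU", "TER   ")


def add_chain_id(pdb_lines, chain_id="A"):
    """Return a copy of `pdb_lines` with blank chain IDs set to `chain_id`.

    Only records whose column 22 is a space are changed; existing
    chain IDs and all other records are left untouched.
    """
    if len(chain_id) != 1:
        raise ValueError("a PDB format chain ID is a single character")
    updated = []
    for line in pdb_lines:
        if line.startswith(_RECORDS) and len(line) > 21 and line[21] == " ":
            line = line[:21] + chain_id + line[22:]
        updated.append(line)
    return updated
```

Whichever route you take, the assignment should still be reviewed at Step 3.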

What software packages does pdb_extract support for log parsing?

Currently pdb_extract supports Aimless, HKL-2000, HKL-3000, XSCALE, XDS, SCALA, SCALEPACK, xia2, DIALS, pointless, d*TREK, CrystFEL, and cctbx.xfel.

Can I use pdb_extract to help with map-only deposition?

Yes, see the map-only deposition use case under Cases specific to 3DEM.

For 3DEM output, what is the difference between the metadata-only file and the map-only-metadata file?

The metadata file (available for all methods) contains all common metadata split out from the structure output, and is intended for re-use as a metadata file for related structures in pdb_extract. The map-only-metadata file (3DEM only) contains only the EM map-related metadata subset and is intended for direct upload to OneDep for map-only depositions.

Why wasn't the metadata I input in the metadata file incorporated?

There are a few possibilities. If you edited the metadata file outside of the mmCIF editor, the metadata file may not comply with mmCIF format. Try loading the file back into the Step 1 editor via the Local Metadata File button. If you can still read all data fields, the metadata file format is OK; otherwise please review your edits.

If the same data field has different values in the uploaded model coordinates file and in the metadata file, the value in the model coordinates file takes higher priority, and the value in the metadata file will be ignored.
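
To find out in advance which metadata values would be ignored under this rule, you can compare the two files item by item. The Python sketch below works on two dictionaries that map item names to values; how you obtain them (for example with an mmCIF parsing library) is left open. The function name `ignored_metadata` is hypothetical, and treating the mmCIF placeholders `?` and `.` as "no value" is an assumption rather than documented pdb_extract behavior.

```python
# mmCIF placeholders for unknown ("?") and inapplicable (".") values.
EMPTY = {"", "?", "."}


def ignored_metadata(model_items, metadata_items):
    """List metadata-file values that lose to the model coordinates file.

    Both arguments map full item names (e.g. "_struct.title") to values.
    Returns {item: (model_value, metadata_value)} for every conflict.
    """
    conflicts = {}
    for item, meta_value in metadata_items.items():
        model_value = model_items.get(item, "?")
        if model_value in EMPTY or meta_value in EMPTY:
            continue  # assumption: placeholders do not count as values
        if model_value != meta_value:
            conflicts[item] = (model_value, meta_value)
    return conflicts


# Hypothetical values for illustration only.
model = {"_struct.title": "Old title", "_struct_keywords.text": "?"}
metadata = {"_struct.title": "New title", "_struct_keywords.text": "HYDROLASE"}
print(ignored_metadata(model, metadata))
```

If an item shows up in the result and the metadata file holds the value you want, one option is to correct or remove that item in the model coordinates file before re-running pdb_extract.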

© wwPDB