pdb_extract

 
Contact us for help

Please email deposit@deposit.rcsb.org about any problem with pdb_extract. Include the session ID that appears at the top of the page when you start a pdb_extract run.

Please do not email us about problems you encountered during PDB OneDep deposition and validation unless you believe the problem comes from the file pdb_extract prepared. For OneDep deposition issues unrelated to pdb_extract, please send a message from the communication page within the OneDep session; otherwise your question will not be answered.

What can pdb_extract do?

  • Convert model coordinates file from PDB format to mmCIF format
  • Create re-usable common metadata template file from scratch
  • Extract common metadata template from an existing mmCIF file
  • Incorporate the common metadata into multiple model coordinates files for efficient deposition preparation
  • Allow users to review the polymer sequences in the model coordinates and provide the complete sample sequences
  • For X-Ray structures, extract diffraction data statistics from the log files of certain data processing software packages
  • For 3DEM structures, output a map-only metadata file that can facilitate map-only depositions of related maps
  • Check the final output file and report its readiness for OneDep deposition

Instructions on using the mmCIF editor to review and edit a common metadata template

What can the mmCIF editor do and how do I access it?

The mmCIF editor hosted by PDBj can load any existing mmCIF file for review and update, and the final file can then be saved to your local machine. pdb_extract users are encouraged to use the editor to edit a method-specific metadata template (X-Ray, NMR, or EM), save the template, and then re-use it for multiple structures that share the same set of common metadata.

You can also load a locally saved common metadata template: go to the mmCIF editor home page, click the gear icon at the upper left, and choose "Open mmCIF File".

General instructions on the mmCIF editor

General instructions can be found at the home page of the mmCIF editor. They briefly cover how to load and save mmCIF files, how to add or remove mmCIF data categories, and how to add or remove mmCIF data items within a category. Please read these short instructions before you start editing.

What metadata needs to be provided for OneDep deposition?

All data items marked with * or ! are required for a OneDep deposition. Filling them in the template saves you from typing their values at the OneDep deposition interface for each structure. If a data item, such as entry_id, is pre-filled with "TO BE ASSIGNED, DO NOT CHANGE", please do not change it. Other ID data items are usually filled with the numbers 1, 2, 3, ... sequentially; if you add a new row, simply fill in the next ordinal number. The exception is citation.id, which should be "primary".

Save the re-usable template to your local machine

After editing, click the gear icon at the upper left and click "Save mmCIF" to save the file to your local machine, as described in the general instructions.

pdb_extract use cases

General cases for all methods

Simple conversion from PDB to mmCIF format

At the pdb_extract home page, choose the experimental method, choose the PDB file format, upload the file, and press the Run button. pdb_extract will start the conversion and bring you to the 2nd page, which displays the uploaded file information at the top. Confirm the file contents, leave everything else on the 2nd page unchanged, scroll down to the bottom of the page, and press the Run button again. pdb_extract will finish the conversion and bring you to the final page, where the download link for your mmCIF model coordinates file is at the top of the page.

Use a metadata template for multiple related structures

If you have created a method-specific metadata template (X-Ray, NMR, or EM) and saved it to your local machine, you can re-use it for related structures. Start pdb_extract on each structure separately. At the 2nd page, below the summary of the uploaded file, upload the common metadata file and continue the pdb_extract run. pdb_extract will incorporate the contents of the metadata template into your model coordinates file and let you download the combined file at the final page.

Incorporate a metadata template into an existing mmCIF format model coordinates file

As with a PDB format file, upload your mmCIF format model coordinates file at the pdb_extract home page, upload the common metadata template at the 2nd page, and proceed to finish the pdb_extract run.

Provide complete sample sequences

There are two ways to provide the complete sample sequences. You can either provide them in the common metadata template under the entity_poly category, or enter the sequence for each entity in the webform at the bottom of the 2nd page of a pdb_extract run. If you provide sequences in both places, the sequences in the webform will be taken as the final sequences.

Easy template -- split common metadata from an existing mmCIF file

If you have an mmCIF file of a related structure with common metadata in it, either a previously released public mmCIF file or one from a previously submitted OneDep deposition, you can upload this file into pdb_extract and leave the 2nd page unchanged. The final page will then show an additional download link for a metadata-only mmCIF file. This split metadata file can be used as a template to add metadata in your next pdb_extract run on a related structure.

Cases specific to X-Ray

Extract diffraction data statistics

For an X-Ray structure, after you upload a model coordinates file to start pdb_extract, you can specify the reflection data processing software at the 2nd page and upload its log files so that pdb_extract can parse certain statistics into the final file.

Extract from a common metadata template and log files together

For an X-Ray structure, at the 2nd page, you can upload both a common metadata file and reflection data processing log files. Metadata from all files will be extracted and incorporated into the final file. You can also provide sequence information in the webform of the page if the sequences are not provided in the common metadata file.

Cases specific to 3DEM

Use pdb_extract for map-only map deposition

If you have a set of 3DEM structures, some with model coordinates and some without, you will need to make two types of OneDep depositions: PDB depositions for the model coordinates, and map-only depositions. pdb_extract can help you deposit the map-only entries if you run pdb_extract on the structures with model coordinates first. If metadata were provided either in the uploaded model coordinates file (such as the mmCIF file you received after it was submitted and processed by PDB annotators) or through a 3DEM common metadata template, the final page will offer an additional download: a map-only-metadata mmCIF file. Download this file and upload it for your map-only OneDep deposition, and some of the OneDep pages will be filled in automatically with its metadata.

Process structure of a composite map

For a composite map, process your model coordinates with pdb_extract first. If metadata were provided either in the uploaded model coordinates file or through a 3DEM common metadata template, the final page will offer an additional download: a map-only-metadata mmCIF file. Download this file and upload it for your other map-only OneDep depositions of maps related to the composite structure.

Cases specific to NMR

Convert PDB format file without a chain ID

It is not uncommon for a PDB format file of an NMR structure to lack chain IDs. Upload the file without chain IDs to pdb_extract, and pdb_extract will try to add them. Review the sequence information at the 2nd page and confirm that the chain ID assignment is correct.
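Before uploading, you may want to see whether your file is affected. The following is a minimal sketch, not part of pdb_extract, that counts ATOM/HETATM records whose chain ID column (column 22 of the PDB format) is blank. The function name and the two example records are hypothetical.

```python
def count_missing_chain_ids(pdb_lines):
    """Count ATOM/HETATM records whose chain ID column (column 22) is blank."""
    missing = 0
    for line in pdb_lines:
        if line.startswith(("ATOM  ", "HETATM")):
            # Column 22 of the PDB format is index 21 in a 0-based string.
            chain_id = line[21:22]
            if chain_id.strip() == "":
                missing += 1
    return missing


# Hypothetical records: the first has no chain ID, the second has chain A.
example = [
    "ATOM      1  N   MET     1      11.104  13.207   2.100  1.00  0.00           N",
    "ATOM      2  CA  MET A   1      12.560  13.329   2.300  1.00  0.00           C",
]
print(count_missing_chain_ids(example))
```

To check a real file, pass the lines read from it, for example `count_missing_chain_ids(open("model.pdb"))`, where `model.pdb` is a placeholder file name.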

Frequently asked questions

How to cite pdb_extract?

Reference: Huanwang Yang, Vladimir Guranovic, Shuchismita Dutta, Zukang Feng, Helen M. Berman and John D. Westbrook (2004), Acta Cryst. D60, 1833-1839

How to process or convert structure factor file?

Please use SF-tool. Please note that MTZ files can be deposited directly at OneDep.

What is a common metadata template and how to create one?

A set of related structures usually shares a subset of metadata, such as contact author, citation, biological entity information, and experimental instrument information. These are collectively called common metadata. They can be stored in a template file that you can re-use for multiple structures. You can create a method-specific metadata template (X-Ray, NMR, or EM) and save it to your local machine.

What is the easiest way to create a common metadata template?

See the "Easy template" use case under "General cases for all methods".

What to do with the OneDep deposition readiness report?

The checks review whether the output model coordinates file has complete mandatory metadata for a OneDep PDB deposition, and whether the metadata comply with the PDB mmCIF dictionary. Incomplete or improper metadata are reported under each check. You can correct them by re-running pdb_extract with updated information in the model coordinates file, the template, the log files, or the webform. Alternatively, you can leave the file as it is and make the corrections at the OneDep deposition user interface.

How to provide polymer sequences and their source information?

You can provide polymer sequences either in the common metadata file, or the webform at the 2nd page of a pdb_extract run, or choose to provide the sequence and source information directly at the OneDep deposition user interface.

What is the difference between modeled sequence and sample sequence?

The modeled sequence includes only the residues with atomic coordinates. The sample sequence is the complete sequence of all residues used in the experiment, including expression tags, linkers, mutations, and residues unobserved due to disorder. In other words, a sample sequence includes both the modeled sequence and the unmodeled residues in the gaps or at the termini.
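The relationship can be illustrated with a small sketch: every modeled segment should appear, in order, inside the sample sequence. This is only an illustration under simple assumptions (one-letter codes only, made-up sequences, a hypothetical function name), not the check pdb_extract itself performs.

```python
def modeled_fits_sample(modeled_segments, sample_sequence):
    """Return True if the modeled segments occur in order, without overlap, in the sample sequence."""
    position = 0
    for segment in modeled_segments:
        found = sample_sequence.find(segment, position)
        if found == -1:
            return False
        # Continue searching after the end of this segment.
        position = found + len(segment)
    return True


# Hypothetical sample: His tag and linker, then a protein with one disordered loop.
sample = "MHHHHHHGSAKDLLEQGNPRTVEWK"
# Only two stretches have atomic coordinates; the tag and the loop "QGNP" do not.
modeled = ["AKDLLE", "RTVEWK"]
print(modeled_fits_sample(modeled, sample))
```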

How to record non-standard residues in the sequence?

A sample sequence is recorded using standard one-letter codes. Non-standard residues should be entered using the three-letter code in parentheses, e.g. (MSE).
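To make the notation concrete, here is a minimal sketch that splits such a sequence string into one entry per residue. The function name and the example sequence are hypothetical, and the pattern assumes upper-case one-letter codes and exactly three characters inside each pair of parentheses.

```python
import re

# One residue is either "(XXX)" with a three-character code or a single upper-case letter.
TOKEN = re.compile(r"\(([A-Z0-9]{3})\)|([A-Z])")


def split_sample_sequence(sequence):
    """Split a sample sequence into residue codes, one list entry per residue."""
    residues = []
    position = 0
    while position < len(sequence):
        match = TOKEN.match(sequence, position)
        if match is None:
            raise ValueError(
                f"unexpected character at position {position}: {sequence[position]!r}"
            )
        # Group 1 holds a parenthesized code, group 2 a one-letter code.
        residues.append(match.group(1) or match.group(2))
        position = match.end()
    return residues


print(split_sample_sequence("MK(MSE)LV"))
```

Counting the entries of the returned list gives the number of residues, which differs from the string length whenever parenthesized codes are present.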

How is mmCIF format processed by pdb_extract?

If the input model coordinates file is already in mmCIF format, pdb_extract takes it as is, runs an mmCIF format check, and incorporates the common metadata, the polymer sequences if provided in the webform, and, if applicable, the statistics parsed from X-Ray reflection data processing log files.

How to handle unicode error in PDB input file?

A unicode error usually means there is non-ASCII text in the header of the model coordinates file, most often in the 1st line, which comes from the refinement software's path information. Please try removing the 1st line and re-uploading the file. If the problem persists, you can either run an ASCII check yourself or contact us.
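One way to run such an ASCII check yourself is sketched below. It reports the line, column, and value of every byte above 127; the function name and the sample header line are hypothetical.

```python
def find_non_ascii(data):
    """Return (line_number, column, byte_value) for every non-ASCII byte in the data."""
    hits = []
    for line_number, line in enumerate(data.splitlines(), start=1):
        # Iterating over a bytes object yields integers 0-255.
        for column, byte in enumerate(line, start=1):
            if byte > 127:
                hits.append((line_number, column, byte))
    return hits


# Hypothetical header line whose path contains an accented character.
sample = "REMARK   path /home/josé/refine\nATOM      1  N   MET A   1\n".encode("utf-8")
for hit in find_non_ascii(sample):
    print(hit)
```

To check a real file, read it in binary mode first, for example `find_non_ascii(open("model.pdb", "rb").read())`, where `model.pdb` is a placeholder file name.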

How to add PDB chain ID?

You can add chain IDs with certain refinement software packages or by manual editing, or you can run pdb_extract on the file as is, in which case pdb_extract will try to add them. Please review and confirm the chain IDs at the 2nd page of a pdb_extract run.

What software packages does pdb_extract support for log parsing?

Currently pdb_extract supports Aimless, HKL-2000, HKL-3000, XSCALE, XDS, SCALA, SCALEPACK, xia2, DIALS, pointless, d*TREK, CrystFEL, and cctbx.xfel.

Can I use pdb_extract to help with map-only deposition?

Yes, see the use cases under "Cases specific to 3DEM".

For 3DEM output, what is the difference between metadata-only file and map-only-metadata file?

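The metadata-only file is the common metadata split from the uploaded file, which can be re-used as a template. The map-only-metadata file contains only the metadata for maps.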

Why wasn't the metadata I input in the template incorporated?

There are a few possibilities. If you edited the template outside the mmCIF editor, the template file may no longer comply with the mmCIF format. Try re-uploading the file into the mmCIF editor by clicking the gear icon at the upper left and choosing "Open mmCIF File". If you can still read all data fields, your template format is OK; otherwise, please review your edits.

If the same data field has different values in the uploaded model coordinates file and the template file, the value in the model coordinates file takes priority and the value in the template is ignored.
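The precedence rule can be pictured with a toy sketch that treats each file as a flat mapping from data item name to value. This is an illustration only, not pdb_extract's implementation; the function name and values are hypothetical, and how empty or placeholder values in the model file are treated is not covered here.

```python
def merge_metadata(model_items, template_items):
    """Combine data items; values from the model coordinates file win over the template."""
    merged = dict(template_items)  # start from the template values
    merged.update(model_items)     # model file values overwrite any clashes
    return merged


# Hypothetical values for two mmCIF data items.
template = {
    "_struct.title": "Title from the template",
    "_exptl.method": "X-RAY DIFFRACTION",
}
model = {"_struct.title": "Title from the model file"}

print(merge_metadata(model, template))
```

In this sketch the title from the model file survives, while the experimental method, present only in the template, is carried over.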

© wwPDB