Commit be1143c

Fixing #11
1 parent 04af0de commit be1143c

1 file changed: +21 -93 lines changed

structure/bioassembly.md

Lines changed: 21 additions & 93 deletions
@@ -99,19 +99,17 @@ Here another example, the bacteriophave GA protein capsid PDB ID [1GAV](http://w

 Since biological assemblies can be accessed via the StructureIO interface, in principle there is no need to access the lower-level code in BioJava that re-creates biological assemblies. If you are interested in the gory details, here are a couple of pointers into the code. In principle there are two ways to get to a biological assembly:

-A) The biological assembly needs to be re-built and the atom coordinates of the asymmetric unit need to be rotated according to the instructions in the files. The information required to re-create the biological assemblies is available in both the PDB an mmCIF/PDBx files.
+1. The biological assembly needs to be re-built and the atom coordinates of the asymmetric unit need to be rotated according to the instructions in the files. The information required to re-create the biological assemblies is available in both the PDB and mmCIF/PDBx files. In PDB files the relevant transformations are stored in the *REMARK 350* records. For mmCIF/PDBx, the *_pdbx_struct_assembly* and *_pdbx_struct_oper_list* categories store the corresponding rules.

-In PDB files the relevant transformations are stored in the *REMARK 350* records. For mmCIF/PDBx, the *_pdbx_struct_assembly* and *_pdbx_struct_oper_list* categories store the corresponding rules.
+2. There is also a pre-computed file available from the PDB that contains an assembled version of a structure. This file can be parsed directly, without having to perform rotation operations on coordinates.

-B) There is also a pre-computed file available that contains an assembled version of a structure. This file can be parsed directly, without having to perform rotation operations on coordinates.
+As of version 5.0, BioJava contains utility classes to re-create biological assemblies for both PDB and mmCIF files.

-BioJava contains utility classes to re-create biological assemblies for both PDB and mmCIF, as well as to parse the pre-computed file. The [BioUnitDataProvider](http://www.biojava.org/docs/api/org/biojava/nbio/structure/quaternary/io/BioUnitDataProvider.html) interface defines what is required to re-build an assembly. The [BioUnitDataProviderFactory](http://www.biojava.org/docs/api/org/biojava/nbio/structure/quaternary/io/BioUnitDataProviderFactory.html) allows to specify which of the BioUnitDataProviders is getting used.
-
-Take a look at the method getBiologicalAssembly() in [StructureIO](http://www.biojava.org/docs/api/org/biojava/nbio/structure/StructureIO.html) to see how the BioUnitDataProviders are used by the *BiologicalAssemblyBuilder*.
+Take a look at the method `getBiologicalAssembly()` in [StructureIO](http://www.biojava.org/docs/api/org/biojava/nbio/structure/StructureIO.html) to see how the underlying *BiologicalAssemblyBuilder* is called.

 ## Memory consumption

-This example in the next section loads the structure of the PBCV-1 virus capsid (PDB ID [1M4X](http://www.rcsb.org/pdb/explore.do?structureId=1m4x)). It consists of 16 million atoms and has one of the largest, if not the largest biological assembly that is currently available in the PDB. Needless to say it is important to change the maximum heap size parameter, otherwise there is no successfully load this. It requires a minimum of 9GB RAM to load (measured on Java 1.7 on OSX). You can change the heap size by providing the following startup parameter (and assuming you have 10G or more of RAM available on your system)
+The example in the next section loads the structure of the PBCV-1 virus capsid (PDB ID [1M4X](http://www.rcsb.org/pdb/explore.do?structureId=1m4x)). It consists of 16 million atoms and is one of the largest, if not the largest, biological assemblies currently available in the PDB. Needless to say, it is important to change the maximum heap size parameter, otherwise you will not be able to load it. It requires a minimum of 9GB RAM to load (measured on Java 1.7 on OSX). You can change the heap size by providing the following startup parameter (assuming you have 10G or more of RAM available on your system):
 <pre>
 -Xmx10G
 </pre>
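
To confirm that the heap setting actually took effect, the JVM can report its own memory figures. The following is a minimal sketch that uses only the standard `java.lang.Runtime` API; the class name `HeapCheck` and the helper `formatMemory` are hypothetical names chosen for illustration and are not part of BioJava.

```java
public class HeapCheck {

    private static final long MB = 1024L * 1024L;

    /** Formats heap figures (given in bytes) as whole megabytes. */
    static String formatMemory(long total, long free, long max) {
        return String.format("Total %dMB, Used %dMB, Free %dMB, Max %dMB",
                total / MB, (total - free) / MB, free / MB, max / MB);
    }

    public static void main(String[] args) {
        Runtime r = Runtime.getRuntime();
        // maxMemory() reflects the -Xmx setting the JVM was started with
        System.out.println(formatMemory(r.totalMemory(), r.freeMemory(), r.maxMemory()));
    }
}
```

When started with `java -Xmx10G HeapCheck`, the reported `Max` value should be in the region of 10240MB; the exact figure can vary between JVMs.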
@@ -131,97 +129,27 @@ Note: when loading this structure with 9GB of memory, the Java VM spends a signi
 </tr>
 </table>

-## Low level access to parsing pre-assembled biological asssembly files
-
-To load the pre-assembled biological assembly file directly, one can tweak the low-level PDB file parser like this
-
-```java
-
-public static void main(String[] args){
-
-public static void main(String[] args){
-
-// This loads the PBCV-1 virus capsid, one of, if not the biggest biological assembly in terms on nr. of atoms.
-// The 1m4x.pdb1.gz file has 313 MB (compressed)
-// This Structure requires a minimum of 9 GB of memory to be loaded in memory.
-
-String pdbId = "1M4X";
-
-Structure bigStructure = readStructure(pdbId,1);
-
-// let's take a look how much memory this consumes currently
+## Representing symmetry related chains
+Chains are identified by chain identifiers, which serve to distinguish the different molecular entities present in the asymmetric unit. Once a biological assembly is built, it can be composed of chains from the asymmetric unit as well as chains that result from applying a symmetry operator (these chains are also called "symmetry mates"). The problem is that the symmetry mates get the same chain identifiers as the untransformed chains.

-Runtime r = Runtime.getRuntime();
+There are 2 solutions to this issue:

-// let's try to trigger the Java Garbage collector
-r.gc();
+1. Assign new chain identifiers. In BioJava the new chain identifiers assigned are of the form `<original chain id>_<symmetry operator id>`.
+2. Place the symmetry partners into different models. This is the solution taken by the pre-computed biounit files available from the PDB.

-System.out.println("Memory consumption after " + pdbId +
-" structure has been loaded into memory:");
-
-String mem = String.format("Total %dMB, Used %dMB, Free %dMB, Max %dMB",
-r.totalMemory() / 1048576,
-(r.totalMemory() - r.freeMemory()) / 1048576,
-r.freeMemory() / 1048576,
-r.maxMemory() / 1048576);
+Since version 5.0 BioJava uses approach 1) to store the biounit in a single `Structure` object. Because the chain identifiers are then longer than 1 character, the Structure can only be written out in mmCIF format (the PDB format is limited to 1-character chain identifiers).

-System.out.println(mem);
-
-System.out.println("# atoms: " + StructureTools.getNrAtoms(bigStructure));
-
-}
-/** Load a specific biological assembly for a PDB entry
-*
-* @param pdbId .. the PDB ID
-* @param bioAssemblyId .. the first assembly has the bioAssemblyId 1
-* @return a Structure object or null if something went wrong.
-*/
-public static Structure readStructure(String pdbId, int bioAssemblyId) {
-
-// pre-computed files use lower case PDB IDs
-pdbId = pdbId.toLowerCase();
-
-// we need to tweak the FileParsing parameters a bit
-FileParsingParameters p = new FileParsingParameters();
-
-// some bio assemblies are large, we want an all atom representation and avoid
-// switching to a Calpha-only representation for large molecules
-// note, this requires several GB of memory for some of the largest assemblies, such a 1MX4
-p.setAtomCaThreshold(Integer.MAX_VALUE);
-
-// parse remark 350
-p.setParseBioAssembly(true);
-
-// The low level PDB file parser
-PDBFileReader pdbreader = new PDBFileReader();
-
-// we just need this to track where to store PDB files
-// this checks the PDB_DIR property (and uses a tmp location if not set)
-AtomCache cache = new AtomCache();
-pdbreader.setPath(cache.getPath());
-
-pdbreader.setFileParsingParameters(p);
-
-// download missing files
-pdbreader.setAutoFetch(true);
-
-pdbreader.setBioAssemblyId(bioAssemblyId);
-pdbreader.setBioAssemblyFallback(false);
-
-Structure structure = null;
-try {
-structure = pdbreader.getStructureById(pdbId);
-if ( bioAssemblyId > 0 )
-structure.setBiologicalAssembly(true);
-structure.setPDBCode(pdbId);
-} catch (Exception e){
-e.printStackTrace();
-return null;
-}
-return structure;
-}
-```
+In BioJava one can still produce a biounit using approach 2) by passing a boolean parameter to the `getBiologicalAssembly` method:
+```java
+Structure struct = StructureIO.getBiologicalAssembly(pdbId, true);
+```
+## PDB entries with more than 1 biological assembly
+Many PDB entries are assigned more than 1 biological assembly. This is due to several factors: sometimes the authors disagree with the annotators, sometimes the authors are not sure which biological assembly is the right one, and sometimes there are several equivalent biological assemblies present in the asymmetric unit (but with slightly different conformations) and each of those is annotated as a different biological assembly.

+To get all biological assemblies for a given PDB entry one needs to use:
+```java
+List<Structure> bioAssemblies = StructureIO.getBiologicalAssemblies(pdbId);
+```
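
The chain identifier scheme from "Representing symmetry related chains" (`<original chain id>_<symmetry operator id>`) can be sketched in plain Java. This is only an illustration of the naming convention, not BioJava's implementation: the class `SymmetryChainIds` and both method names are hypothetical, and it assumes the suffix is applied for every operator, including the identity.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SymmetryChainIds {

    /** Builds an identifier of the form <original chain id>_<symmetry operator id>. */
    static String symmetryChainId(String chainId, String operatorId) {
        return chainId + "_" + operatorId;
    }

    /** Lists the identifiers obtained by applying every operator to every chain. */
    static List<String> assemblyChainIds(List<String> chainIds, List<String> operatorIds) {
        List<String> result = new ArrayList<>();
        for (String operatorId : operatorIds) {
            for (String chainId : chainIds) {
                result.add(symmetryChainId(chainId, operatorId));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Two chains in the asymmetric unit, two symmetry operators
        System.out.println(assemblyChainIds(Arrays.asList("A", "B"), Arrays.asList("1", "2")));
        // prints [A_1, B_1, A_2, B_2]
    }
}
```

Every identifier produced this way is at least 3 characters long, which is why such a structure cannot be written in PDB format.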

 ## Further Reading
