The PBxplore API can write every file that the command-line tools can, including the outputs of PBassign. The functions that handle the various file formats are available in the :mod:`pbxplore.io` module.
from __future__ import print_function, division
from pprint import pprint
%cd ../../../
import pbxplore as pbx
The most common way to save PB sequences is to write them in a fasta file.
PBxplore offers two ways to write fasta files: the sequences can be written either all at once or one at a time. To write a batch of sequences at once, we need a list of sequences and a list of the corresponding sequence names. The writing function here is :func:`pbxplore.io.write_fasta`.
names = []
pb_sequences = []
for chain_name, chain in pbx.chains_from_files(['demo1_assignation/2LFU.pdb']):
    dihedrals = chain.get_phi_psi_angles()
    pb_seq = pbx.assign(dihedrals)
    names.append(chain_name)
    pb_sequences.append(pb_seq)
pprint(names)
pprint(pb_sequences)
with open('output.fasta', 'w') as outfile:
    pbx.io.write_fasta(outfile, pb_sequences, names)
!cat output.fasta
!rm output.fasta
Sequences can be written one at a time using the :func:`pbxplore.io.fasta._write_fasta_entry` function.
with open('output.fasta', 'w') as outfile:
    for chain_name, chain in pbx.chains_from_files(['demo1_assignation/2LFU.pdb']):
        dihedrals = chain.get_phi_psi_angles()
        pb_seq = pbx.assign(dihedrals)
        pbx.io.fasta._write_fasta_entry(outfile, pb_seq, chain_name)
!cat output.fasta
!rm output.fasta
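To make the relation between the two writing modes concrete, here is a minimal standard-library sketch in which a batch writer simply loops over a per-entry writer. The names `write_entry` and `write_batch` are hypothetical and are not part of PBxplore; the sketch ignores line wrapping and only assumes the usual fasta layout of a `>` header line followed by the sequence.

```python
import io


def write_entry(outfile, sequence, name):
    """Write a single fasta-like record: a '>' header, then the sequence."""
    print('>' + name, file=outfile)
    print(sequence, file=outfile)


def write_batch(outfile, sequences, names):
    """Write several records by calling the per-entry writer in a loop."""
    if len(sequences) != len(names):
        raise ValueError('There must be exactly one name per sequence.')
    for sequence, name in zip(sequences, names):
        write_entry(outfile, sequence, name)


# Made-up PB-like sequences, used only for illustration.
demo_buffer = io.StringIO()
write_batch(demo_buffer, ['ZZdfklmmZZ', 'ZZccdZZ'], ['chain_A', 'chain_B'])
print(demo_buffer.getvalue())
```

Writing one entry at a time avoids keeping all the sequences in memory, whereas the batch form is convenient when the lists already exist.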
By default, the lines in fasta files are wrapped at 60 characters, as defined in :const:`pbxplore.io.fasta.FASTA_WIDTH`. Both :func:`pbxplore.io.write_fasta` and :func:`pbxplore.io.fasta._write_fasta_entry` have an optional `width` argument that controls the wrapping.
print(pb_sequences[0])
with open('output.fasta', 'w') as outfile:
    for width in (60, 70, 80):
        pbx.io.fasta._write_fasta_entry(outfile, pb_sequences[0],
                                        'width={} blocks'.format(width),
                                        width=width)
!cat output.fasta
!rm output.fasta
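The effect of the wrapping width can be illustrated without PBxplore. The sketch below is a standard-library approximation, not the library's implementation: `write_wrapped_entry` is a hypothetical helper that cuts the sequence into chunks of at most `width` characters, and the sequence is made up.

```python
import io


def write_wrapped_entry(outfile, sequence, name, width=60):
    """Write one fasta-like record, wrapping the sequence at `width` characters."""
    print('>' + name, file=outfile)
    for start in range(0, len(sequence), width):
        print(sequence[start:start + width], file=outfile)


# Made-up 94-character PB-like sequence, used only for illustration.
demo_sequence = 'ZZ' + 'dfklmmmmm' * 10 + 'ZZ'
wrap_buffer = io.StringIO()
write_wrapped_entry(wrap_buffer, demo_sequence, 'demo', width=40)
print(wrap_buffer.getvalue())
```

Joining the wrapped lines gives back the original sequence, so the width only affects the layout of the file, not its content.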
If we do not want to wrap the PB sequences, we can write a flat file. Flat files contain raw sequences, one per line, with no header. Such a file can easily be written using the `print` built-in. Yet, PBxplore offers a quick way to write a list of sequences as flat records: the :func:`pbxplore.io.write_flat` function.
with open('output.flat', 'w') as outfile:
    pbx.io.write_flat(outfile, pb_sequences)
!cat output.flat
!rm output.flat
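For comparison, the `print`-based approach mentioned above could look like the following sketch. The helper name `write_flat_with_print` is hypothetical and the sequences are made up; the only assumption is the flat layout described in the text, with one raw sequence per line and no header.

```python
import io


def write_flat_with_print(outfile, sequences):
    """Write raw sequences, one per line, without any header."""
    for sequence in sequences:
        print(sequence, file=outfile)


# Made-up PB-like sequences, used only for illustration.
flat_buffer = io.StringIO()
write_flat_with_print(flat_buffer, ['ZZdfklmmmmZZ', 'ZZccdfbdcZZ'])
print(flat_buffer.getvalue())
```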
The phi and psi dihedral angles are needed to assign protein block sequences. Once these angles are computed, it is sometimes convenient to store them in a file. This can be done using the :func:`pbxplore.io.write_phipsi` function. It works like :func:`pbxplore.io.write_fasta` in that it takes three arguments: the output file, a list of dihedral angle records, and a list of names corresponding to each record.
# Store the dihedral angles for all chains in a PDB file.
# Store also the chain name for all chains.
all_dihedrals = []
names = []
for chain_name, chain in pbx.chains_from_files(['demo1_assignation/2LFU.pdb']):
    all_dihedrals.append(chain.get_phi_psi_angles())
    names.append(chain_name)
# Write the dihedral angles in a file
with open('output.phipsi', 'w') as outfile:
    pbx.io.write_phipsi(outfile, all_dihedrals, names)
The output is formatted with one line per residue. The first column repeats the name given for the chain, followed by the residue id, the phi angle, and the psi angle. If an angle is not defined, 'None' is written instead.
!head output.phipsi
!tail output.phipsi
!rm output.phipsi
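The layout described above can be mimicked with a small standard-library sketch. Several details here are assumptions rather than facts about PBxplore: the helper name `write_angle_lines` is hypothetical, the angles are assumed to be stored as a dictionary mapping each residue id to a dictionary with `'phi'` and `'psi'` keys, and the space separator and two-decimal formatting are arbitrary choices. The angle values are made up.

```python
import io


def write_angle_lines(outfile, dihedrals, name):
    """Write one line per residue: name, residue id, phi, psi."""
    for res_id in sorted(dihedrals):
        fields = [name, str(res_id)]
        for key in ('phi', 'psi'):
            value = dihedrals[res_id][key]
            # Undefined angles are written as the literal string 'None'.
            fields.append('None' if value is None else '{:.2f}'.format(value))
        print(' '.join(fields), file=outfile)


# Made-up angles; chain ends typically lack one of the two angles.
demo_dihedrals = {
    1: {'phi': None, 'psi': -171.66},
    2: {'phi': -133.8, 'psi': 153.74},
    3: {'phi': -134.66, 'psi': None},
}
angle_buffer = io.StringIO()
write_angle_lines(angle_buffer, demo_dihedrals, 'demo_A')
print(angle_buffer.getvalue())
```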
Sequences written to files can also be read back. PBxplore provides the :func:`pbxplore.io.read_fasta` function to read fasta files.
def pdb_to_fasta_pb(pdb_path, fasta_path):
    """
    Write a fasta file with all the PB sequences from a PDB
    """
    with open(fasta_path, 'w') as outfile:
        for chain_name, chain in pbx.chains_from_files([pdb_path]):
            dihedrals = chain.get_phi_psi_angles()
            pb_seq = pbx.assign(dihedrals)
            pbx.io.fasta._write_fasta_entry(outfile, pb_seq, chain_name)
# Write a fasta file
pdb_to_fasta_pb('demo1_assignation/2LFU.pdb', 'output.fasta')
# Read a list of headers and a list of sequences from a fasta file
names, sequences = pbx.io.read_fasta('output.fasta')
print('names:')
pprint(names)
print('sequences:')
pprint(sequences)
!rm output.fasta
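To show what reading a fasta file into a list of names and a list of sequences involves, here is a minimal standard-library sketch. It is not the PBxplore implementation: `parse_fasta` is a hypothetical helper that assumes headers start with `>` and that wrapped sequence lines should be joined back together. The records are made up.

```python
import io


def parse_fasta(lines):
    """Return (names, sequences) parsed from an iterable of fasta lines."""
    names, sequences = [], []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # Skip blank lines.
        if line.startswith('>'):
            # A header starts a new record.
            names.append(line[1:])
            sequences.append('')
        elif names:
            # Wrapped sequence lines are joined back together.
            sequences[-1] += line
    return names, sequences


# Made-up records; the first sequence is wrapped over two lines.
demo_text = '>first\nZZdfkl\nmmZZ\n>second\nZZccdZZ\n'
demo_names, demo_sequences = parse_fasta(io.StringIO(demo_text))
print(demo_names)
print(demo_sequences)
```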
If the sequences we want to read are spread across several fasta files, we can use the :func:`pbxplore.io.read_several_fasta` function, which takes a list of fasta file paths as argument instead of a single path.
# Write several fasta files
pdb_to_fasta_pb('demo1_assignation/1BTA.pdb', '1BTA.fasta')
pdb_to_fasta_pb('demo1_assignation/2LFU.pdb', '2LFU.fasta')
# Read the fasta files
names, sequences = pbx.io.read_several_fasta(['1BTA.fasta', '2LFU.fasta'])
# Print the first entries
print('names:')
pprint(names[:5])
print('sequences:')
pprint(sequences[:5])
!rm 1BTA.fasta 2LFU.fasta