Summary: There is a wealth of genomic and proteomic data for different Plasmodium species, but tools to map sequence-aligned data onto three-dimensional protein structures remain limited. We present a Python tool (BioStructMap) that maps a diverse range of genomic or proteomic data onto known or modelled protein structures. The tool can also incorporate data from spatially neighbouring residues into the mapped outputs, and it is easily extensible: users can define custom functions to apply to the spatially aggregated data. A practical application is to map underlying genomic sequences to protein structures, so that genetic tests can be performed over spatially linked codons rather than over the windows of linear sequence used by traditional approaches. This approach is especially useful for identifying where selection pressures arise at the level of protein structure. To demonstrate this, we used BioStructMap to perform a modified calculation of Tajima’s D that combines protein structural information with a sliding window method. Using this approach, we identified a unique region of PfAMA1, involving both domains II and III, that was observed to be under a high degree of balancing selection relative to the rest of the protein. This region was not identified using traditional linear sequence-based approaches.
Availability and implementation: The Python BioStructMap package is available at https://github.com/andrewguy/biostructmap and released under the MIT License. An online server implementing standard functionality is available at https://biostructmap.burnet.edu.au.
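As an illustration of the spatial sliding-window idea described in the Summary, the following is a minimal sketch in plain Python. It does not use or reproduce the BioStructMap API: the function names (`residues_within`, `tajimas_d`, `spatial_tajimas_d`), the 0-based residue indexing, the use of one representative coordinate per residue, and the choice of radius are all assumptions made for illustration. The statistic is the standard Tajima's D, computed over the codons of all residues that fall within a given radius of each central residue.

```python
from itertools import combinations
from math import dist, sqrt


def residues_within(coords, centre, radius):
    """Return indices of residues within `radius` of `centre` (centre included).

    `coords` maps a residue index to one representative (x, y, z) coordinate,
    for example the C-alpha atom. This is an assumption of the sketch.
    """
    origin = coords[centre]
    return sorted(r for r, xyz in coords.items() if dist(origin, xyz) <= radius)


def tajimas_d(seqs):
    """Standard Tajima's D for equal-length aligned sequences.

    Returns None when the statistic is undefined (fewer than four sequences,
    or no segregating sites). Gap characters are treated like any other
    symbol, which is a simplification.
    """
    n = len(seqs)
    if n < 4:
        return None  # the variance term is zero for n < 4
    seg_sites = sum(1 for column in zip(*seqs) if len(set(column)) > 1)
    if seg_sites == 0:
        return None
    # Mean number of pairwise differences (pi).
    n_pairs = n * (n - 1) / 2
    pi = sum(
        sum(a != b for a, b in zip(s, t)) for s, t in combinations(seqs, 2)
    ) / n_pairs
    # Constants from Tajima (1989).
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n**2 + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)
    variance = e1 * seg_sites + e2 * seg_sites * (seg_sites - 1)
    return (pi - seg_sites / a1) / sqrt(variance)


def spatial_tajimas_d(coords, codon_alignment, radius):
    """Compute Tajima's D over a 3D window centred on each residue.

    `codon_alignment` is a list of in-frame nucleotide sequences in which
    residue `r` is encoded by positions 3*r to 3*r + 2 (an assumption).
    """
    result = {}
    for centre in coords:
        window = residues_within(coords, centre, radius)
        # Concatenate the codons of every residue inside the spatial window.
        sub_alignment = [
            "".join(seq[3 * r:3 * r + 3] for r in window)
            for seq in codon_alignment
        ]
        result[centre] = tajimas_d(sub_alignment)
    return result
```

Swapping `tajimas_d` for another function of the windowed sub-alignment gives the kind of user-defined aggregation mentioned in the Summary. A real analysis would also need to handle gaps, missing residues in the structure, and the mapping between structure numbering and the reference sequence, none of which this sketch attempts.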