API documentation
This part of the documentation is automatically generated from the PubChemPy source code and comments.
Search functions
-
pubchempy.
get_compounds
(identifier, namespace=u'cid', searchtype=None, as_dataframe=False, **kwargs)¶ Retrieve the specified compound records from PubChem.
Parameters: - identifier – The compound identifier to use as a search query.
- namespace – (optional) The identifier type, one of cid, name, smiles, sdf, inchi, inchikey or formula.
- searchtype – (optional) The advanced search type, one of substructure, superstructure or similarity.
- as_dataframe – (optional) Automatically extract the Compound properties into a pandas DataFrame and return that.
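As a rough illustration of how these parameters combine, the sketch below checks the namespace and searchtype against the values listed above before delegating to get_compounds(). The helper names `is_valid_query` and `search_compounds`, and the local validation step itself, are assumptions made for this example and are not part of PubChemPy. The import is deferred so that defining the helpers requires neither the library nor network access.

```python
# Values copied from the parameter descriptions above.
NAMESPACES = {"cid", "name", "smiles", "sdf", "inchi", "inchikey", "formula"}
SEARCHTYPES = {None, "substructure", "superstructure", "similarity"}


def is_valid_query(namespace="cid", searchtype=None):
    """Return True if both arguments are among the documented values."""
    return namespace in NAMESPACES and searchtype in SEARCHTYPES


def search_compounds(identifier, namespace="cid", searchtype=None, **kwargs):
    """Hypothetical wrapper: validate locally, then query PubChem."""
    if not is_valid_query(namespace, searchtype):
        raise ValueError(
            "unsupported namespace/searchtype: %r, %r" % (namespace, searchtype)
        )
    import pubchempy as pcp  # deferred: only needed when a query is made

    return pcp.get_compounds(identifier, namespace, searchtype=searchtype, **kwargs)


# Example call (performs a network request):
# for compound in search_compounds("aspirin", "name"):
#     print(compound.cid, compound.molecular_formula)
```

Validating before the request is a design choice for this sketch only; it rejects a mistyped namespace without a round trip to the server.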
-
pubchempy.
get_substances
(identifier, namespace=u'sid', as_dataframe=False, **kwargs)¶ Retrieve the specified substance records from PubChem.
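Parameters: - identifier – The substance identifier to use as a search query.
- namespace – (optional) The identifier type.
- as_dataframe – (optional) Automatically extract the Substance properties into a pandas DataFrame and return that.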
-
pubchempy.
get_assays
(identifier, namespace=u'aid', **kwargs)¶ Retrieve the specified assay records from PubChem.
Parameters: - identifier – The assay identifier to use as a search query.
- namespace – (optional) The identifier type.
-
pubchempy.
get_properties
(properties, identifier, namespace=u'cid', searchtype=None, as_dataframe=False, **kwargs)¶ Retrieve the specified properties from PubChem.
Parameters: - identifier – The compound, substance or assay identifier to use as a search query.
- namespace – (optional) The identifier type.
- searchtype – (optional) The advanced search type, one of substructure, superstructure or similarity.
- as_dataframe – (optional) Automatically extract the properties into a pandas DataFrame.
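The following sketch shows one way to post-process a get_properties() result. It assumes that, when as_dataframe is left as False, the function returns a list of dictionaries that each carry a "CID" key alongside the requested properties; that return shape is not stated above, so treat it as an assumption. The helper names `index_by_cid` and `fetch_properties` are hypothetical, and the sample rows are stand-ins rather than live results.

```python
def index_by_cid(rows):
    """Turn a list of property dicts into {CID: {property: value}}."""
    indexed = {}
    for row in rows:
        row = dict(row)  # copy so the caller's data is left untouched
        cid = row.pop("CID")
        indexed[cid] = row
    return indexed


def fetch_properties(properties, identifier, namespace="cid", **kwargs):
    """Hypothetical wrapper around get_properties() (network request)."""
    import pubchempy as pcp  # deferred: only needed when a query is made

    rows = pcp.get_properties(properties, identifier, namespace, **kwargs)
    return index_by_cid(rows)


# Stand-in rows in the assumed return shape.
sample_rows = [
    {"CID": 962, "MolecularFormula": "H2O"},
    {"CID": 702, "MolecularFormula": "C2H6O"},
]
by_cid = index_by_cid(sample_rows)
```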
Compound
-
class
pubchempy.
Compound
(record)¶ Corresponds to a single record from the PubChem Compound database.
The PubChem Compound database is constructed from the Substance database using a standardization and deduplication process. Each Compound is uniquely identified by a CID.
Initialize with a record dict from the PubChem PUG REST service.
For most users, the from_cid() class method is probably a better way of creating Compounds.
Parameters: record (dict) – A compound record returned by the PubChem PUG REST service.
-
record
¶ The raw compound record returned by the PubChem PUG REST service.
-
classmethod
from_cid
(cid, **kwargs)¶ Retrieve the Compound record for the specified CID.
Usage:
c = Compound.from_cid(6819)
Parameters: cid (int) – The PubChem Compound Identifier (CID).
-
to_dict
(properties=None)¶ Return a dictionary containing Compound data. Optionally specify a list of the desired properties.
synonyms, aids and sids are not included unless explicitly specified using the properties parameter. This is because they each require an extra request.
-
to_series
(properties=None)¶ Return a pandas Series containing Compound data. Optionally specify a list of the desired properties.
synonyms, aids and sids are not included unless explicitly specified using the properties parameter. This is because they each require an extra request.
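Because synonyms, aids and sids each cost an extra request, it can help to make that cost an explicit opt-in. The sketch below filters a requested property list before passing it to to_dict(). The names `SLOW_PROPERTIES`, `select_properties` and `compound_summary` are hypothetical helpers for this example; only to_dict(properties=...) comes from the documentation above.

```python
# Properties that, per the note above, each require an extra request.
SLOW_PROPERTIES = ("synonyms", "aids", "sids")


def select_properties(wanted, allow_slow=False):
    """Drop properties needing extra requests unless explicitly allowed."""
    return [name for name in wanted if allow_slow or name not in SLOW_PROPERTIES]


def compound_summary(compound, wanted, allow_slow=False):
    """Call to_dict() with only the properties that pass the filter."""
    return compound.to_dict(properties=select_properties(wanted, allow_slow))


# Example (Compound.from_cid performs a network request):
# import pubchempy as pcp
# c = pcp.Compound.from_cid(6819)
# print(compound_summary(c, ["cid", "inchikey", "synonyms"]))
```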
-
cid
¶ The PubChem Compound Identifier (CID).
Note
When searching using a SMILES or InChI query that is not present in the PubChem Compound database, an automatically generated record may be returned that contains properties that have been calculated on the fly. These records will not have a CID property.
-
elements
¶ List of element symbols for atoms in this Compound.
-
synonyms
¶ A ranked list of all the names associated with this Compound.
Requires an extra request. Result is cached.
-
sids
¶ Requires an extra request. Result is cached.
-
aids
¶ Requires an extra request. Result is cached.
-
charge
¶ Formal charge on this Compound.
-
molecular_formula
¶ Molecular formula.
-
molecular_weight
¶ Molecular Weight.
-
canonical_smiles
¶ Canonical SMILES, with no stereochemistry information.
-
isomeric_smiles
¶ Isomeric SMILES.
-
inchi
¶ InChI string.
-
inchikey
¶ InChIKey.
-
iupac_name
¶ Preferred IUPAC name.
-
xlogp
¶ XLogP.
-
exact_mass
¶ Exact mass.
-
monoisotopic_mass
¶ Monoisotopic mass.
-
tpsa
¶ Topological Polar Surface Area.
-
complexity
¶ Complexity.
-
h_bond_donor_count
¶ Hydrogen bond donor count.
-
h_bond_acceptor_count
¶ Hydrogen bond acceptor count.
-
rotatable_bond_count
¶ Rotatable bond count.
-
fingerprint
¶ Raw padded and hex-encoded fingerprint, as returned by the PUG REST API.
-
cactvs_fingerprint
¶ PubChem CACTVS fingerprint.
Each bit in the fingerprint represents the presence or absence of one of 881 chemical substructures.
More information at ftp://ftp.ncbi.nlm.nih.gov/pubchem/specifications/pubchem_fingerprints.txt
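A common use of substructure fingerprints is similarity scoring. The sketch below computes a Tanimoto coefficient between two fingerprints, assuming that cactvs_fingerprint is exposed as a string of '0' and '1' characters (one per substructure bit); that representation is an assumption, so check it against your installed version. The function name `tanimoto` is hypothetical.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient of two equal-length strings of '0'/'1' characters.

    Returns 0.0 when neither fingerprint has any bit set.
    """
    if len(fp_a) != len(fp_b):
        raise ValueError("fingerprints must have the same length")
    pairs = list(zip(fp_a, fp_b))
    both = sum(1 for a, b in pairs if a == "1" and b == "1")
    either = sum(1 for a, b in pairs if a == "1" or b == "1")
    return float(both) / either if either else 0.0


# Example (each from_cid call performs a network request):
# import pubchempy as pcp
# a = pcp.Compound.from_cid(6819)
# b = pcp.Compound.from_cid(962)
# print(tanimoto(a.cactvs_fingerprint, b.cactvs_fingerprint))
```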
-
heavy_atom_count
¶ Heavy atom count.
-
isotope_atom_count
¶ Isotope atom count.
-
atom_stereo_count
¶ Atom stereocenter count.
-
defined_atom_stereo_count
¶ Defined atom stereocenter count.
-
undefined_atom_stereo_count
¶ Undefined atom stereocenter count.
-
bond_stereo_count
¶ Bond stereocenter count.
-
defined_bond_stereo_count
¶ Defined bond stereocenter count.
-
undefined_bond_stereo_count
¶ Undefined bond stereocenter count.
-
covalent_unit_count
¶ Covalently-bonded unit count.
Atom
-
class
pubchempy.
Atom
(aid, number, x=None, y=None, z=None, charge=0)¶ Class to represent an atom in a Compound.
Initialize with an atom ID, atomic number, coordinates and optional charge.
Parameters:
-
aid
= None¶ The atom ID within the owning Compound.
-
number
= None¶ The atomic number for this atom.
-
x
= None¶ The x coordinate for this atom.
-
y
= None¶ The y coordinate for this atom.
-
z
= None¶ The z coordinate for this atom. Will be
None
in 2D Compound records.
-
charge
= None¶ The formal charge on this atom.
-
element
¶ The element symbol for this atom.
-
to_dict
()¶ Return a dictionary containing Atom data.
-
set_coordinates
(x, y, z=None)¶ Set all coordinate dimensions at once.
-
coordinate_type
¶ Whether this atom has 2D or 3D coordinates.
Bond
-
class
pubchempy.
Bond
(aid1, aid2, order=1, style=None)¶ Class to represent a bond between two atoms in a Compound.
Initialize with begin and end atom IDs, bond order and bond style.
Parameters:
-
aid1
= None¶ ID of the begin atom of this bond.
-
aid2
= None¶ ID of the end atom of this bond.
-
order
= None¶ Bond order.
-
style
= None¶ Bond style annotation.
-
to_dict
()¶ Return a dictionary containing Bond data.
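Since each Bond carries the IDs of its two atoms in aid1 and aid2, a list of bonds can be turned into an adjacency map. The sketch below is a minimal illustration of that idea; the function name `neighbours` is hypothetical, and the demo bonds are stand-in objects with made-up atom IDs rather than real Bond instances.

```python
from collections import defaultdict
from types import SimpleNamespace


def neighbours(bonds):
    """Map each atom ID to a sorted list of the atom IDs it is bonded to."""
    adjacency = defaultdict(set)
    for bond in bonds:
        adjacency[bond.aid1].add(bond.aid2)
        adjacency[bond.aid2].add(bond.aid1)
    return {aid: sorted(linked) for aid, linked in adjacency.items()}


# Stand-ins exposing the same attribute names as Bond (IDs are illustrative).
demo_bonds = [
    SimpleNamespace(aid1=1, aid2=2, order=1),
    SimpleNamespace(aid1=1, aid2=3, order=1),
]
demo_adjacency = neighbours(demo_bonds)

# With a real Compound, the list of Bond objects would be passed instead:
# neighbours(compound.bonds)
```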
Substance
-
class
pubchempy.
Substance
(record)¶ Corresponds to a single record from the PubChem Substance database.
The PubChem Substance database contains all chemical records deposited in PubChem in their most raw form, before any significant processing is applied. As a result, it contains duplicates, mixtures, and some records that don’t make chemical sense. This means that Substance records contain fewer calculated properties; however, they do have additional information about the original source that deposited the record.
The PubChem Compound database is constructed from the Substance database using a standardization and deduplication process. Hence each Compound may be derived from a number of different Substances.
-
classmethod
from_sid
(sid)¶ Retrieve the Substance record for the specified SID.
Parameters: sid (int) – The PubChem Substance Identifier (SID).
-
record
= None¶ A dictionary containing the full Substance record that all other properties are obtained from.
-
to_dict
(properties=None)¶ Return a dictionary containing Substance data.
If the properties parameter is not specified, everything except cids and aids is included. This is because the aids and cids properties each require an extra request to retrieve.
Parameters: properties – (optional) A list of the desired properties.
-
to_series
(properties=None)¶ Return a pandas Series containing Substance data.
If the properties parameter is not specified, everything except cids and aids is included. This is because the aids and cids properties each require an extra request to retrieve.
Parameters: properties – (optional) A list of the desired properties.
-
sid
¶ The PubChem Substance Identifier (SID).
-
synonyms
¶ A ranked list of all the names associated with this Substance.
-
source_name
¶ The name of the PubChem depositor that was the source of this Substance.
-
source_id
¶ Unique ID for this Substance within those from the same PubChem depositor source.
-
standardized_cid
¶ The CID of the Compound that was produced when this Substance was standardized.
May not exist if this Substance was not standardizable.
-
standardized_compound
¶ Return the Compound that was produced when this Substance was standardized.
Requires an extra request. Result is cached.
-
deposited_compound
¶ Return a Compound produced from the unstandardized Substance record as deposited.
The resulting Compound will not have a cid and will be missing most properties.
-
cids
¶ A list of all CIDs for Compounds that were produced when this Substance was standardized.
Requires an extra request. Result is cached.
-
aids
¶ A list of all AIDs for Assays associated with this Substance.
Requires an extra request. Result is cached.
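Because several Substances can standardize to the same Compound, and some cannot be standardized at all, collecting the distinct standardized CIDs is a typical first step. The sketch below assumes that standardized_cid is either missing or None for a Substance that was not standardizable; the exact behaviour is not specified above. The function name `standardized_cids` is hypothetical, and the demo objects are stand-ins with placeholder IDs.

```python
from types import SimpleNamespace


def standardized_cids(substances):
    """Collect unique standardized CIDs, preserving first-seen order."""
    seen = []
    for substance in substances:
        # Assumption: absent or None when the Substance was not standardizable.
        cid = getattr(substance, "standardized_cid", None)
        if cid is not None and cid not in seen:
            seen.append(cid)
    return seen


# Stand-ins exposing the same attribute names as Substance (placeholder IDs).
demo_substances = [
    SimpleNamespace(sid=1, standardized_cid=10),
    SimpleNamespace(sid=2, standardized_cid=None),
    SimpleNamespace(sid=3, standardized_cid=10),
    SimpleNamespace(sid=4, standardized_cid=11),
]
demo_cids = standardized_cids(demo_substances)
```

Reading standardized_cid rather than standardized_compound keeps the loop free of the extra request that the latter requires.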
Assay
-
class
pubchempy.
Assay
(record)¶
-
classmethod
from_aid
(aid)¶ Retrieve the Assay record for the specified AID.
Parameters: aid (int) – The PubChem Assay Identifier (AID).
-
record
= None¶ A dictionary containing the full Assay record that all other properties are obtained from.
-
to_dict
(properties=None)¶ Return a dictionary containing Assay data.
If the properties parameter is not specified, everything is included.
Parameters: properties – (optional) A list of the desired properties.
-
aid
¶ The PubChem Assay Identifier (AID).
-
name
¶ The short assay name, used for display purposes.
-
description
¶ The assay description.
-
project_category
¶ A category to distinguish projects funded through MLSCN, MLPCN or from literature.
Possible values include mlscn, mlpcn, mlscn-ap, mlpcn-ap, literature-extracted, literature-author, literature-publisher, rnaigi.
-
comments
¶ Comments and additional information.
-
results
¶ A list of dictionaries containing details of the results from this Assay.
-
target
¶ A list of dictionaries containing details of the Assay targets.
-
revision
¶ Revision identifier for textual description.
-
aid_version
¶ Incremented when the original depositor updates the record.
pandas functions
Each of the search functions, get_compounds(), get_substances() and get_properties(), has an as_dataframe parameter. When set to True, these functions automatically extract properties from each result in the list into a pandas DataFrame and return that instead of the results themselves.
If you already have a list of Compounds or Substances, the functions below allow a DataFrame to be constructed easily.
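To make the extraction step concrete, the sketch below builds one record per compound by hand using to_dict(), which is roughly the kind of data a DataFrame would be built from. It is an illustration under stated assumptions, not PubChemPy's own implementation: `compound_records` is a hypothetical helper, and `FakeCompound` is a stand-in class so that the example runs without network access.

```python
class FakeCompound(object):
    """Stand-in exposing only to_dict(), so the sketch runs offline."""

    def __init__(self, **data):
        self._data = data

    def to_dict(self, properties=None):
        keys = properties if properties is not None else sorted(self._data)
        return {key: self._data[key] for key in keys}


def compound_records(compounds, properties):
    """Return one dict per compound, restricted to the given properties."""
    return [c.to_dict(properties=list(properties)) for c in compounds]


demo_compounds = [
    FakeCompound(cid=962, molecular_formula="H2O", charge=0),
    FakeCompound(cid=702, molecular_formula="C2H6O", charge=0),
]
records = compound_records(demo_compounds, ["cid", "molecular_formula"])

# With pandas installed, the records can be loaded directly:
# import pandas as pd
# frame = pd.DataFrame.from_records(records, index="cid")
```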
Exceptions
-
exception
pubchempy.
PubChemPyError
¶ Base class for all PubChemPy exceptions.
-
exception
pubchempy.
ResponseParseError
¶ PubChem response is uninterpretable.
-
exception
pubchempy.
PubChemHTTPError
¶ Generic error class to handle all HTTP error codes.
-
exception
pubchempy.
BadRequestError
¶ Request is improperly formed (syntax error in the URL, POST body, etc.).
-
exception
pubchempy.
NotFoundError
¶ The input record was not found (e.g. invalid CID).
-
exception
pubchempy.
MethodNotAllowedError
¶ Request not allowed (such as invalid MIME type in the HTTP Accept header).
-
exception
pubchempy.
TimeoutError
¶ The request timed out, from server overload or too broad a request.
See Avoiding TimeoutError for more information.
-
exception
pubchempy.
UnimplementedError
¶ The requested operation has not (yet) been implemented by the server.
-
exception
pubchempy.
ServerError
¶ Some problem on the server side (such as a database server down, etc.).
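Timeouts and server-side errors are often transient, so one possible response is to retry the request a few times. The sketch below is a generic retry helper that takes the callable and the exception types as arguments; `call_with_retry` and its parameters are hypothetical names for this example, and the retry policy (a fixed number of attempts with a linearly growing pause) is an assumption rather than a PubChemPy recommendation.

```python
import time


def call_with_retry(func, retry_on, attempts=3, delay=0.0):
    """Call func(); retry when it raises one of retry_on, up to `attempts` tries.

    The last failure is re-raised so the caller still sees the original error.
    """
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except retry_on:
            if attempt == attempts:
                raise
            time.sleep(delay * attempt)  # linear back-off between tries


# Example (performs network requests):
# import pubchempy as pcp
# compounds = call_with_retry(
#     lambda: pcp.get_compounds("CC", "smiles", searchtype="substructure"),
#     retry_on=(pcp.TimeoutError, pcp.ServerError),
#     delay=2.0,
# )
```

Errors such as BadRequestError or NotFoundError are deliberately left out of the usage example, since repeating the same request would not be expected to change the outcome.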
Changes
- As of v1.0.3, the atoms and bonds properties on Compounds now return lists of Atom and Bond objects, rather than dicts.
- As of v1.0.2, search functions now return an empty list instead of raising a NotFoundError exception when no results are found. NotFoundError is still raised when attempting to create a Compound using the from_cid class method with an invalid CID.
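The two behaviours described for v1.0.2 call for different handling in calling code: an emptiness check for search functions, and an exception handler for from_cid. The sketch below shows both patterns with the callables passed in as arguments, so it runs without the library; `first_match` and `compound_or_none` are hypothetical helper names.

```python
def first_match(search, identifier, namespace="name"):
    """Return the first search hit, or None when an empty list comes back."""
    results = search(identifier, namespace)
    return results[0] if results else None


def compound_or_none(load, cid, not_found):
    """Return load(cid), or None if it raises the given not-found exception."""
    try:
        return load(cid)
    except not_found:
        return None


# Example (performs network requests):
# import pubchempy as pcp
# hit = first_match(pcp.get_compounds, "aspirin")
# c = compound_or_none(pcp.Compound.from_cid, 6819, pcp.NotFoundError)
```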