Advanced
Avoiding TimeoutError
If there are too many results for a request, you will receive a TimeoutError. There are different ways to avoid this, depending on what type of request you are doing.
If retrieving full compound or substance records, instead request a list of cids or sids for your input, and then request the full records for those identifiers individually or in small groups. For example:
from pubchempy import Substance, get_sids

sids = get_sids('Aspirin', 'name')
for sid in sids:
    s = Substance.from_sid(sid)
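To request records in small groups rather than one at a time, the identifier list can first be split into fixed-size batches. The following is a minimal sketch: `chunked` is a hypothetical helper, not part of PubChemPy, and the identifiers and batch size are arbitrary illustrations.

```python
def chunked(items, size):
    """Yield successive lists of at most `size` items from `items`."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


# Illustrative identifiers only; in practice these would come from get_sids().
example_sids = [101, 102, 103, 104, 105, 106, 107]
batches = list(chunked(example_sids, 3))
print(batches)
```

Each batch could then be passed to a function that accepts a list of identifiers, such as `get_substances`, so that no single request grows too large.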
When using the formula namespace or a searchtype, you can alternatively use the listkey_count and listkey_start keyword arguments to paginate the results. The listkey_count value specifies the number of results per page, and the listkey_start value specifies the zero-based index of the first result to return. For example:
get_compounds('CC', 'smiles', searchtype='substructure', listkey_count=5)
get('C10H21N', 'formula', listkey_count=3, listkey_start=6)
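If you prefer to think in page numbers, the two keyword arguments can be derived from a page index. This is a sketch only: `pagination_kwargs` is a hypothetical helper, and it assumes that listkey_start is the zero-based index of the first result to return, as the PUG REST documentation describes.

```python
def pagination_kwargs(page, per_page):
    """Return listkey keyword arguments for a zero-based page number."""
    if page < 0 or per_page < 1:
        raise ValueError("page must be >= 0 and per_page must be >= 1")
    # Assumption: listkey_start is an offset into the result list, not a page number.
    return {"listkey_count": per_page, "listkey_start": page * per_page}


# Third page (page index 2) with three results per page.
kwargs = pagination_kwargs(2, 3)
print(kwargs)
```

Under that assumption, `get('C10H21N', 'formula', **pagination_kwargs(2, 3))` would be equivalent to the second example above.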
Logging
PubChemPy can generate logging statements if required. Just set the desired logging level:
import logging
logging.basicConfig(level=logging.DEBUG)
The logger is named ‘pubchempy’. There is more information on logging in the Python logging documentation.
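Because the logger has a name, it can also be configured on its own without changing the level of every other logger in your program. A minimal sketch using only the standard library (the format string is just one possible choice):

```python
import logging

# Configure only the 'pubchempy' logger, leaving other loggers untouched.
logger = logging.getLogger("pubchempy")
logger.setLevel(logging.DEBUG)

# Send its messages to stderr with a timestamped format.
handler = logging.StreamHandler()
handler.setFormatter(
    logging.Formatter("%(asctime)s %(name)s %(levelname)s %(message)s")
)
logger.addHandler(handler)
```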
Using behind a proxy
When using PubChemPy behind a proxy, you may receive a URLError:
URLError: <urlopen error [Errno 65] No route to host>
A simple fix is to specify the proxy information via urllib. For Python 3:
import urllib.request
proxy_support = urllib.request.ProxyHandler({
'http': 'http://<proxy.address>:<port>',
'https': 'https://<proxy.address>:<port>'
})
opener = urllib.request.build_opener(proxy_support)
urllib.request.install_opener(opener)
For Python 2:
import urllib2
proxy_support = urllib2.ProxyHandler({
'http': 'http://<proxy.address>:<port>',
'https': 'https://<proxy.address>:<port>'
})
opener = urllib2.build_opener(proxy_support)
urllib2.install_opener(opener)
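For Python 3, the proxy setup can be wrapped in a small function so that the address is supplied in one place. This is a sketch under assumptions: `build_proxy_opener` is a hypothetical helper, the host and port are placeholders, and it assumes a single proxy reachable over plain HTTP serves both schemes, which is common but not universal.

```python
import urllib.request


def build_proxy_opener(host, port):
    """Return an opener that routes HTTP and HTTPS requests through one proxy."""
    # Assumption: the same proxy URL is used for both schemes.
    proxy_url = "http://{}:{}".format(host, port)
    handler = urllib.request.ProxyHandler({"http": proxy_url, "https": proxy_url})
    return urllib.request.build_opener(handler)


# Placeholder address; replace with your own proxy details.
opener = build_proxy_opener("proxy.example.com", 8080)
# urllib.request.install_opener(opener)  # uncomment to apply globally
```

Returning the opener instead of installing it immediately leaves the caller free to decide whether the proxy should apply to every urllib request in the process.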
Custom requests
If you wish to perform more complicated requests, you can use the request function. This is a thin wrapper around the REST API that lets you construct any sort of request from a few parameters. The PUG REST Specification has all the information you will need to formulate your requests.
The request function simply returns the exact response from the PubChem server as a string. This can be parsed in different ways depending on the output format you choose. See the Python json, xml and csv packages for more information. Additionally, cheminformatics toolkits such as Open Babel and RDKit offer tools for handling SDF files in Python.
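As an illustration of the JSON case, the sketch below parses a response string with the standard json package. The string is a hand-written sample in the shape of an identifier-list response, standing in for real server output so that the example runs without network access.

```python
import json

# Hand-written sample in the shape of a JSON identifier-list response;
# it stands in for the string returned by the server.
response_text = '{"IdentifierList": {"CID": [2244]}}'

data = json.loads(response_text)
cids = data["IdentifierList"]["CID"]
print(cids)
```

The keys present in a real response depend on the operation requested, so it is worth inspecting the parsed dictionary before indexing into it.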
The get function is very similar to the request function, except that it handles listkey type responses automatically for you. This makes things simpler, but it means you can't reuse the same listkey repeatedly to obtain different types of information. See the PUG REST specification for more information on how listkey responses work.
Summary of possible inputs
<identifier> = list of cid, sid, aid, source, inchikey, listkey; string of name, smiles, xref, inchi, sdf;
<domain> = substance | compound | assay
compound domain
<namespace> = cid | name | smiles | inchi | sdf | inchikey | <structure search> | <xref> | listkey | formula
<operation> = record | property/[comma-separated list of property tags] | synonyms | sids | cids | aids | assaysummary | classification
substance domain
<namespace> = sid | sourceid/<source name> | sourceall/<source name> | name | <xref> | listkey
<operation> = record | synonyms | sids | cids | aids | assaysummary | classification
assay domain
<namespace> = aid | listkey | type/<assay type> | sourceall/<source name>
<assay type> = all | confirmatory | doseresponse | onhold | panel | rnai | screening | summary
<operation> = record | aids | sids | cids | description | targets/{ProteinGI, ProteinName, GeneID, GeneSymbol} | doseresponse/sid
<structure search> = {substructure | superstructure | similarity | identity}/{smiles | inchi | sdf | cid}
<xref> = xref/{RegistryID | RN | PubMedID | MMDBID | ProteinGI | NucleotideGI | TaxonomyID | MIMID | GeneID | ProbeID | PatentID}
<output> = XML | ASNT | ASNB | JSON | JSONP [ ?callback=<callback name> ] | SDF | CSV | PNG | TXT
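To see how these parts fit together, the sketch below joins them into a PUG REST URL of the form `<base>/<domain>/<namespace>/<identifier>/<operation>/<output>`. It only illustrates the URL layout: `build_pug_url` is a hypothetical helper, not a PubChemPy function, and PubChemPy itself may send some inputs differently (for example in the request body).

```python
from urllib.parse import quote

PUG_REST_BASE = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"


def build_pug_url(domain, namespace, identifier, operation, output):
    """Join the input parts into a PUG REST URL."""
    # Lists of identifiers are sent as a comma-separated string.
    if isinstance(identifier, (list, tuple)):
        identifier = ",".join(str(i) for i in identifier)
    # Percent-encode the identifier, keeping commas between list items.
    parts = [domain, namespace, quote(str(identifier), safe=","), operation, output]
    return "/".join([PUG_REST_BASE] + parts)


url = build_pug_url(
    "compound", "cid", [2244, 3672], "property/MolecularFormula", "JSON"
)
print(url)
```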