International Chemical Identifier
| InChI | |
|---|---|
| Developer | InChI Trust |
| Initial release | April 15, 2005[1][2] |
| Stable release | 1.07.5 / February 17, 2026 |
| Operating system | Windows and Unix-like |
| Platform | IA-32 and x86-64 |
| Available in | English |
| License | MIT License (since v1.07); LGPL (until v1.04); IUPAC-InChI Trust License (v1.05, v1.06) |
| Website | www |
| Repository | |
The International Chemical Identifier (InChI, pronounced /ˈɪntʃiː/ IN-chee)[3] is a textual identifier for chemical substances, designed to provide a standard way to encode molecular information and to facilitate the search for such information in databases and on the web. It was initially developed by the International Union of Pure and Applied Chemistry (IUPAC) and the National Institute of Standards and Technology (NIST) from 2000 to 2005; the format and algorithms are non-proprietary. Since May 2009, it has been developed by the InChI Trust, a nonprofit charity based in the United Kingdom that works to implement and promote the use of InChI.[4]
The identifiers describe chemical substances in terms of layers of information — the atoms and their bond connectivity, tautomeric information, isotope information, stereochemistry, and electronic charge information.[5] Not all layers have to be provided; for instance, the tautomer layer can be omitted if that type of information is not relevant to the particular application. The InChI algorithm converts input structural information into a unique InChI identifier in a three-step process: normalization (to remove redundant information), canonicalization (to generate a unique number label for each atom), and serialization (to give a string of characters).
InChIs differ from the widely used CAS registry numbers in three respects: firstly, they are freely usable and non-proprietary; secondly, they can be computed from structural information and do not have to be assigned by some organization; and thirdly, most of the information in an InChI is human readable (with practice). InChIs can thus be seen as akin to a general and extremely formalized version of IUPAC names. They can express more information than the simpler SMILES notation and, in contrast to SMILES strings, every structure has a unique InChI string, which is important in database applications. Information about the 3-dimensional coordinates of atoms is not represented in InChI; for this purpose a format such as PDB can be used.
The InChIKey, sometimes referred to as a hashed InChI, is a fixed length (27 character) condensed digital representation of the InChI that is not human-understandable. The InChIKey specification was released in September 2007 in order to facilitate web searches for chemical compounds, since these were problematic with the full-length InChI.[6] Unlike the InChI, the InChIKey is not unique: though collisions are expected to be extremely rare, there are known collisions.[7]
InChI was first released in 2005. A major milestone was version 1.02 of January 2009, which provided a means to generate the so-called standard InChI, a version of the InChI with a fixed level of detail and a fixed collection of layers. The standard InChIKey is then the hashed version of the standard InChI string. The standard InChI is intended to simplify comparison of InChI strings and keys generated by different groups and subsequently accessed via diverse sources such as databases and web resources. Since version 1.07.1 (August 2024), the software uses the MIT license and may be downloaded from the InChI GitHub site. Besides the implementations in molecule editors, stand-alone executables have been packaged for multiple Linux distributions,[8] including Debian.[9]
Generation
In order to avoid generating different InChIs for tautomeric structures, before generating the InChI, an input chemical structure is normalized to reduce it to its so-called core parent structure. This may involve changing bond orders, rearranging formal charges and possibly adding and removing protons. Different input structures may give the same result; for example, acetic acid and acetate would both give the same core parent structure, that of acetic acid. A core parent structure may be disconnected, consisting of more than one component, in which case the sublayers in the InChI usually consist of sublayers for each component, separated by semicolons (periods for the chemical formula sublayer). One way this can happen is that all metal atoms are disconnected during normalization; so, for example, the InChI for tetraethyllead will have five components, one for lead and four for the ethyl groups.[5]
The first, main, layer of the InChI refers to this core parent structure, giving its chemical formula, non-hydrogen connectivity without bond order (/c sublayer) and hydrogen connectivity (/h sublayer). The /q portion of the charge layer gives its charge, and the /p portion of the charge layer tells how many protons (hydrogen ions) must be added to or removed from it to regenerate the original structure. If present, the stereochemical layer, with sublayers /b, /t, /m and /s, gives stereochemical information, and the isotopic layer /i (which may contain sublayers /h, /b, /t, /m and /s) gives isotopic information. These are the only layers which can occur in a standard InChI.[5]
If the user wants to specify an exact tautomer, a fixed hydrogen layer /f can be appended, which may contain various additional sublayers; this cannot be done in standard InChI though, so different tautomers will have the same standard InChI (for example, alanine will give the same standard InChI whether input in a neutral or a zwitterionic form.)
Finally, a nonstandard reconnected /r layer can be added, which effectively gives a new InChI generated without breaking bonds to metal atoms. This may contain various sublayers, including /f.[5]
Format and layers
| InChI format | |
|---|---|
| Internet media type | chemical/x-inchi |
| Type of format | chemical file format |
Every InChI starts with the string InChI= followed by the version number, currently 1. In a standard InChI, the version number is followed by the letter S; standard InChI is a fully standardized InChI flavor that maintains the same level of attention to structure details and the same conventions for drawing perception. The remaining information is structured as a sequence of layers and sub-layers, with each layer providing one specific type of information. The layers and sub-layers are separated by the delimiter / and start with a characteristic prefix letter (except for the chemical formula sub-layer of the main layer). The six layers with important sublayers are:[10]
- Main layer (always present)
  - Chemical formula (no prefix). This is the only sublayer that must occur in every InChI. Numbers used throughout the InChI are given in the formula's element order excluding hydrogen atoms. For example, /C10H16N5O13P3 implies that atoms numbered 1–10 are carbons, 11–15 are nitrogens, 16–28 are oxygens, and 29–31 are phosphorus.
  - Atom connections (/c). The atoms in the chemical formula (except for hydrogens) are numbered in sequence; this sublayer describes which atoms are connected by bonds to which other ones. Bond orders are not recorded here; the configuration of double bonds is given later in the stereochemical layer (/b).
  - Hydrogen atoms (/h). Describes how many hydrogen atoms are connected to each of the other atoms.
- Charge layer
  - Charge sublayer (/q)
  - Proton sublayer (/p)
- Stereochemical layer
  - Double bonds and cumulenes (/b).
  - Tetrahedral stereochemistry of atoms and allenes. First, /t describes the relative configuration, which implies a preference for one of the mirror forms. Then /m is used to choose whether to mirror the molecule described by /t, if an absolute configuration is requested.
  - Type of stereochemistry information (/s): /s1 for absolute, /s2 for relative (unspecified mix of chiralities), /s3 for racemic (equal mix of both chiralities).
- Isotopic layer (/i), which may include sublayers:[10]
  - Sublayer /h for isotopic hydrogen
  - Sublayers /b, /t, /m, /s for isotopic stereochemistry
- Fixed-H layer (/f) for tautomeric hydrogens; contains some or all of the above types of layers except atom connections; may end with an /o sublayer.
- Reconnected layer (/r); contains the whole InChI of a structure with reconnected metal atoms.
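The numbering rule of the chemical formula sublayer can be checked with a few lines of Python. This is only a sketch: `heavy_atom_ranges` is a hypothetical helper, not part of the InChI software, and it assumes a single-component formula written as element symbols with optional counts.

```python
import re


def heavy_atom_ranges(formula):
    """Map each non-hydrogen element of a formula to its atom numbers.

    Illustrative only: assumes a single-component formula such as
    'C10H16N5O13P3', with elements taken in the order they appear.
    """
    ranges = {}
    next_number = 1
    for element, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        if element == "H":
            continue  # hydrogens are excluded from the numbering
        n = int(count) if count else 1
        ranges[element] = range(next_number, next_number + n)
        next_number += n
    return ranges


ranges = heavy_atom_ranges("C10H16N5O13P3")
for element, r in ranges.items():
    print(f"{element}: atoms {r.start}-{r.stop - 1}")
```

For the formula used in the list above, this reproduces the ranges 1–10, 11–15, 16–28 and 29–31.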
The delimiter-prefix format has the advantage that a user can easily use a wildcard search to find identifiers that match only in certain layers.
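As a rough illustration of layer-wise matching, the sketch below splits an InChI on the / delimiter and uses Python's standard `fnmatch` wildcards to find identifiers that share a main layer. The helper name `split_layers` is hypothetical, and the strings are the water and ethanol examples used elsewhere in this article.

```python
from fnmatch import fnmatchcase


def split_layers(inchi):
    """Return (version, formula, layers) for an InChI string."""
    prefix = "InChI="
    if not inchi.startswith(prefix):
        raise ValueError("not an InChI string")
    version, formula, *layers = inchi[len(prefix):].split("/")
    return version, formula, layers


water = "InChI=1S/H2O/h1H2"
heavy_water = "InChI=1S/H2O/h1H2/i/hD2"
ethanol = "InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3"

print(split_layers(heavy_water))

# Wildcard search: the same main layer as water, with or without
# additional layers after it.
patterns = [water, water + "/*"]
matches = [s for s in (water, heavy_water, ethanol)
           if any(fnmatchcase(s, p) for p in patterns)]
print(matches)
```

Matching on the exact string or on the string followed by `/*` avoids accidentally matching a longer /h sublayer that merely starts with the same characters.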
Standard InChI adds the following constraints:[10]
- The /f, /o, and /r (sub)layers are never included in standard InChI.
- If stereochemistry is specified, it can only be absolute (/s1). Unknown stereo designations are treated as undefined.
- Organometallic connectivity does not include bonds to the metal.
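The first two constraints can be checked on the string alone. The function below is a hypothetical sketch rather than anything the InChI software provides; the third constraint concerns how the structure was processed and cannot be verified from the text. The second test string is synthetic and serves only to trigger the checks.

```python
def standard_inchi_problems(inchi):
    """List the ways a string breaks the textual standard-InChI constraints."""
    problems = []
    if not inchi.startswith("InChI=1S/"):
        problems.append("prefix is not InChI=1S/")
    # Skip the 'InChI=...' prefix and the chemical formula.
    for layer in inchi.split("/")[2:]:
        if layer[:1] in ("f", "o", "r"):
            problems.append(f"/{layer[0]} layer is not allowed")
        elif layer.startswith("s") and layer != "s1":
            problems.append(f"/{layer} is not absolute stereo (/s1)")
    return problems


ok = standard_inchi_problems("InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3")
bad = standard_inchi_problems("InChI=1/CH4O/c1-2/h2H,1H3/s2")  # synthetic string
print(ok)
print(bad)
```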
InChIKey
The condensed, 27-character InChIKey is a hashed version of the full InChI (using the SHA-256 algorithm), designed to allow for easy web searches of chemical compounds.[6] The standard InChIKey is the hashed counterpart of standard InChI. Most chemical structures on the Web up to 2007 were represented as GIF files, which are not searchable for chemical content. The full InChI turned out to be too lengthy for easy searching, and therefore the InChIKey was developed. There is a very small but nonzero chance of two different molecules having the same InChIKey; the probability of duplication of only the first 14 characters has been estimated as one duplication in 75 databases each containing one billion unique structures. With all databases at the time holding fewer than 50 million structures, such duplication appeared unlikely. A later study examined the collision rate more extensively and found that the experimental collision rate agrees with the theoretical expectations.[11]
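The quoted estimate can be roughly reproduced with the birthday approximation. The sketch below assumes that the first 14 characters behave like an ideal 65-bit hash (the bit length given in the base 26 encoding section); the function name is illustrative.

```python
def expected_collisions(n_structures, hash_bits=65):
    """Birthday-problem estimate of colliding pairs for an ideal hash."""
    pairs = n_structures * (n_structures - 1) / 2
    return pairs / 2 ** hash_bits


per_database = expected_collisions(10 ** 9)
print(f"expected collisions per billion-structure database: {per_database:.4f}")
print(f"roughly one collision per {1 / per_database:.0f} databases")
```

Under these assumptions the result is on the order of one collision per 74 databases, close to the figure of 75 cited above.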
The InChIKey currently consists of three parts separated by hyphens, of 14, 10 and one character(s), respectively, like xxxxxxxxxxxxxx-yyyyyyyyfv-p.[12][5]
- The first 14 characters (x) result from a SHA-256 hash of the connectivity information (the main layer and /q sublayer of the charge layer) of the InChI. The mapping to letters is a "base-26" encoding.
- The second part consists of 8 characters (y) resulting from a hash of the remaining ("minor") layers of the InChI, a single character (f) indicating the kind of InChIKey (S for standard and N for nonstandard), and a character (v) indicating the version of InChI used (currently A for version 1).
- Finally, the single character (p) at the end indicates the protonation of the core parent structure, corresponding to the /p sublayer of the charge layer (N for no protonation; O, P, ... if protons should be added; and M, L, ... if they should be removed).
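The three-part layout described above can be taken apart with a regular expression. This is a minimal sketch with hypothetical names (`parse_inchikey` and the dictionary keys are not an official API); the protonation offset simply counts letters relative to N, as the list describes.

```python
import re

INCHIKEY_RE = re.compile(r"^([A-Z]{14})-([A-Z]{8})([SN])([A-Z])-([A-Z])$")


def parse_inchikey(key):
    """Split an InChIKey into the parts named in the text."""
    m = INCHIKEY_RE.match(key)
    if m is None:
        raise ValueError(f"not a 27-character InChIKey: {key!r}")
    major, minor, kind, version, proton = m.groups()
    return {
        "major_hash": major,           # connectivity (x)
        "minor_hash": minor,           # remaining layers (y)
        "standard": kind == "S",       # flag character (f)
        "version_char": version,       # version character (v)
        # N = 0, O = +1, P = +2, ...; M = -1, L = -2, ...
        "proton_offset": ord(proton) - ord("N"),
    }


info = parse_inchikey("DTGKSKDOIYIVQL-IUNFSSIHNA-N")
print(info)
```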
The following are examples of InChIs and InChIKeys. Because all standard InChIs can be trivially turned "nonstandard" by removing the "S" marker, they actually imply two keys that differ by one character.
| Structural formula | Name | InChI | InChIKey | Note |
|---|---|---|---|---|
| | Ethanol | InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3 | | Standard. |
| | (-)-borneol | InChI=1S/C10H18O/c1-9(2)7-4-5-10(9,3)8(11)6-7 | | Standard. Tetrahedral marks on atoms 7, 8, and 10. |
| | (+)-borneol | InChI=1S/C10H18O/c1-9(2)7-4-5-10(9,3)8(11)6-7 | | Standard. Note the use of /m1 to request the enantiomer. |
| | (±)-borneol | InChI=1S/C10H18O/c1-9(2)7-4-5-10(9,3)8(11)6-7 | DTGKSKDOIYIVQL-IUNFSSIHNA-N | Relative, unspecified. |
| | | InChI=1S/C10H18O/c1-9(2)7-4-5-10(9,3)8(11)6-7 | DTGKSKDOIYIVQL-SCAUNJPWNA-N | Racemic. |
| | Morphine | InChI=1S/C17H19NO3 | | Standard. |
| H[2H]O | Semiheavy water | InChI=1S/H2O/h1H2/i/hD | | Isotopic information is part of the standard. |
| [2H]2O | Heavy water | InChI=1S/H2O/h1H2/i/hD2 | | D2 for two deuteriums. |
| [3H]2O | Superheavy water | InChI=1S/H2O/h1H2/i/hT2 | | T for tritium. |
| H2[18O] | Heavy-oxygen water | InChI=1S/H2O/h1H2/i1+2 | | /i1+2 means that atom number 1 is an isotope whose mass number is 2 higher than that of the normal one (oxygen-16). |
Base 26 encoding
InChIKey uses a base 26 encoding to represent (parts of) SHA-256 hashes. The input is chopped into 14-bit segments, each of which corresponds to three letters (a triplet). A remaining group of up to 9 bits corresponds to 2 characters (a doublet). In InChIKey, inputs can only be of two lengths: 65 bits for the "major" hash (divided into 14 × 4 + 9 bits for 3 × 4 + 2 = 14 characters) and 37 bits for the "minor" hash (14 × 2 + 9 bits for 3 × 2 + 2 = 8 characters).[14] A few additional lengths are used in RInChI:[15]
- 28 (14 × 2) bits yield a 6-character hash; only the truncated 4-character form is used.
- 56 (14 × 4) bits yield a 12-character hash, the truncated form being 10 characters.
- 78 (65 + 14 - 1) bits yield a 17-character hash, with one bit used twice.
The first 80 bits of the SHA-256 hash of an empty string are e3 b0 c4 42 98 fc 1c 14 9a fb. This results in the following base 26 strings for this hash: UHFF, UHFFFAOY, UHFFFADPSC, UHFFFADPSCTJ, UHFFFADPSCTJAU, UHFFFADPSCTJAUYIS.[15] These strings are commonly encountered when the corresponding layers have no data. For example, one sees UHFFFAOYSA or UHFFFAOYNA in an InChIKey when the source InChI has no stereochemical information.
from itertools import product
from typing import Iterator, Optional
from warnings import warn
from math import ceil
from hashlib import sha256

AZ = [chr(i) for i in range(ord('A'), ord('Z') + 1)]
AAZZ = [''.join(p) for p in product(AZ, repeat=2)]
# Intentionally omitted: EXX, TAA-TTV (this leaves exactly 2**14 triplets)
AAAZZZ = list(filter(lambda s: not (s.startswith('E') or s >= 'TAA' and s <= 'TTV'), (''.join(p) for p in product(AZ, repeat=3))))


def b26(data: bytes, bitlen: Optional[int]) -> Iterator[str]:
    """
    Convert data into InChI segments.

    :param data: The data to convert.
    :param bitlen: The number of bits to consider from the data. If None, use all bits.
    """
    if bitlen is None:
        bitlen = len(data) * 8
    # Bits are consumed from the least significant end of a little-endian integer.
    d = int.from_bytes(data[:ceil(bitlen/8)], "little")
    while bitlen > 0:
        if bitlen >= 10:
            # 14-bit segment -> three letters
            if bitlen < 14:
                warn(f"Dumping residual {bitlen}-bit segment as 3char (not proper InChI base26)")
            yield AAAZZZ[d & ((1 << 14) - 1)]
            d >>= 14
            bitlen -= 14
        else:
            # trailing 9-bit segment -> two letters
            if bitlen < 9:
                warn(f"Dumping residual {bitlen}-bit segment as 2char (not proper InChI base26)")
            yield AAZZ[d & ((1 << 9) - 1)]
            d >>= 9
            bitlen -= 9


def b26_14(data: bytes) -> str:
    """Encode 65 bits of data to 14 chars (InChIKey main)."""
    return ''.join(b26(data, 65))


def b26_8(data: bytes) -> str:
    """Encode 37 bits of data to 8 chars (InChIKey minor)."""
    return ''.join(b26(data, 37))


def b26_r04(data: bytes) -> str:
    """Encode 28 bits of data into 6 chars, then truncate to 4 (RInChIKey metadata/minor)."""
    return ''.join(b26(data, 28))[0:4]


def b26_r10(data: bytes) -> str:
    """Encode 56 bits of data into 12 chars, then truncate to 10 (Short-RInChIKey major)."""
    return b26_r12(data)[0:10]


def b26_r12(data: bytes) -> str:
    """Encode 56 bits of data into 12 chars (WebRInChIKey minor)."""
    return ''.join(b26(data, 56))


def b26_r17(data: bytes) -> str:
    """Encode 78 bits of data to 17 chars (WebRInChIKey major).

    Note: bit 64 is used twice!"""
    return ''.join(b26(data, 65)) + ''.join(b26(data[8:], 14))


empty_hash = sha256(b"")
print(f"{empty_hash.hexdigest()=}")
print(f"{b26_8(empty_hash.digest())=}")


def key_14(data: str) -> str:
    """Generate a 14-char InChIKey main from the input string."""
    return b26_14(sha256(data.encode()).digest())


# Ethanol: LFQSCWFLJHTTHZ
print(f"{key_14('C2H6O/c1-2-3/h3H,2H2,1H3')=}")
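The arithmetic behind the segment lengths and the first triplet of the empty-string hash can also be worked through in isolation. The sketch below is self-contained and makes the same assumption about the triplet table as the code in this section (all three-letter strings except those starting with E and the range TAA to TTV); `encoded_length` is a hypothetical helper name.

```python
from hashlib import sha256
from itertools import product
from string import ascii_uppercase


def encoded_length(bits):
    """Characters produced: 3 per 14-bit segment, 2 for a trailing segment of up to 9 bits."""
    full, rest = divmod(bits, 14)
    if rest > 9:
        raise ValueError("trailing segment longer than 9 bits")
    return 3 * full + (2 if rest else 0)


for bits in (65, 37, 28, 56):
    print(bits, "bits ->", encoded_length(bits), "characters")

# Removing E?? (676 strings) and TAA..TTV (516 strings) from the
# 26**3 = 17576 triplets leaves exactly 2**14 = 16384 entries.
triplets = [t for t in map("".join, product(ascii_uppercase, repeat=3))
            if not (t[0] == "E" or "TAA" <= t <= "TTV")]

# First 14 bits of SHA-256(b""): bytes e3 b0 read as little-endian 0xb0e3,
# masked to the low 14 bits.
digest = sha256(b"").digest()
first_segment = int.from_bytes(digest[:2], "little") & 0x3FFF
print(first_segment, triplets[first_segment])
```

The first segment evaluates to 12515, which indexes the triplet UHF, the start of the UHFF... strings listed above.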
InChI resolvers
As the InChI cannot be reconstructed from the InChIKey, an InChIKey always needs to be linked to the original InChI to get back to the original structure. InChI resolvers act as a lookup service to make these links, and prototype services are available from the National Cancer Institute, the UniChem service at the European Bioinformatics Institute, and PubChem. ChemSpider had a resolver until July 2015, when it was decommissioned.[16]
AuxInfo
The auxiliary information (AuxInfo) string is produced by InChI software alongside the InChI string. For example, the (±)-borneol /s2 example produces:
AuxInfo=1/0/N:1,2,3,4,5,6,7,8,9,10,11/E:(1,2)/rA:13cCCCCCCCCCCOHH/rB:;;;s4;;s4s6;s6;s1s2s7;n3s5s8s9;P8;P7;s8;/rC:2.0857,-1.1788,0;3.0905,.273,0;2.6864,-1.7772,0;4.5619,-2.283,0;3.6719,-2.2295,0;5.2528,-.9411,0;4.5862,-1.4963,0;4.4381,-.864,0;3.0628,-.7814,0;3.6539,-1.3571,0;3.6343,-.1809,0;5.5343,-1.9585,0;4.8482,.1078,0;
"AuxInfo contains, in particular, atom non-stereo equivalence information, mapping input atom positions to output positions, and 'reversibility' information for re-drawing the structure." The reversibility information can be used to regenerate the source structure (such as a MOLFILE with 2D or 3D coordinates) without needing an InChI.[17] The InChI user guide describes the format in detail. The parts seen here are:
- 1/0 refers to InChI version 1, normalization type 0.
- /N: maps InChI's atom numbering to the input's atom numbering.
- /E: describes the equivalence between atoms.
- /rA: describes reversibility information for atoms.
- /rB: describes reversibility information for bonds.
- /rC: describes reversibility information for coordinates. Here 2D coordinates are used; a more realistic depiction for this molecule would be 3D.
The full complement of tags is: 1/0/N/E/gE/it/iN/I/E/gE/it/iN/CRV/rA/rB/rC.[18]
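The tagged structure of an AuxInfo string lends itself to simple splitting. The sketch below is an assumption-laden illustration, not the format's reference parser: `parse_auxinfo` is a hypothetical helper, it relies on values not containing the / delimiter, and it keeps only the last occurrence of a tag that appears more than once. The input is a shortened form of the borneol example above.

```python
def parse_auxinfo(auxinfo):
    """Split an AuxInfo string into its header fields and tagged parts."""
    prefix = "AuxInfo="
    if not auxinfo.startswith(prefix):
        raise ValueError("not an AuxInfo string")
    version, normalization_type, *fields = auxinfo[len(prefix):].split("/")
    parsed = {"version": version, "normalization_type": normalization_type}
    for field in fields:
        tag, _, value = field.partition(":")
        parsed[tag] = value  # a repeated tag overwrites the earlier one
    return parsed


aux = parse_auxinfo(
    "AuxInfo=1/0/N:1,2,3,4,5,6,7,8,9,10,11/E:(1,2)/rA:13cCCCCCCCCCCOHH"
)
# /N: lists, for each InChI atom number, the atom number in the input.
atom_map = [int(n) for n in aux["N"].split(",")]
print(aux["version"], aux["normalization_type"], atom_map, aux["E"])
```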
Derived formats
RInChI
RInChI (Reaction InChI, International chemical identifier for reactions) is a standard method for using InChI to describe chemical reactions. An RInChI string consists of several sets of InChI strings for the reactants, products, and agents as well as information required to tag them as such. Example string and breakdown:[19]
| Part | Layer # | Description |
|---|---|---|
| RInChI=1.00.1S/ | 1 | Version of RInChI (1.00), version of InChI used within (1S, version 1 standard) |
| C2H4O2/c1-2(3)4/h1H3,(H,3,4)!C2H6O/c1-2-3/h3H,2H2,1H3<> | 2 | Left side of reaction (acetic acid and ethanol), version 1 standard InChI without the InChI=1S/ header, separated by ! |
| C4H8O2/c1-3-6-4(2)5/h3H2,1-2H3!H2O/h1H2<> | 3 | Right side of reaction (ethyl acetate and water), same format |
| H2O4S/c1-5(2,3)4/h(H2,1,2,3,4)/ | 4 | Agents (sulfuric acid), same format |
| d= | 5 | Direction of reaction (d). d= means equilibrium, d+ means left to right, d- means right to left. |
As shown above, layers that do not involve InChI parts are separated with / as in InChI. Layers that do are separated with <>. Multiple InChI parts are separated with !.[19]
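The separators make the example string straightforward to take apart. The following sketch handles only the five layers shown in the table (no omitted-structure layer); `parse_rinchi` and its dictionary keys are hypothetical names chosen for illustration.

```python
import re

RINCHI_RE = re.compile(
    r"^RInChI=(?P<version>[^/]+)/(?P<body>.*?)(?:/d(?P<direction>[=+-]))?$"
)


def parse_rinchi(rinchi):
    """Split an RInChI into version info, InChI groups and direction."""
    m = RINCHI_RE.match(rinchi)
    if m is None:
        raise ValueError("not an RInChI string")
    rinchi_version, inchi_version = m["version"].rsplit(".", 1)
    # Groups are separated by '<>', InChIs within a group by '!'.
    groups = [
        [f"InChI={inchi_version}/{part}" for part in group.split("!") if part]
        for group in m["body"].split("<>")
    ]
    left, right, agents = (groups + [[], [], []])[:3]
    return {
        "rinchi_version": rinchi_version,
        "left": left,
        "right": right,
        "agents": agents,
        "direction": m["direction"],
    }


example = (
    "RInChI=1.00.1S/C2H4O2/c1-2(3)4/h1H3,(H,3,4)!C2H6O/c1-2-3/h3H,2H2,1H3"
    "<>C4H8O2/c1-3-6-4(2)5/h3H2,1-2H3!H2O/h1H2"
    "<>H2O4S/c1-5(2,3)4/h(H2,1,2,3,4)/d="
)
reaction = parse_rinchi(example)
for side in ("left", "right", "agents"):
    print(side, reaction[side])
```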
It is allowed to omit some structures in the RInChI. In this case a sixth layer is used to specify how many structures have been omitted in each of layers 2, 3, and 4.[19]
RInChI has an analogous concept of RInChIKeys. There are three versions of RInChIKey differing in length:[19]
- The Long-RInChIKey consists of a header and a joining of the full InChIKeys of the chemicals mentioned. The above reaction is Long-RInChIKey=SA-EUHFF-QTBSBXVTEAMEQO-UHFFFAOYSA-N-LFQSCWFLJHTTHZ-UHFFFAOYSA-N--XEKOWRVHYACXOJ-UHFFFAOYSA-N-XLYOFNOQVPJJNP-UHFFFAOYSA-N--QAOWNCQODCNURD-UHFFFAOYSA-N. The "SA" refers to "standard, version 1", the "E" refers to direction (equilibrium; it can also be "F" forward, "B" backward, or "U" undefined), and the "UHFF" is unused (it is the hash of an empty input).
- The Short-RInChIKey is a fixed-length (63 characters, 55 without hyphens) string. The above reaction is Short-RInChIKey=SA-EUHFF-JJFIATRHOH-UDXZTNISGZ-QAOWNCQODC-NUHFF-NUHFF-NUHFF-ZZZ.
  - The header is the same as in the Long-RInChIKey.
  - The three 10-letter parts are derived from hashing the "major" InChI layers (atom, connectivity; same definition as the InChIKey 14-letter part) for layers 2, 3, and 4 respectively; an empty layer hashes to UHFFFADPSC.
  - The three 5-letter parts encode the protonation and stereochemistry states for layers 2, 3, and 4. The first letter ("N") encodes the protonation like the final one-letter part of InChIKey. The remaining four letters are a hash of the "minor" parts (stereochemistry, etc.; same definition as the InChIKey 8-letter part); again, "UHFF" is a hash of the empty value.
  - The last three letters encode the number of "no structure" components (layer 6) for layers 2, 3, and 4. "Z" means 0, "A" means 1, etc.
- The Web-RInChIKey is a fixed-length string (47 characters including the Web-RInChIKey= prefix, 1 hyphen). The above reaction has Web-RInChIKey=SMUHAWIQPXIVCEVKG-NUHFFFADPSCTJSA. To generate it, the InChIs from all the layers of the RInChI are first combined and sorted alphabetically.
  - The first part (17 letters) is generated by combining the "major" layers of the sorted InChIs.
  - The second part consists of an indication of total protonation ("N" here, same encoding as InChIKey), a 12-letter hash of the "minor" layers of the sorted InChIs (UHFFFADPSCTJ is a hash of the empty value), and a version indicator ("SA" means standard, version 1).
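The layout of the Long and Short keys can be illustrated from the example reaction alone. The sketch below infers the joining rule from that example (single hyphens within a group, double hyphens between groups) and is not taken from the RInChI source code; both function names are hypothetical. The ethanol key used is LFQSCWFLJHTTHZ-UHFFFAOYSA-N, whose first block is the value given in the base 26 section.

```python
def long_rinchikey(left, right, agents, direction="E", flavor="SA"):
    """Join full InChIKeys the way the Long-RInChIKey example is laid out."""
    groups = ["-".join(keys) for keys in (left, right, agents)]
    return f"Long-RInChIKey={flavor}-{direction}UHFF-" + "--".join(groups)


def parse_short_rinchikey(key):
    """Split a Short-RInChIKey into the parts described in the list above."""
    parts = key.split("=", 1)[1].split("-")
    flavor, direction_block = parts[0], parts[1]
    return {
        "flavor": flavor,
        "direction": direction_block[0],
        "major_hashes": parts[2:5],
        "minor_blocks": parts[5:8],
        # 'Z' means 0, 'A' means 1, 'B' means 2, ...
        "no_structure_counts": [
            0 if c == "Z" else ord(c) - ord("A") + 1 for c in parts[8]
        ],
    }


long_key = long_rinchikey(
    left=["QTBSBXVTEAMEQO-UHFFFAOYSA-N", "LFQSCWFLJHTTHZ-UHFFFAOYSA-N"],
    right=["XEKOWRVHYACXOJ-UHFFFAOYSA-N", "XLYOFNOQVPJJNP-UHFFFAOYSA-N"],
    agents=["QAOWNCQODCNURD-UHFFFAOYSA-N"],
)
short = parse_short_rinchikey(
    "Short-RInChIKey=SA-EUHFF-JJFIATRHOH-UDXZTNISGZ-QAOWNCQODC"
    "-NUHFF-NUHFF-NUHFF-ZZZ"
)
print(long_key)
print(short)
```

Note that the agent block QAOWNCQODC in the Short key is the first ten letters of the sulfuric acid InChIKey, since that group contains a single structure.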
MInChI
MInChI (Mixtures InChI, International chemical identifier for mixtures) is a draft standard for using (partial) InChI to describe a mixture. It actually defines two formats:
- The Mixfile, a JSON-based format for describing mixtures. Chemicals can be identified by name, Molfile, SMILES, InChI, InChIKey, and/or chemical formula.
- The MInChI, a condensed representation of mixtures where chemicals are identified by their InChI.
Both forms allow nesting of mixtures.[20]
An example of a relatively complex (nested) Mixfile is provided below.[21]
{
  "mixfileVersion": 1,
  "name": "37% wt. Formaldehyde in Water with 10-15% Methanol",
  "contents": [
    {
      "contents": [
        {
          "name": "formaldehyde",
          "quantity": 37,
          "units": "w/w%",
          "inchi": "InChI=1S/CH2O/c1-2/h1H2"
        },
        {
          "name": "water",
          "inchi": "InChI=1S/H2O/h1H2"
        }
      ]
    },
    {
      "name": "methanol",
      "quantity": [10, 15],
      "units": "%",
      "inchi": "InChI=1S/CH4O/c1-2/h2H,1H3"
    }
  ]
}
The corresponding MInChI is: MInChI=0.00.1S/CH2O/c1-2/h1H2&CH4O/c1-2/h2H,1H3&H2O/h1H2/n{{1&3}&2}/g{{37wf-2&}&10:15pp0}.[21]
- The first part, MInChI=0.00.1S, is the version.
- The second part, /CH2O/c1-2/h1H2&CH4O/c1-2/h2H,1H3&H2O/h1H2, encodes the list of molecules.
- The third part, /n{{1&3}&2}, encodes the order and nesting relation.
- The final part, /g{{37wf-2&}&10:15pp0}, encodes the proportions.
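The four parts can be separated with a regular expression, and the indices in the /n part resolved against the molecule list. This is a sketch based only on the example above, with hypothetical names; it does not attempt to interpret the concentration syntax of the /g part.

```python
import re

MINCHI_RE = re.compile(
    r"^MInChI=(?P<version>[^/]+)/(?P<components>.*?)"
    r"/n(?P<indexing>\{.*\})/g(?P<concentration>\{.*\})$"
)


def parse_minchi(minchi):
    """Split a MInChI into version, component list, nesting and proportions."""
    m = MINCHI_RE.match(minchi)
    if m is None:
        raise ValueError("not a MInChI string of the expected shape")
    parts = m.groupdict()
    parts["components"] = parts["components"].split("&")
    return parts


mixture = parse_minchi(
    "MInChI=0.00.1S/CH2O/c1-2/h1H2&CH4O/c1-2/h2H,1H3&H2O/h1H2"
    "/n{{1&3}&2}/g{{37wf-2&}&10:15pp0}"
)

# Replace each 1-based index in the nesting part by the component's formula.
formulas = [component.split("/")[0] for component in mixture["components"]]
nesting = re.sub(r"\d+", lambda m: formulas[int(m.group()) - 1],
                 mixture["indexing"])
print(mixture["components"])
print(nesting)
```

For the example this yields the nesting {{CH2O&H2O}&CH4O}, which mirrors the Mixfile: formaldehyde and water form the inner mixture, to which methanol is added.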
It is also possible to create mixfiles with missing chemical formulae and generate MInChI from them; the "third part" of MInChI is intended to adapt to such situations. For more examples, readers can visit the MInChI Demo page. The "Create MInChI" button generates MInChI. Right-clicking on a node and choosing "copy branch" produces its Mixfile representation in the clipboard.[21]
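Because a Mixfile is plain JSON, its nested structure can be walked with the standard library. The sketch below uses the example Mixfile from this section (written as strict JSON) and a hypothetical generator, `leaf_components`, that reports each named chemical together with its nesting depth.

```python
import json

MIXFILE = """
{
  "mixfileVersion": 1,
  "name": "37% wt. Formaldehyde in Water with 10-15% Methanol",
  "contents": [
    {"contents": [
      {"name": "formaldehyde", "quantity": 37, "units": "w/w%",
       "inchi": "InChI=1S/CH2O/c1-2/h1H2"},
      {"name": "water", "inchi": "InChI=1S/H2O/h1H2"}
    ]},
    {"name": "methanol", "quantity": [10, 15], "units": "%",
     "inchi": "InChI=1S/CH4O/c1-2/h2H,1H3"}
  ]
}
"""


def leaf_components(mixture, depth=0):
    """Yield (depth, name, quantity, units) for every chemical in the mixture."""
    for child in mixture.get("contents", []):
        if "contents" in child:
            # A nested mixture: descend one level.
            yield from leaf_components(child, depth + 1)
        else:
            yield depth, child["name"], child.get("quantity"), child.get("units")


leaves = list(leaf_components(json.loads(MIXFILE)))
for depth, name, quantity, units in leaves:
    print("  " * depth + f"{name}: {quantity} {units}")
```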
History
Name
The format was originally called IChI (IUPAC Chemical Identifier), then renamed in July 2004 to INChI (IUPAC-NIST Chemical Identifier), and renamed again in November 2004 to InChI (IUPAC International Chemical Identifier), a trademark of IUPAC.
Continuing development
Scientific direction of the InChI standard is carried out by the IUPAC Division VIII Subcommittee, and funding of subgroups investigating and defining the expansion of the standard is carried out by both IUPAC and the InChI Trust. The InChI Trust funds the development, testing and documentation of the InChI. Current extensions are being defined to handle polymers and mixtures, Markush structures, isotopologues and isotopomers,[22] reactions,[23] organometallics, and nanomaterials,[24] and once accepted by the Division VIII Subcommittee will be added to the algorithm.
The continuing development of the standard has been supported since 2010 by the not-for-profit InChI Trust, of which IUPAC is a member. Version 1.06 was released in December 2020.[25]
Version history
The InChI Trust has developed software to generate the InChI, InChIKey and other identifiers. The release history of this software follows.[26]
| Software and version | Date | License | Comments |
|---|---|---|---|
| InChI v. 1 | April 2005 | | |
| InChI v. 1.01 | August 2006 | | |
| InChI v. 1.02beta | Sep. 2007 | LGPL 2.1 | Adds InChIKey functionality. |
| InChI v. 1.02 | Jan. 2009 | LGPL 2.1 | Changed format for InChIKey. Introduces standard InChI. |
| InChI v. 1.03 | June 2010 | LGPL 2.1 | |
| InChI v. 1.03 source code docs | March 2011 | LGPL | |
| InChI v. 1.04 | Sep. 2011 | IUPAC/InChI Trust InChI Licence 1.0 | New license. Support for elements 105-112 added. CML support removed. |
| InChI v. 1.05 | Jan. 2017 | IUPAC/InChI Trust InChI Licence 1.0 | Support for elements 113-118 added. Experimental polymer support. Experimental large molecule support. |
| RInChI v. 1.00 | March 2017 | IUPAC/InChI Trust InChI Licence 1.0, and BSD-style | Computes reaction InChIs.[23] |
| InChI v. 1.06 | Dec. 2020 | IUPAC/InChI Trust InChI Licence 1.0[27] | Revised polymer support. |
| InChI v. 1.07.1 | Aug. 2024 | MIT License | Code moved to GitHub |
Adoption
The InChI has been adopted by many larger and smaller databases, including ChemSpider, ChEMBL, Golm Metabolome Database, and PubChem.[28] However, the adoption is not straightforward, and many databases show a discrepancy between the chemical structures and the InChI they contain, which is a problem for linking databases.[29]
See also
- Molecular Query Language
- Simplified molecular-input line-entry system (SMILES)
- Molecule editor
- SYBYL Line Notation
- Bioclipse generates InChI and InChIKeys for drawn structures or opened files
- the Chemistry Development Kit uses JNI-InChI to generate InChIs, can convert InChIs into structures, and generate tautomers based on the InChI algorithms
Notes and references
- ^ "IUPAC International Chemical Identifier Project Page". IUPAC. Archived from the original on 27 May 2012. Retrieved 2012-12-05.
- ^ Heller, S.; McNaught, A.; Stein, S.; Tchekhovskoi, D.; Pletnev, I. (2013). "InChI - the worldwide chemical structure identifier standard". Journal of Cheminformatics. 5 (1): 7. doi:10.1186/1758-2946-5-7. PMC 3599061. PMID 23343401.
- ^ "What on Earth is InChI?". IUPAC 100. Retrieved 10 May 2024.
- ^ "The InChI Trust and IUPAC". InChI Trust. Retrieved August 22, 2022.
- ^ a b c d e Heller, S.R.; McNaught, A.; Pletnev, I.; Stein, S.; Tchekhovskoi, D. (2015). "InChI, the IUPAC International Chemical Identifier". Journal of Cheminformatics. 7: 23. doi:10.1186/s13321-015-0068-4. PMC 4486400. PMID 26136848.
- ^ a b "The IUPAC International Chemical Identifier (InChI)". IUPAC. 5 September 2007. Archived from the original on October 30, 2007. Retrieved 2007-09-18.
- ^ E.L. Willighagen (17 September 2011). "InChIKey collision: the DIY copy/pastables". Retrieved 2012-11-06.
- ^ "Inchi packages - Repology".
- ^ "Inchi - Debian Package Tracker".
- ^ a b c Heller, Stephen R.; McNaught, Alan; Pletnev, Igor; Stein, Stephen; Tchekhovskoi, Dmitrii (2015). "InChI, the IUPAC International Chemical Identifier". Journal of Cheminformatics. 7: 23. doi:10.1186/s13321-015-0068-4. PMC 4486400. PMID 26136848.
- ^ Pletnev, I.; Erin, A.; McNaught, A.; Blinov, K.; Tchekhovskoi, D.; Heller, S. (2012). "InChIKey collision resistance: An experimental testing". Journal of Cheminformatics. 4 (1): 39. doi:10.1186/1758-2946-4-39. PMC 3558395. PMID 23256896.
- ^ "Technical FAQ - InChI Trust". inchi-trust.org. Retrieved 2021-01-08.
- ^ "InChI=1/C17H19NO3/c1-18..." Chemspider. Retrieved 2007-09-18.
- ^ "InChI/INCHI-1-SRC/INCHI_BASE/src/ikey_base26.h".
- ^ a b "RInChI/src/lib/rinchi_hashing.cpp".
- ^ InChI Resolver, 27 July 2015
- ^ "InChI Technical FAQ - InChI Trust". www.inchi-trust.org.
- ^ "IUPAC International Chemical Identifier (InChI) Programs InChI version 1, software version 1.04 User's Guide" (PDF). September 2011.
- ^ a b c d PMC 5940998. PMID 29740723. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5940998.
- ^ Clark, Alex M.; McEwen, Leah R.; Gedeck, Peter; Bunin, Barry A. (December 2019). "Capturing mixture composition: an open machine-readable format for representing mixed substances". Journal of Cheminformatics. 11 (1). doi:10.1186/s13321-019-0357-4. PMC 6533230.
- ^ a b c "MInChI Demo". molmatinf.com. (Example #3 taken for illustration. Use "copy branch" to copy as Mixfile JSON.)
- ^ Hunter N. B. Moseley; Philippe Rocca-Serra; Reza M. Salek; Masanori Arita; Emma L. Schymanski (14 May 2024). "InChI isotopologue and isotopomer specifications". Journal of Cheminformatics. 16 (1). doi:10.1186/S13321-024-00847-8. ISSN 1758-2946. PMID 38741211. Wikidata Q125934731.
- ^ a b Grethe, Guenter; Blanke, Gerd; Kraut, Hans; Goodman, Jonathan M. (9 May 2018). "International chemical identifier for reactions (RInChI)". Journal of Cheminformatics. 10 (1): 45. doi:10.1186/s13321-018-0277-8. PMC 4015173. PMID 24152584.
- ^ Iseult Lynch; Antreas Afantitis; Thomas E Exner; et al. (11 December 2020). "Can an InChI for Nano Address the Need for a Simplified Representation of Complex Nanomaterials across Experimental and Nanoinformatics Studies?". Nanomaterials. 10 (12). doi:10.3390/NANO10122493. ISSN 2079-4991. PMC 7764592. PMID 33322568. Wikidata Q104477914.
- ^ Goodman, Jonathan M.; Pletnev, Igor; Thiessen, Paul; Bolton, Evan; Heller, Stephen R. (December 2021). "InChI version 1.06: now more than 99.99% reliable". Journal of Cheminformatics. 13 (1): 40. doi:10.1186/s13321-021-00517-z. PMC 8147039. PMID 34030732.
- ^ Downloads of InChI Software, accessed Jan. 8, 2021.
- ^ "IUPAC/InChI-Trust Licence for the International Chemical Identifier (InChI) Software" (PDF). IUPAC/InChI-Trust. 2020. Retrieved 2022-08-09.
- ^ Warr, W.A. (2015). "Many InChIs and quite some feat". Journal of Computer-Aided Molecular Design. 29 (8): 681–694. Bibcode:2015JCAMD..29..681W. doi:10.1007/s10822-015-9854-3. PMID 26081259. S2CID 31786997.
- ^ Akhondi, S. A.; Kors, J. A.; Muresan, S. (2012). "Consistency of systematic chemical identifiers within and between small-molecule databases". Journal of Cheminformatics. 4 (1): 35. doi:10.1186/1758-2946-4-35. PMC 3539895. PMID 23237381.
External links
Official resources
- InChI Web Demo, IUPAC's official demonstration combining a molecule drawer and an InChI+InChIKey+AuxInfo generator (a WebAssembly translation of the official InChI code for running in the browser)
- IUPAC InChI site
- Description of the canonicalization algorithm
- Googling for InChIs a presentation to the W3C, October 2004
- InChI Release 1.02 InChI version 1.02 and explanation of Standard InChI, January 2009
Third-party tools
- NCI/CADD Chemical Identifier Resolver Generates and resolves InChI/InChIKeys and many other chemical identifiers
- PubChem online molecule editor that supports SMILES/SMARTS and InChI
- ChemSpider Compound APIs ChemSpider REST API that allows generation of InChI and conversion of InChI to structure (also SMILES and generation of other properties)
- MarvinSketch from ChemAxon, implementation to draw structures (or open other file formats) and output to InChI file format
- BKchem implements its own InChI parser and uses the IUPAC implementation to generate InChI strings
- CompoundSearch implements an InChI and InChI Key search of spectral libraries
- SpectraBase implements an InChI and InChI Key search of spectral libraries
- JSME Archived 2015-01-06 at the Wayback Machine is a free JavaScript based molecular editor that generates InChI and InChI Key in a web browser, which allows for easy web searches of chemical compounds


