Chemical Compound Parser

C/C++

What is it?

A simple library in C for a chemical formula parser that converts strings like C2H5OH into structured element counts, including total element types.

Features

  • Parses chemical formulas (e.g., CoCl3 + 6(NH3)) into element counts
  • Handles parentheses for grouped compounds (e.g., 6(CO2) → C:3, O:6)
  • Returns a cElement struct with total element count "TTT" and individual element quantities

Installation

Clone the repository and compile the code:

git clone https://github.com/yoonzh/chemcount

Usage

Import the chemcount library in your code:

#include "chemcount.h"

Then build into object file and link into your function.

cd chemcount  
gcc -c chemcount.c
gcc yourprogram.c chemcount.o -o yourprogram

Output Format

The cElement struct is an array of arrays sorted by order in the input string.

cElement[0].name = "TTT";
cElement[0].amount = N;

The first element in the struct has the title "TTT" and N is the total unique elements in the formula.

for (int i = 0; i < cElement[0].amount+1; i++) {
    printf("%s\t%f\n", cElement[i].name, cElement[i].amount);
}

The other objects in the cElement struct contain the string of the element and its amount in the compound.

chemcount has a built-in function printElements(cElement) to easily visualize the total elements in the cElement struct.

Example

chemcount("C2H5OH");
/* Returns:
 * ["TTT", 3]
 * ["C", 2]
 * ["H", 6]
 * ["O", 1]
 */

chemcount("CoCl3 + 6(NH3)");
/* Returns:
 * ["TTT", 4]
 * ["Co", 1]
 * ["Cl", 3]
 * ["N", 6]
 * ["H", 18]
 */

main.c example

In the repository you may find a main function containing an example of the usage. To use this example, run the following commands in your terminal:

git clone https://github.com/yoonzh/chemcount
cd chemcount
gcc -c chemcount.c
gcc main.c chemcount.o -o main
./main

Assumptions

  • Case-sensitive: Co (cobalt) ≠ CO (carbon monoxide).
  • Parentheses: Coefficients outside parentheses apply to all nested elements (e.g., (H2O)2 → H:4, O:2, 2(O2 + N2) → O:4, N:4).
  • Implicit “1”: Missing counts are treated as 1 (e.g., OH → O:1, H:1).