Short description:
Suppose that we want to analyse a copolymer consisting of several units of monomer A and several units of monomer B. The copolymer (AmBn) is polydisperse in both monomers. Suppose also that we want the “complete picture”, i.e., the relative amount of copolymer for each combination of monomeric units. In other words, our goal is to obtain something like the following picture (in which A is polyisoprene and B is polystyrene):
The task gets complicated if we want to obtain all this information from a MALDI-MS experiment. In MALDI-MS, a single spectrum contains all combinations of monomers, each with its own isotopic pattern (i.e., a tremendous mess of peaks). Solving the problem via simple regression is not feasible either, since the matrix to invert is so huge that the computation becomes intractable. We solved the problem with a little trick that we called “strip-based regression”. The idea is motivated by the fact that the design matrix of the regression is very sparse: for a given region of the spectrum, only a limited number of copolymer combinations need to be taken into account to explain the observed data. In principle, we could therefore solve the regression problem at a local level (which is computationally much easier).
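To make the sparsity argument concrete, here is a minimal sketch in Python (NumPy only). It is not the published implementation. It assumes a toy model in which each (m, n) combination contributes a single Gaussian peak at m times the isoprene repeat-unit mass plus n times the styrene repeat-unit mass, ignoring end groups, the cationising ion and the isotopic pattern. The function name `toy_design_matrix`, the peak width and the m/z grid are hypothetical choices made for illustration.

```python
import numpy as np

MASS_A = 68.12   # approximate isoprene repeat-unit mass (Da)
MASS_B = 104.15  # approximate styrene repeat-unit mass (Da)


def toy_design_matrix(mz, max_m, max_n, width=0.5):
    """Build a toy design matrix with one column per (m, n) combination.

    Assumption: each species is a single Gaussian peak; real spectra
    would need the full isotopic pattern, end groups and the cation.
    """
    species = [(m, n) for m in range(max_m + 1) for n in range(max_n + 1)]
    centers = np.array([m * MASS_A + n * MASS_B for m, n in species])
    X = np.exp(-0.5 * ((mz[:, None] - centers[None, :]) / width) ** 2)
    X[X < 1e-6] = 0.0  # treat negligible tails as exact zeros
    return X, species


mz = np.arange(0.0, 3000.0, 0.25)
X, species = toy_design_matrix(mz, max_m=20, max_n=15)

# Fraction of non-zero entries in the whole design matrix.
fill = np.count_nonzero(X) / X.size

# Species that contribute anything inside a 50 Da region of the spectrum.
window = (mz >= 1000.0) & (mz < 1050.0)
active = np.flatnonzero(X[window].any(axis=0))

print(f"design matrix shape: {X.shape}, non-zero fraction: {fill:.4f}")
print(f"species active in the 1000-1050 Da region: {active.size} of {len(species)}")
```

Only the columns listed in `active` are needed to explain the data in that region, which is what makes a local regression attractive.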
However, things get complicated when we try to define the regions for this local regression. There are always neighbouring regions that influence the region of interest, so we have to include these surrounding areas in the regression to avoid bias. But to explain the surrounding areas correctly, we have to include the areas neighbouring them in turn. The problem goes on and on, and in the end we would have to take the whole spectrum even if we are only interested in regressing a small part of the data.
It is difficult to explain in a few lines how we solved the problem (but I will try). We considered an extended region of the data for the local regression and divided it into two parts. The inner part (i.e., the inner window) is the part that we actually want to regress. The outer part (i.e., the outer window) is the part that we need to include in order to regress the inner part correctly. However, we simply discard the regression results obtained for the outer part, as they are biased (since we did not consider the areas neighbouring this outer window). For full details about how this works, see [28a].
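The inner/outer window idea can be sketched as follows. This is a toy illustration under stated assumptions, not the method as published in [28a]: species are evenly spaced Gaussian peaks, the data are noise-free, and ordinary least squares (`np.linalg.lstsq`) stands in for whatever solver the real method uses. The names `peak_matrix`, `fit_strip` and `strip_regression`, and the window sizes, are hypothetical.

```python
import numpy as np


def peak_matrix(x, centers, sigma=0.5):
    """Toy design matrix: one Gaussian peak per species (assumption)."""
    return np.exp(-0.5 * ((x[:, None] - centers[None, :]) / sigma) ** 2)


def fit_strip(x, y, centers, start, stop, margin=10.0, sigma=0.5):
    """Regress one strip: fit on the extended window [start - margin, stop + margin).

    Returns the indices of the species used, their local estimates, and a
    mask marking which of them lie in the inner window [start, stop).
    """
    lo, hi = start - margin, stop + margin
    rows = (x >= lo) & (x < hi)
    cols = np.flatnonzero((centers >= lo) & (centers < hi))
    X_local = peak_matrix(x[rows], centers[cols], sigma)
    c_local, *_ = np.linalg.lstsq(X_local, y[rows], rcond=None)
    inner = (centers[cols] >= start) & (centers[cols] < stop)
    return cols, c_local, inner


def strip_regression(x, y, centers, inner_width=10.0, margin=10.0, sigma=0.5):
    """Slide the inner window over the axis and keep only inner estimates."""
    coef = np.full(centers.size, np.nan)
    start = centers.min()
    while start <= centers.max():
        stop = start + inner_width
        cols, c_local, inner = fit_strip(x, y, centers, start, stop, margin, sigma)
        coef[cols[inner]] = c_local[inner]  # outer-window estimates are discarded
        start = stop
    return coef


# Noise-free toy spectrum with 200 overlapping species.
rng = np.random.default_rng(0)
centers = np.arange(0.0, 200.0, 1.0)
true_coef = rng.uniform(0.5, 1.5, size=centers.size)
x = np.arange(-5.0, 205.0, 0.1)
y = peak_matrix(x, centers) @ true_coef

est = strip_regression(x, y, centers)
inner_err = np.max(np.abs(est - true_coef))

# Look at a single strip to compare inner and outer estimates.
cols, c_local, inner = fit_strip(x, y, centers, 90.0, 100.0)
outer_err = np.max(np.abs(c_local[~inner] - true_coef[cols[~inner]]))

print(f"max error of kept (inner) estimates:  {inner_err:.2e}")
print(f"max error of discarded (outer) ones: {outer_err:.2e}")
```

In this sketch each local fit involves only a few dozen species instead of all 200, and the estimates near the edge of the outer window absorb the signal of the species that were left out, which is why only the inner-window estimates are kept. For real intensities a non-negative solver would probably be a more natural choice than plain least squares.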
Credits:
This project was developed at different institutions. Several people were involved. See authors of the publications for more details about authorship.
Sponsors:
BASF, University of Amsterdam.
Presentations:
See my presentation at Baltimore (IFPAC-2013).
Software:
The software for this methodology is still under construction. We hope to offer software for this soon.
Tags:
- Application domain: Chemicals
- Instrument domain: MS
- Statistics domain: Deconvolution, modelling & regression