For almost all types of aberrations, a few features are common:
After each matching step, the matched part of the aberration description is removed. If the matches do not consume the entire original description, the aberration is assumed to contain an error.
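The rule of removing each matched part and checking for leftovers can be sketched in Python as follows. This is only an illustration under assumptions: the helper names `consume` and `fully_consumed` are hypothetical and do not come from the program described here, and the two patterns in `STEPS` are simplified stand-ins for the real ones.

```python
import re

def consume(pattern, text):
    """Match pattern at the start of text; return (match, remainder).

    On failure the match is None and the text is returned unchanged.
    """
    m = re.match(pattern, text)
    if m is None:
        return None, text
    return m, text[m.end():]

def fully_consumed(text, patterns):
    """Apply the patterns in order, removing each matched part.

    Returns True only if every step matched and nothing is left over.
    """
    rest = text
    for pattern in patterns:
        m, rest = consume(pattern, rest)
        if m is None:
            return False
    return rest == ""

# Simplified stand-in patterns (assumption): aberration core, then bands.
STEPS = [r"inv\([?\dXY]+\)", r"\([pq?\d~-]+\)"]
print(fully_consumed("inv(16)(p13q22)", STEPS))   # True
print(fully_consumed("inv(16)(p13q22)x", STEPS))  # False: "x" is left over
```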
If the aberration describes a derivative chromosome, the analysis is more complicated. After the original match of the der() part, the aberrations that follow are collected and analysed as described above. Then the aberrations are introduced into the derivative chromosome.
A detailed description of the steps follows:
Not every aberration can carry a gain or loss sign (e.g. +t(9;22)(q34;q11) is not acceptable because the aberration gives rise to two aberrant chromosomes). Bogus indications must therefore be detected once the aberration core has been identified.
The SCCN quantitative aberration depends on whether the aberrant chromosome replaces a normal chromosome or is additionally present.
Some of the patterns used for identifying special parts of the aberrations are as follows.
The chromosome number consists of either digits or one of X and Y; a leading question mark indicates that the chromosome could not be identified reliably during karyotype analysis. Hence the elementary pattern is "(?<numN>\??[\dXY]*)". It also accepts arbitrary combinations of X, Y and digits (e.g. "?Y2X"); such bogus numbers must be detected later, when the break points are checked for existence.
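How permissive the elementary pattern is can be tried out with a small Python sketch. Note the assumptions: Python's `re` module spells named groups `(?P<name>...)` instead of `(?<name>...)`, and the function name `chromosome_token` is made up for this illustration.

```python
import re

# Same pattern as in the text, with Python's named-group spelling.
CHROMOSOME = re.compile(r"(?P<num>\??[\dXY]*)")

def chromosome_token(text):
    """Return the chromosome token if the whole text fits the pattern, else None."""
    m = CHROMOSOME.fullmatch(text)
    return m.group("num") if m else None

print(chromosome_token("9"))     # 9
print(chromosome_token("?X"))    # ?X
print(chromosome_token("?Y2X"))  # ?Y2X  (bogus, but accepted at this stage)
print(chromosome_token("q34"))   # None
```

Because the pattern uses "*", an empty string is accepted as well, which is one more reason the later existence check is needed.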
Often bands are delimited by semicolons. Here, a pattern catching everything up to the next semicolon is useful: "(?<bandN>[^\;]+)". Accordingly, the pattern for the last band before the closing bracket is "(?<bandN>[^\)]+)". These patterns accept almost any bogus description, which will then cause an error when a Band object is instantiated with it.
A much more complicated pattern uses the structure of a band description. It starts with the arm (p or q), followed by a sequence of digits. If a range is given, an approximation sign ("~" or "-") and another sequence of digits follow. Question marks may appear almost anywhere. This pattern is "(?<bandN>\??[pq\?][\?\d]*([\-\~][\?\d]+)?)" and is useful when bands are not separated by semicolons (e.g. in intrachromosomal insertions).
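The structured band pattern can be used to split a run of bands that has no separators. The following Python sketch assumes Python's `(?P<name>...)` group syntax and drops escapes that are unnecessary inside character classes; the function name `split_bands` is hypothetical.

```python
import re

# Structured band pattern from the text: optional "?", arm (or "?"),
# digits/question marks, optional range introduced by "-" or "~".
BAND = re.compile(r"(?P<band>\??[pq?][?\d]*(?:[-~][?\d]+)?)")

def split_bands(text):
    """Return all consecutive band descriptions found in text."""
    return [m.group("band") for m in BAND.finditer(text)]

print(split_bands("q22q32"))     # ['q22', 'q32']
print(split_bands("p13~14q21"))  # ['p13~14', 'q21']
print(split_bands("?q21"))       # ['?q21']
```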
The pattern used is "^(?<num>[\dXY]+)(?<q>\??)$".
In contrast to the ISCN manual, multiplicators are also accepted for this type of aberration (multiplicators were analysed above).
According to the ISCN manual, interstitial fragments may also be exchanged. When several thousand karyotypes were scanned for such a translocation, only one was found: t(2;4)(p15q23;p14q24), stemming from a publication from 1993, i.e. before ISCN 1995; and here, a centromere-containing fragment would have to be exchanged. No karyotype showing a translocation of interstitial fragments between three chromosomes was discovered. The example given in the ISCN manual ("t(5;6)(q13q23;q15q23)") could not be found in the Mitelman database. Hence, it is doubtful whether such translocations really exist. Nonetheless, the considerable effort of handling exchanges of both terminal and interstitial fragments has been made (see also "Discussion of ISCN").
Before ISCN 1995, the symbol "t" was also used for insertions. E.g. "t(11;14)(q13;q22q24)" would now be written "ins(11;14)(q13;q22q24)". This outdated style is not accepted by the analyse function.
A translocation cannot be gained ("+t..." is not acceptable). Translocations are balanced and thus do not yield gain or loss of chromosomal material.
The order of chromosomes in a translocation is not checked, e.g. "t(21;X)" instead of "t(X;21)" is also accepted.
Bands are then caught with the pattern "^\((?<band1>[^\;]+)\;(?<band2>[^\)]+)\)$", which puts everything between the opening bracket and the semicolon into group "band1" and everything between the semicolon and the closing bracket into group "band2". If one of the groups contains more than one of "p" or "q", a translocation of an interstitial fragment is assumed. In that case, each of the groups is matched against the pattern "^(?<band1>\??[pq\?][\?\d]*([\-\~][\?\d]+)?)(?<band2>\??[pq\?][\?\d]*([\-\~][\?\d]+)?)?$" to obtain the bands involved. Otherwise, the groups already correspond to the bands.
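The two-stage band analysis for a translocation of two chromosomes might look like this in Python. It is a sketch under assumptions: group syntax is adapted to Python's `(?P<name>...)`, the function name `translocation_bands` and its return shape are invented for illustration, and no Band objects are created.

```python
import re

BAND = r"\??[pq?][?\d]*(?:[-~][?\d]+)?"
# Stage 1: split the bracket content at the semicolon.
GROUPS = re.compile(r"^\((?P<band1>[^;]+);(?P<band2>[^)]+)\)$")
# Stage 2: split one group into one or two structured bands.
PAIR = re.compile(rf"^(?P<band1>{BAND})(?P<band2>{BAND})?$")

def translocation_bands(text):
    """Return (bands, interstitial) for the band part of a t(a;b) term."""
    m = GROUPS.match(text)
    if m is None:
        raise ValueError(f"no band groups found in {text!r}")
    groups = [m.group("band1"), m.group("band2")]
    # More than one arm letter in a group signals an interstitial fragment.
    interstitial = any(len(re.findall(r"[pq]", g)) > 1 for g in groups)
    if not interstitial:
        return groups, False
    pairs = []
    for g in groups:
        pm = PAIR.match(g)
        if pm is None:
            raise ValueError(f"cannot split band group {g!r}")
        pairs.append((pm.group("band1"), pm.group("band2")))
    return pairs, True

print(translocation_bands("(q34;q11)"))
# (['q34', 'q11'], False)
print(translocation_bands("(p15q23;p14q24)"))
# ([('p15', 'q23'), ('p14', 'q24')], True)
```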
Special care is taken during later analysis to ensure that the number of bands found is adequate and that they really describe bands. Furthermore, in a translocation of terminal fragments, either both break points must be centromeres or neither of them.
The basic pattern for identifying such aberrations is "^t\((?<num1>\??[\dXY]*)\;(?<num2>\??[\dXY]*)\;(?<num3>\??[\dXY]*)\)", where "num1" captures the first chromosome, "num2" the second chromosome, and "num3" the third chromosome.
Bands are caught with the pattern "^\((?<band1>[^\;]+)\;(?<band2>[^\)]+)\;(?<band3>[^\)]+)\)$" and further analysed as described above.
The basic pattern for such translocations is "^t\((?<num1>\??[\dXY]*)\;(?<num2>\??[\dXY]*)\;(?<num3>\??[\dXY]*)\;(?<num4>\??[\dXY]*)\)" for a translocation involving four chromosomes. The block "(?<numN>\??[\dXY]*)\;" is repeated accordingly for higher numbers, with the group numbers adjusted. The pattern for the bands is adjusted accordingly, but no distinction is made between terminal and interstitial fragments (in the latter case, an error would be raised).
This pattern is repeated for translocations of up to ten chromosomes.
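Rather than writing out each pattern by hand, the repetition of the chromosome block can be generated. The Python sketch below is one possible way to do so and is not taken from the program itself; the function name `translocation_pattern` is hypothetical, and group syntax follows Python's `(?P<name>...)`.

```python
import re

def translocation_pattern(n):
    """Build the chromosome pattern for a translocation of n chromosomes."""
    if not 2 <= n <= 10:
        raise ValueError("translocations of 2 to 10 chromosomes are supported")
    # One "(?P<numN>...)" block per chromosome, joined by semicolons.
    blocks = ";".join(rf"(?P<num{i}>\??[\dXY]*)" for i in range(1, n + 1))
    return re.compile(rf"^t\({blocks}\)")

m = translocation_pattern(4).match("t(1;5;9;22)(p13;q31;q34;q11)")
print(m.group("num1"), m.group("num4"))  # 1 22
```

A pattern built for three chromosomes does not match "t(9;22)", so the number of chromosomes has to be known, or the patterns have to be tried in turn.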
The pattern for identifying the aberration is "^del\((?<num>[\?\dXY]+)\)" with the "num" group catching a chromosome number consisting of digits, XY, or question marks.
Afterwards, bands are searched for with the pattern "^\((?<band1>\??[pq\?][\?\d]*([\-\~][\?\d]+)?)(?<band2>\??[pq\?][\?\d]*([\-\~][\?\d]+)?)?\)$". If the group "band2" stays empty, the deletion is assumed to be terminal, otherwise interstitial. Further calculation (SCCN etc.) must differentiate between these situations.
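The two steps for deletions, and the terminal versus interstitial decision, can be sketched in Python as follows. The function name `classify_deletion` and the returned tuple are assumptions made for this illustration; group syntax is adapted to Python's `(?P<name>...)`.

```python
import re

BAND = r"\??[pq?][?\d]*(?:[-~][?\d]+)?"
DEL = re.compile(r"^del\((?P<num>[?\dXY]+)\)")
DEL_BANDS = re.compile(rf"^\((?P<band1>{BAND})(?P<band2>{BAND})?\)$")

def classify_deletion(text):
    """Return (chromosome, band1, band2, kind) for a deletion description."""
    head = DEL.match(text)
    if head is None:
        raise ValueError(f"not a deletion: {text!r}")
    # Remove the matched core, then analyse what is left.
    bands = DEL_BANDS.match(text[head.end():])
    if bands is None:
        raise ValueError(f"invalid bands in {text!r}")
    kind = "interstitial" if bands.group("band2") else "terminal"
    return head.group("num"), bands.group("band1"), bands.group("band2"), kind

print(classify_deletion("del(5)(q13q33)"))  # ('5', 'q13', 'q33', 'interstitial')
print(classify_deletion("del(5)(q13)"))     # ('5', 'q13', None, 'terminal')
```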
Furthermore, deletions of centromere-spanning regions would give rise to acentric fragments, which would be lost during cell divisions. Hence, in such cases the aberration must be marked invalid.
The pattern for identifying the aberration proper is "^add\((?<num>[\?\dXY]+)\)". The afflicted band is searched for with the pattern "\((?<band1>\??[pq\?][\?\d]*([\-\~][\?\d]+)?)\)$".
The pattern for identifying the aberration proper is "^dup\((?<num>[\?\dXY]+)\)". The afflicted bands are searched for with the pattern "^\((?<band1>\??[pq\?][\?\d]*([\-\~][\?\d]+)?)(?<band2>\??[pq\?][\?\d]*([\-\~][\?\d]+)?)?\)$".
For the calculation of fusions, the order of the bands relative to the centromere is used to find out whether the duplication is direct or inverted.
The pattern for identifying the aberration proper is "^dic\((?<num1>[\?\dXY]+)(\;(?<num2>[\?\dXY]+))?\)". The afflicted bands are searched for with the pattern "^\((?<band1>[^\;]+)(\;(?<band2>[^\)]+))?\)$".
When the dicentric chromosome replaces its normal chromosomes, two fragments are lost: those extending from the break points to the terminus of the same chromosomal arms. If the dicentric chromosome has been gained, two fragments have been gained: those extending from the break points via the centromere to the terminus of the opposite arm.
The pattern is "^i\((?<num>[\?\dXY]+)\)\((?<band>\??[pq]10)\)". I.e., isochromosomes are analysed in one step.
When the isochromosome replaces its normal chromosome, one fragment is lost (the other arm), and one fragment is gained (the arm the chromosome is made of). When the isochromosome is additionally present, no fragment is lost, but the arm the chromosome consists of is gained twice.
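The gain and loss rule for isochromosomes is simple enough to write down directly. The Python sketch below is an illustration only: the function name `isochromosome_balance`, the `replaces_normal` flag and the "17q"-style arm labels are assumptions, not the program's data model.

```python
import re

ISO = re.compile(r"^i\((?P<num>[?\dXY]+)\)\((?P<band>\??[pq]10)\)")

def isochromosome_balance(text, replaces_normal):
    """Return gained and lost arms for an isochromosome description."""
    m = ISO.match(text)
    if m is None:
        raise ValueError(f"not an isochromosome: {text!r}")
    num = m.group("num")
    arm = m.group("band").lstrip("?")[0]   # arm the chromosome is made of
    other = "q" if arm == "p" else "p"
    if replaces_normal:
        # One copy of the arm gained, the other arm lost.
        return {"gained": [num + arm], "lost": [num + other]}
    # Additionally present: the arm is gained twice, nothing is lost.
    return {"gained": [num + arm, num + arm], "lost": []}

print(isochromosome_balance("i(17)(q10)", True))
# {'gained': ['17q'], 'lost': ['17p']}
print(isochromosome_balance("i(17)(q10)", False))
# {'gained': ['17q', '17q'], 'lost': []}
```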
The pattern for identifying the aberration proper is "^idic\((?<num>[\?\dXY]+)\)". The afflicted bands are searched for with the pattern "^\((?<band1>[^\)]+)\)$". The centromere is by definition not a valid break point.
The isodicentric chromosome starts at the remaining terminus and extends via the centromere to the band denoted; that part is duplicated and thus gained twice if the isodicentric chromosome is additionally present. The region from the denoted band up to its terminus is lost once when the isodicentric chromosome replaces its normal chromosome.
The pattern for identifying the aberration proper is "^ins\((?<num1>[\?\dXY]+)(\;(?<num2>[\?\dXY]+))?\)". If only group "num1" is filled, the insertion is intrachromosomal; if group "num2" is also filled, the insertion is interchromosomal.
The afflicted bands are searched for with the pattern "^\((?<band1>\??[pq\?][\?\d]*([\-\~][\?\d]+)?);?(?<band2>\??[pq\?][\?\d]*([\-\~][\?\d]+)?)(?<band3>\??[pq\?][\?\d]*([\-\~][\?\d]+)?)?\)$".
Note that this pattern requires only two bands to be present; if missing, the third band is assumed to be unknown and thus "?" (e.g. in "ins(5;?)(q13;?)" the second band is unknown and marked as such, so the third band, being also unknown, need not be denoted). Also, the pattern treats the semicolon between the first and the second band as optional, regardless of the insertion type; i.e. in contrast to the ISCN manual, the description is also accepted when the semicolon is missing in interchromosomal insertions or present in intrachromosomal insertions (e.g. "ins(5;2)(p14q22q32)" or "ins(2)(p13;q21q31)").
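The behaviour of the insertion band pattern, including the optional semicolon and the defaulted third band, can be checked with a short Python sketch. The function name `insertion_bands` is hypothetical and the group syntax is adapted to Python's `(?P<name>...)`.

```python
import re

BAND = r"\??[pq?][?\d]*(?:[-~][?\d]+)?"
INS_BANDS = re.compile(
    rf"^\((?P<band1>{BAND});?(?P<band2>{BAND})(?P<band3>{BAND})?\)$"
)

def insertion_bands(text):
    """Return the three bands of an insertion; a missing third band becomes '?'."""
    m = INS_BANDS.match(text)
    if m is None:
        raise ValueError(f"invalid insertion bands: {text!r}")
    return m.group("band1"), m.group("band2"), m.group("band3") or "?"

print(insertion_bands("(p14q22q32)"))   # ('p14', 'q22', 'q32')
print(insertion_bands("(p13;q21q31)"))  # ('p13', 'q21', 'q31')
print(insertion_bands("(q13;?)"))       # ('q13', '?', '?')
```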
For fusions, the order of the second and third band relative to the centromere must be looked at.
The pattern for identifying the aberration proper is "^inv\((?<num>[\?\dXY]+)\)". The afflicted bands are searched for with the pattern "^\((?<band1>\??[pq\?][\?\d]*([\-\~][\?\d]+)?)(?<band2>\??[pq\?][\?\d]*([\-\~][\?\d]+)?)?\)$".
Inversions are balanced aberrations and thus do not yield gains or losses by themselves. A gain of an inversion corresponds to the gain of the whole chromosome.
But not every chromosome involved needs to confer a centromere to the ring chromosome. If not marked "dic r" or "trc r", the ring chromosome is assumed to be monocentric. However, data from the Mitelman database show that some of them are dicentric without being marked.
Monocentric ring chromosomes containing material of several chromosomes must be described as derivative chromosomes "der(a)r(a,b;c)...", where the centromere-bearing chromosome need not be named first in the "r" clause.
The pattern for identifying the aberration proper is "^(dic)?r\((?<num1>[\?\dXY]+)(\;(?<num2>[\?\dXY]+))?(\;(?<num3>[\?\dXY]+))?\)". It requires at least one chromosome and allows up to three chromosomes. A "trc r" would not be recognized, but since none exists in the Mitelman database, this is irrelevant.
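The ring pattern can be exercised with the following Python sketch. Two things are assumptions here: the "dic" prefix is given a group name so that it can be queried, and the function name `ring_chromosomes` is invented. Note that the pattern as given matches only when no space separates "dic" and "r".

```python
import re

RING = re.compile(
    r"^(?P<dic>dic)?r\((?P<num1>[?\dXY]+)"
    r"(?:;(?P<num2>[?\dXY]+))?(?:;(?P<num3>[?\dXY]+))?\)"
)

def ring_chromosomes(text):
    """Return (is_dicentric, chromosomes) or None if text is not a ring."""
    m = RING.match(text)
    if m is None:
        return None
    nums = [m.group(k) for k in ("num1", "num2", "num3") if m.group(k)]
    return bool(m.group("dic")), nums

print(ring_chromosomes("r(7)(p22q36)"))              # (False, ['7'])
print(ring_chromosomes("dicr(1;3)(p36q42;p26q29)"))  # (True, ['1', '3'])
print(ring_chromosomes("trcr(1;3;5)"))               # None
```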
Catching the band groups depends on the number of chromosomes found.
If there are three chromosomes, the primary pattern is "^\((?<band1>[^\;]+)\;(?<band2>[^\)]+)\;(?<band3>[^\)]+)\)$". In a further step, each group is matched against "^(?<band1>\??[pq\?][\?\d]*([\-\~][\?\d]+)?)(?<band2>\??[pq\?][\?\d]*([\-\~][\?\d]+)?)?$". If a group contains only a question mark, both bands for that chromosome are considered unknown. Otherwise, if only one band for a chromosome was found, an error is raised. Since there are no tricentric ring chromosomes, such rings must be described as derivative chromosomes; SCCN data etc. must then be requested from that derivative chromosome.
With two chromosomes, the pattern is "^\((?<band1>[^\;]+)\;(?<band2>[^\)]+)\)$", which is further analysed as for rings of three chromosomes. Non-dicentric rings of this type must be described as derivative chromosomes, and SCCN data etc. must be requested from that derivative chromosome. Dicentric ring chromosomes require the calculation of break points, fusions, and SCCN qualitative and quantitative aberrations. If a dicentric ring replaces the normal chromosomes, all fragments not denoted in the description are lost; if the ring is additionally present, both fragments denoted are gained.
If there is only one chromosome the pattern is "^\((?<band1>\??[pq\?][\?\d]*([\-\~][\?\d]+)?)(?<band2>\??[pq\?][\?\d]*([\-\~][\?\d]+)?)?\)$". Break points, fusions, SCCN qualitative and quantitative are calculated here.
The pattern for identifying the aberration proper is "^trp\((?<num>[\?\dXY]+)\)". The bands are then caught with the pattern "^\((?<band1>\??[pq\?][\?\d]*([\-\~][\?\d]+)?)(?<band2>\??[pq\?][\?\d]*([\-\~][\?\d]+)?)?\)$".
For gains and losses, the triplicated fragment has been gained twice; if the triplication is additionally present, the whole (original) chromosome has additionally been gained once. As for fusions, the orientation of the triplicated fragment cannot be determined, hence both possible interpretations are taken, but marked questionable.
The pattern for identifying the aberration proper is "^tas\((?<num1>\??[\dXY]*)\;(?<num2>\??[\dXY]*)(\;(?<num3>\??[\dXY]*))?(\;(?<num4>\??[\dXY]*))?\)", which allows for telomeric associations of two to four chromosomes. Band groups are then caught with the pattern "^\((?<band1>[^\;]+)\;(?<band2>[^\;]+)(\;(?<band3>[^\;]+))?(\;(?<band4>[^\;]+))?\)$", which likewise requires at least two groups and allows up to four.
If there are more than two chromosomes, the second band group must show two bands; if there are four chromosomes, also the third group must show two bands; other groups must show one band only.
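The rule for how many bands each group of a telomeric association must contain can be expressed compactly. The following Python sketch is an illustration under assumptions: the function names are hypothetical, and the band groups are assumed to have been split at the semicolons already.

```python
import re

BAND = r"\??[pq?][?\d]*(?:[-~][?\d]+)?"
ONE_OR_TWO = re.compile(rf"(?P<band1>{BAND})(?P<band2>{BAND})?")

def expected_band_counts(n_chromosomes):
    """Outer groups carry one band, inner groups two."""
    if not 2 <= n_chromosomes <= 4:
        raise ValueError("telomeric associations involve 2 to 4 chromosomes")
    return [1] + [2] * (n_chromosomes - 2) + [1]

def count_bands(group):
    """Number of structured bands in a group (0 if it does not parse)."""
    m = ONE_OR_TWO.fullmatch(group)
    if m is None:
        return 0
    return 2 if m.group("band2") else 1

def tas_groups_valid(groups):
    """Check the band count of every group against the rule."""
    return [count_bands(g) for g in groups] == expected_band_counts(len(groups))

print(expected_band_counts(4))                    # [1, 2, 2, 1]
print(tas_groups_valid(["p36", "q35p15", "q34"]))  # True
print(tas_groups_valid(["p36", "q35", "q34"]))     # False
```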
SCCN qualitative aberrations are calculated in a straightforward manner. The bands afflicted are not counted as break points, and no gains or losses occur.
If it is located at the interface between two different chromosomes in a derivative chromosome, both chromosomes and bands are noted.
The pattern for identifying the aberration proper is "^hsr\((?<num1>[\?\dXY]+)(\;(?<num2>[\?\dXY]+))?\)" which requires at least one chromosome. The band(s) are then caught with the pattern "^\((?<band1>\??[pq\?][\?\d]*([\-\~][\?\d]+)?);?(?<band2>\??[pq\?][\?\d]*([\-\~][\?\d]+)?)?\)$". If two chromosomes were detected, two bands must be present, otherwise exactly one band.
The pattern for identifying the aberration proper is "^trc\((?<num1>\??[\dXY]*)\;(?<num2>\??[\dXY]*)\;(?<num3>\??[\dXY]*)\)". Band groups are caught with the pattern "^\((?<band1>[^\;]+)\;(?<band2>[^\;]+)\;(?<band3>[^\;]+)\)$", the second group being further analysed with "^(?<band1>\??[pq\?][\?\d]*([\-\~][\?\d]+)?)(?<band2>\??[pq\?][\?\d]*([\-\~][\?\d]+)?)?$".
Break points, SCCN qualitative aberrations and fusions are calculated in a straightforward manner. For gains and losses, all fragments the tricentric chromosome consists of are gained when the tricentric is additionally present. If it replaces its three normal chromosomes, the terminal fragment of the first chromosome, both terminal fragments of the second chromosome, and the terminal fragment of the third chromosome are lost.
The pattern for identifying the aberration proper is "^qdp\((?<num>[\?\dXY]+)\)". The bands are then caught with the pattern "^\((?<band1>\??[pq\?][\?\d]*([\-\~][\?\d]+)?)(?<band2>\??[pq\?][\?\d]*([\-\~][\?\d]+)?)?\)$".
For gains and losses, the quadruplicated fragment has been gained thrice; if the quadruplication is additionally present, the whole (original) chromosome has additionally been gained once. As for fusions, the orientation of the quadruplicated fragment cannot be determined, hence both possible interpretations are taken, but marked questionable; correct multiplicators cannot be given.
The pattern for identifying the aberration proper is "^ider\((?<num>[\?\dXY]+)\)".
The band is then caught with the pattern
"\((?<band>\??([pq]10)?)\)".
Since the isomerization is the last step in the formation of an isoderivative chromosome, the aberration "ider(N)([pq]10)<aberrations>" is translated into the derivative chromosome "der(N)<aberrations>i(N)([pq]10)" and then analysed as a derivative chromosome (see next chapter).
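The translation of an isoderivative chromosome into a derivative chromosome is a plain string rewrite. In the Python sketch below, the two patterns from the text are joined into one for brevity, which is an assumption of this illustration, as is the function name `ider_to_der`.

```python
import re

# Core pattern and band pattern from the text, combined into one expression.
IDER = re.compile(r"^ider\((?P<num>[?\dXY]+)\)\((?P<band>\??(?:[pq]10)?)\)")

def ider_to_der(text):
    """Rewrite ider(N)(band)<aberrations> as der(N)<aberrations>i(N)(band)."""
    m = IDER.match(text)
    if m is None:
        raise ValueError(f"not an isoderivative chromosome: {text!r}")
    num, band = m.group("num"), m.group("band")
    tail = text[m.end():]          # the remaining aberrations
    return f"der({num}){tail}i({num})({band})"

print(ider_to_der("ider(22)(q10)t(9;22)(q34;q11)"))
# der(22)t(9;22)(q34;q11)i(22)(q10)
```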
For all derivative chromosomes, the symbol "der" is followed by the centromere-conferring chromosome(s) in brackets, separated by semicolons. Hence, the pattern for identifying the aberration proper is "^der\((?<num1>\??[\dXY]*)(\;(?<num2>\??[\dXY]*))?(\;(?<num3>\??[\dXY]*))?\)" which allows up to three chromosomes.
The following steps depend on the basic type of the derivative chromosome.
While the ISCN manual shows this type of notation only for Robertsonian translocations and other whole arm translocations (pp.70f), cases describing dicentric chromosomes in this way were discovered in the Mitelman database, e.g. Callet-Bauchu et al 1999, Leukemia, Case No. 1, or Giagounidis et al 2002, Ann Hematol, Case No. 1.
Here, a chromosome object is instantiated with the first chromosome of the der clause, and then the formation of a dicentric chromosome is calculated.
If further aberrations are present in such a derivative chromosome, they are dealt with as in "normal" derivative chromosomes.
In the tail of remaining aberrations, the aberrations are searched for with the pattern "^\??\w+\([\?\dXY]+(\;[\?\dXY]+)*\)(\([pqor\?\d\;\-\~]+\))?". The pattern assumes that the aberration starts with its textual symbol (possibly preceded by a question mark), followed by its chromosome(s) in brackets, which may be separated by semicolons. The bands of the aberration may be present or not. The pattern consumes one aberration per loop iteration.
If the pattern matched, an Aberration object is instantiated with the matched region. This aberration is analysed normally. If it is valid, it is stored in the collection of aberrations mcAberrations, otherwise the error description is taken from the aberration and analysis is aborted.
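The loop over the tail of aberrations can be sketched in Python as follows. This is a simplified illustration: where the program instantiates and validates an Aberration object, the sketch merely collects the matched strings, and the function name `split_tail` is hypothetical.

```python
import re

ABERRATION = re.compile(
    r"^\??\w+\([?\dXY]+(?:;[?\dXY]+)*\)(?:\([pqor?\d;\-~]+\))?"
)

def split_tail(tail):
    """Cut the tail of a der() description into single aberration strings."""
    found = []
    while tail:
        m = ABERRATION.match(tail)
        if m is None:
            # Corresponds to aborting the analysis with an error.
            raise ValueError(f"unparsable remainder: {tail!r}")
        found.append(m.group(0))
        tail = tail[m.end():]      # remove the matched aberration
    return found

print(split_tail("t(9;22)(q34;q11)del(9)(p21)"))
# ['t(9;22)(q34;q11)', 'del(9)(p21)']
```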
After looping through the tail of aberrations, the aberrations are introduced into the derivative chromosome instantiated above.
While most aberrations can be introduced in a straightforward manner, translocations need more consideration. Sometimes, the formation of a dicentric chromosome is meant. Hence, if the derivative chromosome is described with two centromere-conferring chromosomes, the derivative chromosome object is checked to see whether it contains the acceptor band of the translocation and its second chromosome number equals the second chromosome number of the translocation, or alternatively whether it contains the donor band of the translocation and its first chromosome number equals the first chromosome number of the translocation.
If the translocation was found to mean the formation of a dicentric chromosome, a dicentric is introduced instead.
In normal translocations, it may happen that the first chromosome of the der clause is not the first chromosome of the translocation term. If no aberration was introduced into the derivative chromosome object yet, a chromosome object is instantiated with the second chromosome of the der clause.
For all translocations, the fact is considered that the donor fragment could be named first in the t clause and thus chromosome and band ordering need to be rearranged for the introduction of the translocation.
Also with derivative chromosomes stemming from translocations involving three chromosomes, care must be taken to select the correct donor fragment.
The procedures for the introduction of aberrations into a derivative chromosome are described in more detail in "Calculating Derivative Chromosomes".
Break points, gains and losses are then queried from the derivative chromosome object.