[Previous] [Next] [Contents]

The design of the MuPAD-Combinat package

The choice of the platform

The choice of the platform has been a very difficult and long discussed question. We try to present here some arguments to motivate our choice. These arguments are more or less objective, and depending on their research field or computing habits, different people would certainly choose a different order for the relative importance of these arguments. So we do not claim to have any kinds of scientific rigor here. We had to make a choice at some point !

MuPAD is to some extends a object oriented language. We have already shown why this is an important advantage through the demonstration, and this will be extensively discussed later, so we don't discuss it here more.

The main goal is to do research and to share the code. So we have to choose a widely used and well known computed algebra system. We do not have made a serious statistic but is seams that the two main systems used in the algebraic combinatoric community (or at least the community of people frequenting the lotharingian seminar) are the commercial system Maple and . Our experience with Maple in Marne-La-Vallée is bad and seems not to fits our technical requirement concerning object oriented feature and overloading.

We have also though about two other universitary platform. and . The fact that the team keep their algorithms closed is a major obstacle in our view. One the contrary is totally open source. The main disadvantage with is that they do not provide the full basic mathematical toolbox we need in our exploration (polynomial and expressions computations, ...) However, we strongly thought about closely interfacing MuPAD with and some positive small proof of concept have already been done.

MuPAD is not open-source nor free. This is in our minds a strong drawback. We write code for research and we think that the code is a way to publish the work. One of main goal of MuPAD-combinat is to allows researchers to easily share their code. So we cannot imagine to write a closed code that belongs to some people and that cannot be modified or reused is any ways. That's why we choose the Gnu Lesser Public License for distributing our code. So the fact that MuPAD is not open source is a strong argument against its use. However, they promise that MuPAD will be always free for personal use in public research institutions. Moreover they promise to publish the code source of the library under a well known open-source license, as soon as they have found a convenient one together with some legality confirmation by a lawyer. For the moment, we are still awaiting...

A posteriori, one of the most important argument is the fact that we have a very good cooperation together with me MuPAD development team. First of all, at least concerning the question of what has to be done, we have the full responsibility of the combinatorial library of MuPAD. This has some strong disadvantages such as deadline, bug report... But this also means that the MuPAD team is distributing our code. Thus they always keep one eyes on what we are doing. Note that the test-suite of the whole MuPAD project is launched every night and if something is broken in our part, we knows it immediately. We do not pretend that the MuPAD team helps us in maintaining our code but we have a very strong support. To try to have an objective number, in this year (2003) we have something like 500 messages on the mailing list mupad-combinat-devel. Nearly 200 are from the MuPAD team.

Naming conventions

Long names versus abbreviations

The convention for library, domain and function names is to use long names that are as meaningful and close to the English spelling as possible: e.g. partitions instead of PART, FreeQuasiSymmetricFunctions instead of FQSym, etc. In particular, abbreviations should be avoided, except in extreme cases where the short name is really well established (say Lex instead of Lexicographic). Here are the motivations for this convention:

Case of names

We follow the capitalization rules of the MuPAD coding standard, which are quite similar to Java or C++ rules:

Composite names

When the name of a library, domain or function is composed of several parts, and those parts are also used in other names, it may be worthwhile to order those parts from the most general to the most specific. For example, we used the name combinat::integerVectorsWeighted instead of the more natural name combinat::weightedIntegerVectors. The advantage is that all the domains dealing with integer vectors start with the same prefix, which is particularly practical with respect to automatic completion. This is also coherent with the hierarchy library::sublibrary::subsublibrary. Another typical case is when several functions return a similar result but under different forms; the function that is the most useful or natural gets the short name, and the names of the other functions, are suffixed with the "type" of the result (words::inversions/words::inversionsList, ...). We also used this rule of thumb for the names of the free module methods (mult/multBasis/mult2/mult2Basis, straighten/straightenBasis, print/printTerm/printMonomial/printBasis).

Representing combinatorial objects and classes

The notion of combinatorial object is best described by some examples: a partition, a binary tree, a permutation, a graph, a Feynman diagram, a Dyck word, and other similar discrete objects are all combinatorial objects.

A combinatorial class is a (countable) set of related combinatorial objects (e.g. the set of all partitions, the set of all binary trees, the set of all standard permutations), on which a size function is defined (e.g. the size of partition is the sum n of its parts; the size of an integer vector consist of a pair (n,k) of integers: its sum n and its length k; the size of a tree is the number of its nodes). Typically, the fibers of the size function define natural finite subsets of the class that one wants to count, enumerate, and so on (e.g. counting all the partitions of n=4, listing all the integer vectors of sum 5 and length 3); we say ``typically'', because there are some cases where it is practical to use this framework even if the subsets are only countable. In many cases optional restrictions can be added to define smaller subsets of the class to be counted/enumerated/..., (e.g. the partitions of 4 of length at most 4). In some combinatorial classes (e.g. the class of the permutations of 5), the size function may be degenerated and have only one non trivial fiber.

Representing combinatorial objects

A combinatorial object may belong to several combinatorial classes simultaneously. For example, the list [4,3,2,1] may be interpreted as a partition, a permutation, an integer vector, a composition. This is reflected in MuPAD by our convention that a combinatorial object is not necessarily strongly typed by the combinatorial class(es) to which it belongs. An object has a unique domain: it corresponds to the data structure of the object and can be obtained by the command domtype. On the other hand, it may be of different types: they correspond to the different semantics that can be attached to the object, and they can be tested with testtype.

For example, [3,4,2,1] belongs to the MuPAD domain of lists, DOM_LIST; it is simultaneously a list of positive integers (Type::ListOf(Type::PosInt)), a word, a permutation, etc, while it is not a partition:

>> domtype ([3, 4, 2, 1]);
                                 DOM_LIST
>> testtype([3, 4, 2, 1], Type::ListOf(Type::PosInt)),
   testtype([3, 4, 2, 1], combinat::words),
   testtype([3, 4, 2, 1], combinat::compositions),
   testtype([3, 4, 2, 1], combinat::integerVectors),
   testtype([3, 4, 2, 1], combinat::permutations),
   testtype([3, 4, 2, 1], combinat::partitions)
                    TRUE, TRUE, TRUE, TRUE, TRUE, FALSE
In the MuPAD terminology, the domains like combinat::compositions are called facade domains; they do not really contain elements of their own.

On the other hand, some combinatorial objects, such as trees, require a specific data structure; these objects are strongly typed, that is their domain is the class itself. This has, among others, the advantage that they are pretty printed by the system:

>> t := combinat::binaryTrees([1, [1], [1]]);
   domtype(t);
   testtype(t, combinat::binaryTrees);
                                     o
                                    / \
      
                           combinat::binaryTrees
      
                                   TRUE

This choice of not systematically using strong typing for combinatorial classes is not an obvious one, and there is no clear cut criteria for deciding whether a given combinatorial class should use strong typing or not. On the one hand, strong typing allows for object-oriented techniques and overloading. On the other hand, having to cope with all the conversions (a partition is also a composition, ...) can be quite a burden for both the user and the programmer; indeed, choosing which implicit conversions to provide is not an easy task, given the overwhelming number of natural bijections. Finally, with the current MuPAD language, there is a small memory and time overhead with strong typing; this can be considered as a misfeature of MuPAD though.

Aside from the data structure criteria, another reasonable criteria is whether the combinatorial class has natural ``algebraic operations''. This is why we currently have both a facade domain combinat::permutations for general permutations seen as words, and a real domain Dom::SymmetricGroup for standard permutations seen as elements of the symmetric group. Actually, it could be reasonable to have a real domain Dom::Permutation, and have Dom::SymmetricGroup and Dom::PermutationGroup be facade domains for Dom::Permutation. This is still under discussion, and comments are welcome.

Representing combinatorial classes

Combinatorial classes for which we want to do counting, generation, or other manipulations are represented by MuPAD domains, like combinat::partitions or combinat::binaryTrees. Note that, in many cases, those domains are just facade domains and do not really contain elements: as we said above, the domain of a partition, or of a permutation, is really DOM_LIST. Those domains also define a MuPAD type; by convention, it is named like combinat::partitions::type, and can be tested with:

>> testtype([3, 2, 2, 1], combinat::partitions)
                                   TRUE
Simpler combinatorial classes, which we only want to use for type checking, are just represented by MuPAD types. This is typically used for subsets of other combinatorial classes. For example, the standard permutations form a subset of all permutations, and are represented by the type combinat::permutations::typeStandard. We are not quite happy with this naming convention; however, for better localization, we really would like to keep the types defining a subset of a domain inside this domain. Another option was to use subdomains even in this case. But domains are quite special (and expensive?) objects in MuPAD: they have a reference effect, they cannot be deleted, etc. So, we feel that this would be overkill, especially for parametrized types like PermutationOf([a,b,c,d]).

Another related situation: quite often, we have a function that returns a collection C of related objects, usually as a list. Think of combinat::words::shuffle([1,2,3],[a,b,c]) which returns a list of words. Or think of the inverse of a function that is not at all injective like combinat::permutations::fromCycleType (it returns all the permutations having a given cycle type). In such cases, we often want to do some more involved things, like having a generator for the elements of C, or being able to count them without actually generating them. Then, it is natural to consider C as a combinatorial class, and to represent it by a MuPAD domain. This gives a unified interface to all the standard functions for counting, generating, ...:

As a nice side effect, the standard alias from new to list allows to use the natural syntax combinat::words::shuffle(word1,word2) to obtain the collection C. So, switching from a simple function which returns C to a domain for the elements of C is transparent for the user. Usually, such a domain will be a subdomain of an existing domain (here combinat::words::shuffle is a subdomain of combinat::words).

Combinatorial classes and categories

A domain which represents a combinatorial class belongs to the category Cat::CombinatorialClass. Such a domain should implement at least generator or list. This category also provides naming conventions for usual functions like count, first, next, random, unrank, ... The implementation of those functions is not explicitly required by the category Cat::CombinatorialClass: depending on the specific combinatorial class sometimes they are not yet implemented, sometimes there exists no efficient algorithm, or sometimes they simply do not really make sense. In the future, we may think about refining Cat::CombinatorialClass into subcategories that describe which of those functions are available (for example, the category of combinatorial class which provide an unrank function). So far, the benefits coming from such refinements do not seem to justify the overhead in the complexity of the category hierarchy.

Also, all of this is not really specific to combinatorial classes. We could imagine generalizing this to any kinds of collections of objects, and mimic the category hierarchy of, e.g. Aldor. By convention, all the subcategories of Cat::CombinatorialClass have a name of the form Cat::XxxClass. Right now, we have two sub categories of Cat::CombinatorialClass:

Those two categories are purely technical; they respectively provide wrappers around the generic domains combinat::decomposableObjects and combinat::integerListsLexTools, and allow to factor out some routine code. For example, combinat::partitions, combinat::integerVectors, and combinat::compositions are in the category Cat::integerListsLexClass, which takes care of the parsing of common options.

Further naming comments

The names of the domains combinat::integerListsLexTools and combinat::decomposableObjects are quite different. This reflects the fact that those two domains do not play the same role. combinat::decomposableObjects is a parametrized domain whose instantiations represent combinatorial classes, whereas combinat::integerListsLexTools essentially is a collection of tools with a scarce interface, geared toward speed and internal use.

The name Cat::integerListsLex is too general, since this category contains only the combinatorial classes described using length, bound, and slope constraints. For example, the elements of combinat::permutations are integer lists and are naturally ordered lexicographically; however this combinatorial class is not in Cat::integerListsLex. Badly enough, we haven't found a better name that would not be too long. Unless someone comes up with a clever suggestion we will stick to this name. We use Next for the name of the method that computes the next element in a combinatorial class. This is not coherent with first, last and with the general convention that a method name start by a lowercase letter. Badly enough, next is a reserved keyword, and we cannot use it as method name in MuPAD.

Representing combinatorial algebras

What is a combinatorial algebra after all?

Let us start by a precise definition for the term combinatorial algebra that we have used so far in a rather informal way.

Given a combinatorial class C, and a ring R, one can define the free module F with basis indexed by C over the ring R; an element of F is a formal finite linear combination of elements of C with coefficients in R. Alternatively, an element of F can be interpreted as a function from C to R with finite support; that is a function which is zero except on finitely many elements of the basis C.

For example, here is an element of the free module with basis indexed by partitions, over Q: x:=4 [3,2,1] + 3 [2,1,1] + 1/4 [1,1,1,1] .

Polynomials are another typical example of free modules, and we extend the usual definitions used for polynomials. The coefficient of [1,1,1,1] in x is 1/4; we call the partition [3,2,1], seen as an element of F, a term; 4 [3,2,1] is a monomial; finally, the support of x is the set of the partitions with non-zero coefficients, that is {[3,2,1], [2,1,1],[1,1,1,1]}.

By combinatorial algebra we mean such a free module, together with some extra algebraic operations (a product, coproduct, antipod) which makes it an algebra, a bialgebra, or a Hopf algebra. Those operations are typically defined by linearity on the basis.

With this definition, we have distinguished a special basis of the combinatorial algebra. Most of the time, a combinatorial algebra (like the algebra of symmetric functions) will actually have several interesting basis (Schur functions, power-sum functions, ...), all of them indexed by C. The underlying free module remains the same, but the operations will vary accordingly. Changing from one basis to the other is one of the fundamental operations.

Why use strong typing?

Traditionally in computer algebra systems (say with SF, ACE or mu-EC), symmetric functions have been represented by symbolic expressions:

>> p[2]*p[1];
   muEC::SYMF::Top(p[2]*p[1]);
   muEC::SYMF::Tos(p[2]*p[1])
                                 p[1] p[2]
      
                                  p[2, 1]
      
                             s[3] - s[1, 1, 1]
This is also the approach used for polynomials, and more generally for Ore-algebras in Maple. This has several advantages:

However, this approach has also serious drawbacks:

Altogether, this approach is fine when there are very few combinatorial algebras; however it does not scale to a dozen predefined algebras (that's our short-term goal) plus myriads of user-defined algebras. In practice, this is one of the main reasons why the ACE project stalled when defining many new algebras became a must.

It was time for a complete redesign and rewrite of the package. We will see that using strong typing allows for circumventing all those drawbacks, without loosing too much of the advantages. The choice of switching to MuPAD was largely influenced by the strong integration of their domains/axioms/category mechanism in their system, that allows for strong typing, and object-oriented techniques.

Representing free modules

The first thing to do is to choose the internal data structure to store an element of a free module. There are different possible implementations without a clear cut advantage of one over the others (as a parallel, in C++ there exists two implementations of association tables: map using sorted lists and hash_map using hash tables). We provide several of them:

Dom::FreeModuleTable(R, Basis)

An element x is stored as an association table (DOM_TABLE). For example, here is the internal representation of an element x:

>> F := Dom::FreeModuleTable(Dom::Rational, combinat::partitions):
   x := 4*F([3, 2, 1]) + 3*F([2, 1, 1]) + 1/4*F([1, 1, 1, 1]);
   extop(x)
           4 B([3, 2, 1]) + 3 B([2, 1, 1]) + 1/4 B([1, 1, 1, 1])
      
                           table(
                             [1, 1, 1, 1] = 1/4,
                             [2, 1, 1] = 3,
                             [3, 2, 1] = 4
                           )
Since any MuPAD object can be used as index of a table, there is no restriction on the basis elements. Accessing the coefficient of a term is constant time.

Dom::FreeModulePoly(R, Basis)

The kernel polynomial objects of MuPAD (domain DOM_POLY) are stored in a sparse non-recursive way using sorted lists of monomials with a fixed number of variables. If one forgets about the product, this provides a sparse data structure which is both compact in memory and very fast for linear operations. Typically, the MuPAD sparse matrices (domain Dom::SparseMatrix) use univariate polynomials internally as internal representation for sparse column vectors.

Similarly, an element x of Dom::FreeModulePoly(R, Basis) is stored using a polynomial:

>> (F := Dom::FreeModulePoly(Dom::Rational, combinat::partitions); 
   x := 4*F([3, 2, 1]) + 3*F([2, 1, 1]) + 1/4*F([1, 1, 1, 1]));
   extop(x)
           1/4 B([1, 1, 1, 1]) + 3 B([2, 1, 1]) + 4 B([3, 2, 1])
      
                               3       2
                    poly(1/4 _X  + 3 _X  + 4 _X, [_X])
To achieve this, one needs to be able to represent an element of the basis using an exponent vector (an integer, or a fixed-length list of integers). This is trivial when the basis readily consists of fixed length lists of integers (integer vectors, standard permutations of a given n, ...). Otherwise, the user may provide a pair of functions rank and unrank that does the conversions. By default, the system creates a dummy pair of such functions: the rank function associate in turn the numbers 1,2,3,... to each new object it encounters. For example, the rank of [3,2,1], [2,1,1], and [1,1,1,1] above are respectively 1,2, and 3, that corresponds to the order in which the corresponding elements of F have been created.

This representation is very fast for linear operations. Furthermore, if univariate polynomials are used with the variable _X (this is the default), the data structure coincides exactly with the one used by Dom::SparseMatrix. This allows for zero-cost conversions to and from sparse vectors, for doing linear algebra.

Accessing the coefficient of a term in an element x is logarithmic in the number of terms of x (in the multivariate case, with MuPAD < 3.0.0 this is linear instead of logarithmic).

Ranking and unranking is only done for conversions, and computing products. In practice, the overhead with the dummy implementation seems to be negligible, and largely compensated by the fact that most operations deal with integers.

Dom::FreeModuleList(R, Basis)

Thanks to Stefan Wehmeier, (Univ. Paderborn) and Werner M. Seiler (Univ. Karlsruhe), this was mostly already implemented in the MuPAD library under the name Dom::FreeModule since 1997. An element is represented by a sorted list of terms. For example, here is the representation of the element x above: [[4, [3, 2, 1]], [3, [2, 1, 1]], [1/4, [1, 1, 1, 1]]].

Obviously, there needs to be an order on the basis elements (Basis should be a Cat::OrderedSet). Accessing a leading or trailing term is constant-time; accessing the coefficient of a given term in an element x is logarithmic in the number of terms of x.

All those implementations are in the category Cat::ModuleWithBasis which defines a unified interface. So, one can change the underlying implementation at any time. Unless you have a specific reason to choose one of the implementations, just use the default implementation Dom::FreeModule(R, Basis).

Representing combinatorial algebras on a given basis

Once the underlying linear structure is implemented, it is very easy to construct functions on a free module by linearity / bilinearity / multilinearity on the basis (see the examples) using the utilities from operators. So, implementing operations like products, coproducts, antipods usually boils down to implement the underlying combinatorics. Note that we do not have yet a general construction for the tensor product of two free modules F1 and F2, but you can emulate one by hand by building a free module whose basis elements are pairs of basis elements of F1 and F2.

Altogether, this allows to implement a domain F for the elements of a combinatorial algebra expanded on a given basis (e.g. the domain of symmetric functions expanded on the Schur functions, or the domain of symmetric functions expanded on the power-sum functions). The product of two elements of F is automatically expanded on the basis, and belongs again to F. Such a domain belongs to the category Cat::AlgebraWithBasis (later on, there will be categories such as Cat::BialgebraWithBasis, Cat::HopfAlgebraWithBasis).

Representing combinatorial algebras with several bases

A combinatorial algebra with several natural bases will be represented by several domains. Converting from one basis to another is an essential operation, and it is strongly desirable to be able to mix in the same expression elements that are expressed in different basis. This can now be achieved through overloading and the definition of appropriate implicit conversions (see the demonstration).

Developing the underlying tools that allow to do this seamlessly required a fair amount of work. Indeed, one parameter overloading in MuPAD works fine by delegating the work to the corresponding method of the parameter; on the other hand, the plain system essentially does not help much for multi-parameter overloading, which has to be carefully done by hand in each and every library (think about computing a product where the operands are in turn a symmetric function on the p basis, an integer, a symmetric function on the e basis, and a rational number). So we had to specify and implement a new mechanism that takes care of implicit conversions, and multi-parameters overloading. We essentially mimicked ideas taken from the GAP system [GAP4], as well as from the static overloading mechanisms of standard languages like C++. The main difficulty was to choose the right level of generality, so as to make the system powerful enough, yet simple, safe, and sound. Our choices were largely influenced by our practical experience with the kind of computations we have in mind. This new overloading mechanism is being discussed for integration and systematic use in the MuPAD library. We are not at all specialists of this sensible subject, so comments and suggestions about this are very welcome. For details, see:

http://mupad-combinat.sf.net/doc/html/operators/overloaded.html

This mechanism is currently pretty rudimentary and limited. However, we have been playing with it intensively, and have done some really hairy stuff based on it. The semantic has proven so far to be safe, sound and robust, while really much more practical and powerful than the plain overloading mechanism. The time overhead is reasonable; if it became an issue, the specifications leave quite some room for optimisations, in particular by inclusion in the MuPAD kernel.

When defining combinatorial algebras with several bases, we organize the code in a fairly standardized way, which takes care of various technical issues such as initialization or parameterization of the algebra by the coefficient ring. We urge the interested reader to check out the code in the examples library, and in particular:

http://mupad-combinat.sf.net/lib/EXAMPLES/SymmetricFunctions.mu

Conversions to and from expressions

Having strong safeguards is essential so that a beginner can run computations with confidence. However, one of our motto is that the system should not try to be too clever, and in particular should always leave a way for the user to take over the control (and the responsibilities!). Indeed, there always are situations where the user knows that a given operation, invalid in general, happens to be perfectly legal in the current context. Most of the time, this can be taken care of by systematically providing conversions to and from symbolic expressions that the user may manipulate at his convenience. However, there is no clearly-defined way yet for how to represent elements of non-commutative algebras using symbolic expressions; indeed MuPAD (as most other systems) assumes that the latter are commutative. Just to give a flavor of the issue: which commutation rules should be applied automatically by the system in the expression 2 * e[1] * q * 3 * f[3] ? Suggestions are very welcome here.

Compact notations

Throughout the tutorial, we have used fairly lengthy notations for constructing elements of combinatorial algebras. For example, to define the first symmetric power-sum, we wrote S::p([1]). This is fine in a tutorial when safety is at a premium, but in everyday's use, having terse notations is highly desirable. We are currently experimenting several tricks that allow for simultaneously using p to represent the domain of symmetric functions in the p basis, and p[1] for creating the first symmetric power-sum, while still being able to convert symmetric functions into symbolic expressions containing literals such as p[1]. As soon as we will have more experience with this, we will describe the recommended practice in the Tips and Tricks section of the reference manual.

[Previous] [Next] [Contents]


MuPAD Combinat, an open source algebraic combinatorics package