The choice of the platform has been a very difficult and long discussed question. We try to present here some arguments to motivate our choice. These arguments are more or less objective, and depending on their research field or computing habits, different people would certainly choose a different order for the relative importance of these arguments. So we do not claim to have any kinds of scientific rigor here. We had to make a choice at some point !
MuPAD is to some extends a object oriented language. We have already shown why this is an important advantage through the demonstration, and this will be extensively discussed later, so we don't discuss it here more.
The main goal is to do research and to share the code. So we have to choose a widely used and well known computed algebra system. We do not have made a serious statistic but is seams that the two main systems used in the algebraic combinatoric community (or at least the community of people frequenting the lotharingian seminar) are the commercial system Maple and . Our experience with Maple in Marne-La-Vallée is bad and seems not to fits our technical requirement concerning object oriented feature and overloading.
We have also though about two other universitary platform. and . The fact that the team keep their algorithms closed is a major obstacle in our view. One the contrary is totally open source. The main disadvantage with is that they do not provide the full basic mathematical toolbox we need in our exploration (polynomial and expressions computations, ...) However, we strongly thought about closely interfacing MuPAD with and some positive small proof of concept have already been done.
MuPAD is not open-source nor free. This is in our minds a strong drawback. We write code for research and we think that the code is a way to publish the work. One of main goal of MuPAD-combinat is to allows researchers to easily share their code. So we cannot imagine to write a closed code that belongs to some people and that cannot be modified or reused is any ways. That's why we choose the Gnu Lesser Public License for distributing our code. So the fact that MuPAD is not open source is a strong argument against its use. However, they promise that MuPAD will be always free for personal use in public research institutions. Moreover they promise to publish the code source of the library under a well known open-source license, as soon as they have found a convenient one together with some legality confirmation by a lawyer. For the moment, we are still awaiting...
A posteriori, one of the most important argument is the fact that we have a very good cooperation together with me MuPAD development team. First of all, at least concerning the question of what has to be done, we have the full responsibility of the combinatorial library of MuPAD. This has some strong disadvantages such as deadline, bug report... But this also means that the MuPAD team is distributing our code. Thus they always keep one eyes on what we are doing. Note that the test-suite of the whole MuPAD project is launched every night and if something is broken in our part, we knows it immediately. We do not pretend that the MuPAD team helps us in maintaining our code but we have a very strong support. To try to have an objective number, in this year (2003) we have something like 500 messages on the mailing list mupad-combinat-devel. Nearly 200 are from the MuPAD team.
The convention for library, domain and function names is to use long names that are as meaningful and close to the English spelling as possible: e.g. partitions instead of PART, FreeQuasiSymmetricFunctions instead of FQSym, etc. In particular, abbreviations should be avoided, except in extreme cases where the short name is really well established (say Lex instead of Lexicographic). Here are the motivations for this convention:
BT
for
combinat::binaryTrees
, Perm
for
combinat::permutations
, LRA
for the Loday-Ronco Algebra,
and so on.
We follow the capitalization rules of the MuPAD coding standard, which are quite similar to Java or C++ rules:
fromReducedWord
.
Using underscores to separate parts (as in from_reduced_word
)
is not recommended; some names in our code do not follow this
recommendation yet.
MinLength
, R
).
combinat::partitions::type
,
combinat::permutations::fromReducedWord
,
Dom::Matrix::transpose
).
DOM
, TEXTWIDTH
).
combinat
,
combinat::generators
, and is capitalized when it is a true
domain which contains elements (Series
, Dom::Rational
,
examples::SymmetricFunctions
). The later case includes for
example all the domains in Dom
, examples
,
experimental
. On the other hand, the names of combinatorial
classes in combinat
are not capitalized
(combinat::partitions
). Other exceptions to this rule
typically appear when the name of a library comes from initials
(IPC
) or from a person name.
This lack of coherency is a burden for both users and developers, and we hope to fix it at some point in the future, when the MuPAD library will undergo a similar naming convention cleanup.
When the name of a library, domain or function is composed of several
parts, and those parts are also used in other names, it may be
worthwhile to order those parts from the most general to the most
specific. For example, we used the name
combinat::integerVectorsWeighted
instead of the more natural
name combinat::weightedIntegerVectors
. The advantage is that all
the domains dealing with integer vectors start with the same prefix,
which is particularly practical with respect to automatic completion.
This is also coherent with the hierarchy
library::sublibrary::subsublibrary
. Another typical case is when
several functions return a similar result but under different forms;
the function that is the most useful or natural gets the short name,
and the names of the other functions, are suffixed with the "type" of
the result (words::inversions
/words::inversionsList
, ...).
We also used this rule of thumb for the names of the free module
methods (mult
/multBasis
/mult2
/mult2Basis
,
straighten
/straightenBasis
,
print
/printTerm
/printMonomial
/printBasis
).
The notion of combinatorial object is best described by some examples: a partition, a binary tree, a permutation, a graph, a Feynman diagram, a Dyck word, and other similar discrete objects are all combinatorial objects.
A combinatorial class is a (countable) set of related combinatorial objects (e.g. the set of all partitions, the set of all binary trees, the set of all standard permutations), on which a size function is defined (e.g. the size of partition is the sum n of its parts; the size of an integer vector consist of a pair (n,k) of integers: its sum n and its length k; the size of a tree is the number of its nodes). Typically, the fibers of the size function define natural finite subsets of the class that one wants to count, enumerate, and so on (e.g. counting all the partitions of n=4, listing all the integer vectors of sum 5 and length 3); we say ``typically'', because there are some cases where it is practical to use this framework even if the subsets are only countable. In many cases optional restrictions can be added to define smaller subsets of the class to be counted/enumerated/..., (e.g. the partitions of 4 of length at most 4). In some combinatorial classes (e.g. the class of the permutations of 5), the size function may be degenerated and have only one non trivial fiber.
A combinatorial object may belong to several combinatorial classes
simultaneously. For example, the list [4,3,2,1]
may be
interpreted as a partition, a permutation, an integer vector, a
composition. This is reflected in MuPAD by our convention that a
combinatorial object is not necessarily strongly typed by the
combinatorial class(es) to which it belongs. An object has a unique
domain: it corresponds to the data structure of the object and can be
obtained by the command domtype
. On the other hand, it may be of
different types: they correspond to the different semantics that can
be attached to the object, and they can be tested with testtype
.
For example, [3,4,2,1]
belongs to the MuPAD domain of lists,
DOM_LIST
; it is simultaneously a list of positive integers
(Type::ListOf(Type::PosInt)
), a word, a permutation, etc, while
it is not a partition:
>> domtype ([3, 4, 2, 1]);
DOM_LIST
>> testtype([3, 4, 2, 1], Type::ListOf(Type::PosInt)),
testtype([3, 4, 2, 1], combinat::words),
testtype([3, 4, 2, 1], combinat::compositions),
testtype([3, 4, 2, 1], combinat::integerVectors),
testtype([3, 4, 2, 1], combinat::permutations),
testtype([3, 4, 2, 1], combinat::partitions)
TRUE, TRUE, TRUE, TRUE, TRUE, FALSEIn the MuPAD terminology, the domains like
combinat::compositions
are called facade domains; they do
not really contain elements of their own.
On the other hand, some combinatorial objects, such as trees, require a specific data structure; these objects are strongly typed, that is their domain is the class itself. This has, among others, the advantage that they are pretty printed by the system:
>> t := combinat::binaryTrees([1, [1], [1]]);
domtype(t);
testtype(t, combinat::binaryTrees);
o / \ combinat::binaryTrees TRUE
This choice of not systematically using strong typing for combinatorial classes is not an obvious one, and there is no clear cut criteria for deciding whether a given combinatorial class should use strong typing or not. On the one hand, strong typing allows for object-oriented techniques and overloading. On the other hand, having to cope with all the conversions (a partition is also a composition, ...) can be quite a burden for both the user and the programmer; indeed, choosing which implicit conversions to provide is not an easy task, given the overwhelming number of natural bijections. Finally, with the current MuPAD language, there is a small memory and time overhead with strong typing; this can be considered as a misfeature of MuPAD though.
Aside from the data structure criteria, another reasonable criteria is
whether the combinatorial class has natural ``algebraic operations''.
This is why we currently have both a facade domain
combinat::permutations
for general permutations seen as words,
and a real domain Dom::SymmetricGroup
for standard permutations
seen as elements of the symmetric group. Actually, it could be
reasonable to have a real domain Dom::Permutation
, and have
Dom::SymmetricGroup
and Dom::PermutationGroup
be facade
domains for Dom::Permutation
. This is still under discussion,
and comments are welcome.
Combinatorial classes for which we want to do counting, generation, or
other manipulations are represented by MuPAD domains, like
combinat::partitions
or combinat::binaryTrees
. Note that,
in many cases, those domains are just facade domains and do not
really contain elements: as we said above, the domain of a partition,
or of a permutation, is really DOM_LIST
. Those domains also
define a MuPAD type; by convention, it is named like
combinat::partitions::type
, and can be tested with:
>> testtype([3, 2, 2, 1], combinat::partitions)
TRUESimpler combinatorial classes, which we only want to use for type checking, are just represented by MuPAD types. This is typically used for subsets of other combinatorial classes. For example, the standard permutations form a subset of all permutations, and are represented by the type
combinat::permutations::typeStandard
.
We are not quite happy with this naming convention; however, for
better localization, we really would like to keep the types defining
a subset of a domain inside this domain. Another option was to use
subdomains even in this case. But domains are quite special (and
expensive?) objects in MuPAD: they have a reference effect, they
cannot be deleted, etc. So, we feel that this would be overkill,
especially for parametrized types like
PermutationOf([a,b,c,d])
.
Another related situation: quite often, we have a function that
returns a collection C of related objects, usually as a list.
Think of combinat::words::shuffle([1,2,3],[a,b,c])
which
returns a list of words. Or think of the inverse of a function that
is not at all injective like
combinat::permutations::fromCycleType
(it returns all the
permutations having a given cycle type). In such cases, we often
want to do some more involved things, like having a generator for
the elements of C, or being able to count them without actually
generating them. Then, it is natural to consider C as a
combinatorial class, and to represent it by a MuPAD domain. This
gives a unified interface to all the standard functions for
counting, generating, ...:
combinat::words::shuffle::count(word1,word2)
combinat::words::shuffle::list(word1,word2)
combinat::words::shuffle::generator(word1,word2)
As a nice side effect, the standard alias from new
to list
allows to use the natural syntax
combinat::words::shuffle(word1,word2)
to obtain the collection
C. So, switching from a simple function which returns C to a
domain for the elements of C is transparent for the user. Usually,
such a domain will be a subdomain of an existing domain (here
combinat::words::shuffle
is a subdomain of
combinat::words
).
A domain which represents a combinatorial class belongs to the
category Cat::CombinatorialClass
. Such a domain should
implement at least generator
or list
. This category also
provides naming conventions for usual functions like count
,
first
, next
, random
, unrank
, ... The
implementation of those functions is not explicitly required by the
category Cat::CombinatorialClass
: depending on the specific
combinatorial class sometimes they are not yet implemented,
sometimes there exists no efficient algorithm, or sometimes they
simply do not really make sense.
In the future, we may think about refining
Cat::CombinatorialClass
into subcategories that describe which
of those functions are available (for example, the category of
combinatorial class which provide an unrank
function). So far,
the benefits coming from such refinements do not seem to justify the
overhead in the complexity of the category hierarchy.
Also, all of this is not really specific to combinatorial classes. We
could imagine generalizing this to any kinds of collections of
objects, and mimic the category hierarchy of, e.g. Aldor.
By convention, all the subcategories of
Cat::CombinatorialClass
have a name of the form
Cat::XxxClass
. Right now, we have two sub categories of
Cat::CombinatorialClass
:
combinat::decomposableObjects
and
combinat::integerListsLexTools
, and allow to factor out some
routine code. For example, combinat::partitions
,
combinat::integerVectors
, and combinat::compositions
are
in the category Cat::integerListsLexClass
, which takes care of
the parsing of common options.
The names of the domains combinat::integerListsLexTools
and
combinat::decomposableObjects
are quite different. This
reflects the fact that those two domains do not play the same role.
combinat::decomposableObjects
is a parametrized domain whose
instantiations represent combinatorial classes, whereas
combinat::integerListsLexTools
essentially is a collection of
tools with a scarce interface, geared toward speed and internal use.
The name Cat::integerListsLex
is too general, since this
category contains only the combinatorial classes described using
length, bound, and slope constraints. For example, the elements of
combinat::permutations
are integer lists and are naturally
ordered lexicographically; however this combinatorial class is
not in Cat::integerListsLex
. Badly enough, we haven't
found a better name that would not be too long. Unless someone comes
up with a clever suggestion we will stick to this name.
We use Next
for the name of the method that computes the next
element in a combinatorial class. This is not coherent with
first
, last
and with the general convention that a method
name start by a lowercase letter. Badly enough, next
is a
reserved keyword, and we cannot use it as method name in MuPAD.
Let us start by a precise definition for the term combinatorial algebra that we have used so far in a rather informal way.
Given a combinatorial class C, and a ring R, one can define the free module F with basis indexed by C over the ring R; an element of F is a formal finite linear combination of elements of C with coefficients in R. Alternatively, an element of F can be interpreted as a function from C to R with finite support; that is a function which is zero except on finitely many elements of the basis C.
For example, here is an element of the free module with basis indexed by partitions, over Q: x:=4 [3,2,1] + 3 [2,1,1] + 1/4 [1,1,1,1] .
Polynomials are another typical example of free modules, and we extend the usual definitions used for polynomials. The coefficient of [1,1,1,1] in x is 1/4; we call the partition [3,2,1], seen as an element of F, a term; 4 [3,2,1] is a monomial; finally, the support of x is the set of the partitions with non-zero coefficients, that is {[3,2,1], [2,1,1],[1,1,1,1]}.
By combinatorial algebra we mean such a free module, together with some extra algebraic operations (a product, coproduct, antipod) which makes it an algebra, a bialgebra, or a Hopf algebra. Those operations are typically defined by linearity on the basis.
With this definition, we have distinguished a special basis of the combinatorial algebra. Most of the time, a combinatorial algebra (like the algebra of symmetric functions) will actually have several interesting basis (Schur functions, power-sum functions, ...), all of them indexed by C. The underlying free module remains the same, but the operations will vary accordingly. Changing from one basis to the other is one of the fundamental operations.
Traditionally in computer algebra systems (say with SF, ACE or mu-EC), symmetric functions have been represented by symbolic expressions:
>> p[2]*p[1];
muEC::SYMF::Top(p[2]*p[1]);
muEC::SYMF::Tos(p[2]*p[1])
p[1] p[2] p[2, 1] s[3] - s[1, 1, 1]This is also the approach used for polynomials, and more generally for Ore-algebras in Maple. This has several advantages:
factor
, expand
, simplify
, ...) are instantly
available;
However, this approach has also serious drawbacks:
e
). In particular, one cannot mix two
algebras that use the same basis name in the same expression.
Altogether, this approach is fine when there are very few combinatorial algebras; however it does not scale to a dozen predefined algebras (that's our short-term goal) plus myriads of user-defined algebras. In practice, this is one of the main reasons why the ACE project stalled when defining many new algebras became a must.
It was time for a complete redesign and rewrite of the package. We will see that using strong typing allows for circumventing all those drawbacks, without loosing too much of the advantages. The choice of switching to MuPAD was largely influenced by the strong integration of their domains/axioms/category mechanism in their system, that allows for strong typing, and object-oriented techniques.
The first thing to do is to choose the internal data structure to store an element of a free module. There are different possible implementations without a clear cut advantage of one over the others (as a parallel, in C++ there exists two implementations of association tables: map using sorted lists and hash_map using hash tables). We provide several of them:
Dom::FreeModuleTable(R, Basis)
An element x
is stored as an association table
(DOM_TABLE
). For example, here is the internal representation of
an element x
:
>> F := Dom::FreeModuleTable(Dom::Rational, combinat::partitions):
x := 4*F([3, 2, 1]) + 3*F([2, 1, 1]) + 1/4*F([1, 1, 1, 1]);
extop(x)
4 B([3, 2, 1]) + 3 B([2, 1, 1]) + 1/4 B([1, 1, 1, 1]) table( [1, 1, 1, 1] = 1/4, [2, 1, 1] = 3, [3, 2, 1] = 4 )Since any MuPAD object can be used as index of a table, there is no restriction on the basis elements. Accessing the coefficient of a term is constant time.
Dom::FreeModulePoly(R, Basis)
The kernel polynomial objects of MuPAD (domain DOM_POLY
) are
stored in a sparse non-recursive way using sorted lists of monomials
with a fixed number of variables. If one forgets about the product,
this provides a sparse data structure which is both compact in memory
and very fast for linear operations. Typically, the MuPAD sparse
matrices (domain Dom::SparseMatrix
) use univariate polynomials
internally as internal representation for sparse column vectors.
Similarly, an element x
of Dom::FreeModulePoly(R,
Basis)
is stored using a polynomial:
>> (F := Dom::FreeModulePoly(Dom::Rational, combinat::partitions);
x := 4*F([3, 2, 1]) + 3*F([2, 1, 1]) + 1/4*F([1, 1, 1, 1]));
extop(x)
1/4 B([1, 1, 1, 1]) + 3 B([2, 1, 1]) + 4 B([3, 2, 1]) 3 2 poly(1/4 _X + 3 _X + 4 _X, [_X])To achieve this, one needs to be able to represent an element of the basis using an exponent vector (an integer, or a fixed-length list of integers). This is trivial when the basis readily consists of fixed length lists of integers (integer vectors, standard permutations of a given n, ...). Otherwise, the user may provide a pair of functions
rank
and unrank
that does the conversions. By default, the
system creates a dummy pair of such functions: the rank
function
associate in turn the numbers 1,2,3,...
to each new object
it encounters. For example, the rank of [3,2,1]
, [2,1,1]
,
and [1,1,1,1]
above are respectively 1
,2
, and
3
, that corresponds to the order in which the corresponding
elements of F
have been created.
This representation is very fast for linear operations. Furthermore,
if univariate polynomials are used with the variable _X
(this is
the default), the data structure coincides exactly with the one used
by Dom::SparseMatrix
. This allows for zero-cost conversions to
and from sparse vectors, for doing linear algebra.
Accessing the coefficient of a term in an element x
is
logarithmic in the number of terms of x
(in the multivariate
case, with MuPAD < 3.0.0 this is linear instead of logarithmic).
Ranking and unranking is only done for conversions, and computing products. In practice, the overhead with the dummy implementation seems to be negligible, and largely compensated by the fact that most operations deal with integers.
Dom::FreeModuleList(R, Basis)
Thanks to Stefan Wehmeier, (Univ. Paderborn) and Werner M. Seiler
(Univ. Karlsruhe), this was mostly already implemented in the
MuPAD library under the name Dom::FreeModule
since 1997. An
element is represented by a sorted list of terms. For example, here is
the representation of the element x
above: [[4, [3, 2, 1]],
[3, [2, 1, 1]], [1/4, [1, 1, 1, 1]]]
.
Obviously, there needs to be an order on the basis elements
(Basis
should be a Cat::OrderedSet
). Accessing a leading
or trailing term is constant-time; accessing the coefficient of a
given term in an element x
is logarithmic in the number of terms
of x
.
All those implementations are in the category
Cat::ModuleWithBasis
which defines a unified interface. So, one
can change the underlying implementation at any time. Unless you have
a specific reason to choose one of the implementations, just use the
default implementation Dom::FreeModule(R, Basis)
.
Once the underlying linear structure is implemented, it is very easy to
construct functions on a free module by linearity / bilinearity /
multilinearity on the basis (see the examples) using the utilities
from operators
. So, implementing operations like products,
coproducts, antipods usually boils down to implement the underlying
combinatorics. Note that we do not have yet a general construction for
the tensor product of two free modules F1
and F2
, but you
can emulate one by hand by building a free module whose basis elements
are pairs of basis elements of F1
and F2
.
Altogether, this allows to implement a domain F
for the elements
of a combinatorial algebra expanded on a given basis (e.g. the domain
of symmetric functions expanded on the Schur functions, or the domain
of symmetric functions expanded on the power-sum functions). The
product of two elements of F
is automatically expanded on the
basis, and belongs again to F
. Such a domain belongs to the
category Cat::AlgebraWithBasis
(later on, there will be
categories such as Cat::BialgebraWithBasis
,
Cat::HopfAlgebraWithBasis
).
A combinatorial algebra with several natural bases will be represented by several domains. Converting from one basis to another is an essential operation, and it is strongly desirable to be able to mix in the same expression elements that are expressed in different basis. This can now be achieved through overloading and the definition of appropriate implicit conversions (see the demonstration).
Developing the underlying tools that allow to do this seamlessly required a fair amount of work. Indeed, one parameter overloading in MuPAD works fine by delegating the work to the corresponding method of the parameter; on the other hand, the plain system essentially does not help much for multi-parameter overloading, which has to be carefully done by hand in each and every library (think about computing a product where the operands are in turn a symmetric function on the p basis, an integer, a symmetric function on the e basis, and a rational number). So we had to specify and implement a new mechanism that takes care of implicit conversions, and multi-parameters overloading. We essentially mimicked ideas taken from the GAP system [GAP4], as well as from the static overloading mechanisms of standard languages like C++. The main difficulty was to choose the right level of generality, so as to make the system powerful enough, yet simple, safe, and sound. Our choices were largely influenced by our practical experience with the kind of computations we have in mind. This new overloading mechanism is being discussed for integration and systematic use in the MuPAD library. We are not at all specialists of this sensible subject, so comments and suggestions about this are very welcome. For details, see:
http://mupad-combinat.sf.net/doc/html/operators/overloaded.html
This mechanism is currently pretty rudimentary and limited. However, we have been playing with it intensively, and have done some really hairy stuff based on it. The semantic has proven so far to be safe, sound and robust, while really much more practical and powerful than the plain overloading mechanism. The time overhead is reasonable; if it became an issue, the specifications leave quite some room for optimisations, in particular by inclusion in the MuPAD kernel.
When defining combinatorial algebras with several bases, we organize the code in a fairly standardized way, which takes care of various technical issues such as initialization or parameterization of the algebra by the coefficient ring. We urge the interested reader to check out the code in the examples library, and in particular:
http://mupad-combinat.sf.net/lib/EXAMPLES/SymmetricFunctions.mu
Having strong safeguards is essential so that a beginner can run
computations with confidence. However, one of our motto is that the
system should not try to be too clever, and in particular should
always leave a way for the user to take over the control (and the
responsibilities!). Indeed, there always are situations where the
user knows that a given operation, invalid in general, happens to be
perfectly legal in the current context. Most of the time, this can be
taken care of by systematically providing conversions to and from
symbolic expressions that the user may manipulate at his convenience.
However, there is no clearly-defined way yet for how to represent
elements of non-commutative algebras using symbolic expressions;
indeed MuPAD (as most other systems) assumes that the latter are
commutative. Just to give a flavor of the issue: which commutation
rules should be applied automatically by the system in the expression
2 * e[1] * q * 3 * f[3]
? Suggestions are very welcome here.
Throughout the tutorial, we have used fairly lengthy notations for
constructing elements of combinatorial algebras. For example, to
define the first symmetric power-sum, we wrote S::p([1])
. This
is fine in a tutorial when safety is at a premium, but in everyday's
use, having terse notations is highly desirable. We are currently
experimenting several tricks that allow for simultaneously using
p
to represent the domain of symmetric functions in the p
basis, and p[1]
for creating the first symmetric power-sum,
while still being able to convert symmetric functions into symbolic
expressions containing literals such as p[1]
. As soon as we will
have more experience with this, we will describe the recommended
practice in the Tips and Tricks section of the reference manual.
MuPAD Combinat, an open source algebraic combinatorics package