Opened 3 years ago
Closed 3 years ago
#26111 closed enhancement (fixed)
Implement faster iterator for Lyndon words
Reported by:  tscrim  Owned by:  

Priority:  major  Milestone:  sage8.4 
Component:  combinatorics  Keywords:  
Cc:  sagecombinat, mantepse, vdelecroix  Merged in:  
Authors:  Travis Scrimshaw  Reviewers:  Martin Rubey 
Report Upstream:  N/A  Work issues:  
Branch:  0b4c68b (Commits, GitHub, GitLab)  Commit:  0b4c68b2ba3c7e270ac8d4137c137695132fc588 
Dependencies:  #25462  Stopgaps: 
Description
Having an iterator for Lyndon words with a fixed length and alphabet size is useful that does not return elements (e.g., the free Lie algebra). Moreover, by having less transient objects, we can also improve the speed.
Current branch:
sage: LW = LyndonWords(11,6) sage: %time for x in LW: pass CPU times: user 2.17 s, sys: 40 ms, total: 2.21 s Wall time: 2.17 s sage: LW = LyndonWords(11,7) sage: %time for x in LW: pass CPU times: user 20.7 s, sys: 20 ms, total: 20.7 s Wall time: 20.7 s
vs old version:
sage: LW = LyndonWords(11,6) sage: %time for x in LW: pass CPU times: user 2.86 s, sys: 12 ms, total: 2.87 s Wall time: 2.85 s sage: LW = LyndonWords(11,7) sage: %time for x in LW: pass CPU times: user 26.8 s, sys: 12 ms, total: 26.8 s Wall time: 26.7 s
Change History (28)
comment:1 Changed 3 years ago by
 Branch set to public/combinat/fast_lyndon_iter26111
 Commit set to a17239b054dbed216feba1b1b03b50f51df196eb
 Status changed from new to needs_review
comment:2 followup: ↓ 5 Changed 3 years ago by
Given #25262, it would be good to add a "long" doctest, so we can avoid performance regressions in future.
comment:3 Changed 3 years ago by
Notes to myself:
There seem to be Gray codes available, eg:
(Note that the very simple iterative algorithm on page 217 of Ruskey's book visits also the necklaces.)
comment:4 Changed 3 years ago by
 Commit changed from a17239b054dbed216feba1b1b03b50f51df196eb to bbacc76e3cd1da57b7463069c49b4c1d38ae9f21
Branch pushed to git repo; I updated commit sha1. New commits:
bbacc76  Added long test to lyndon_word.py.

comment:5 in reply to: ↑ 2 Changed 3 years ago by
Replying to mantepse:
Given #25262, it would be good to add a "long" doctest, so we can avoid performance regressions in future.
That is not really how #25262 is suppose to work, and I don't believe is a better way to catch regressions (there will be less noise, but the noise will still be there). However, I've added it.
comment:6 Changed 3 years ago by
Here is a naive implementation of the iterative FKM algorithm from Ruskey's book. I couldn't get http://puma.dimai.unifi.it/17_1_2/weston.pdf to work, and I haven't checked what it's advantage would be.
I won't steel the joy of timing this from you :)
def FKM_iterativ(k, n): a = [0]*(n+1) if n == 1: yield a[:] i = n; while True: a[i] += 1 for j in range(1, ni+1): a[j + i] = a[j] if n == i: yield a[:] i = n while a[i] == k1: i = 1 if i == 0: break
comment:7 Changed 3 years ago by
 Commit changed from bbacc76e3cd1da57b7463069c49b4c1d38ae9f21 to 8cc9d0e9cda0a33477caf88a93be69d48bff600e
Branch pushed to git repo; I updated commit sha1. Last 10 new commits:
ce2962b  restore richer doc tests

ba08ff3  reviewer's comments

95ce20f  provide iterators which return lists of lists

d2e0e6e  inline a computation by reviewer's request

947233c  Merge branch 'public/25462' of git://trac.sagemath.org/sage into public/combinat/speedup_set_partitions25462

a79e302  Reverted to an_element() and added some additional reviewer changes.

87bf535  Cythonizing iterator.

ba6a115  Fixing doctests due to ordering change.

95d5d91  Merge branch 'public/combinat/speedup_set_partitions25462' of git://trac.sagemath.org/sage into public/combinat/fast_lyndon_iter26111

8cc9d0e  Using a Cython version of FKM algorithm to generate Lyndon words.

comment:8 Changed 3 years ago by
 Dependencies set to #25462
The "joy" of timing...sure...
Anyways...that was a good find. The FKM algorithm is much faster than what was previously there (approximately 4x faster):
sage: %time for l in generate_lyndon_words(11,6): pass CPU times: user 1.74 s, sys: 36 ms, total: 1.78 s Wall time: 1.76 s sage: %time for l in FKM_iterative(11,6): pass CPU times: user 452 ms, sys: 32 ms, total: 484 ms Wall time: 432 ms sage: %time for l in generate_lyndon_words(11,7): pass CPU times: user 16.7 s, sys: 28 ms, total: 16.7 s Wall time: 16.7 s sage: %time for l in FKM_iterative(11,7): pass CPU times: user 3.75 s, sys: 28 ms, total: 3.78 s Wall time: 3.74 s
With Cython (approximate another 10x speedup):
sage: %time for l in FKM_iterative_cython(11,6): pass CPU times: user 36 ms, sys: 20 ms, total: 56 ms Wall time: 40.7 ms sage: %time for l in FKM_iterative_cython(11,7): pass CPU times: user 340 ms, sys: 12 ms, total: 352 ms Wall time: 322 ms
I have merged in #25462 and added the Cython code in such a way as to not (add) conflict with #25659.
comment:9 Changed 3 years ago by
Great! (sorry, I actually did think that you would enjoy the timing)
One thing I do not yet understand: on my laptop, FKM is roughly 12 times faster than the original iterator (i.e., without this ticket).
I'll try this ticket now to check.
comment:10 Changed 3 years ago by
Dear Travis!
The following cuts the time by a half, but I'm not sure whether it's correct... What do you think? (more precisely: should it really be FiniteWord_char
?)

src/sage/combinat/lyndon_word.py
diff git a/src/sage/combinat/lyndon_word.py b/src/sage/combinat/lyndon_word.py index 880dd4139c..15d0f56d41 100644
a b class LyndonWords_nk(UniqueRepresentation, Parent): 457 457 sage: sum(1 for lw in LyndonWords(11, 6)) # long time 458 458 295020 459 459 """ 460 from sage.combinat.words.word import FiniteWord_char 460 461 for lw in generate_lyndon_words(self._n, self._k): 461 yield self._words([i+1 for i in lw], check=False)462 yield FiniteWord_char(self._words, [i+1 for i in lw]) 462 463 463 464 def StandardBracketedLyndonWords(n, k): 464 465 """
comment:11 followup: ↓ 12 Changed 3 years ago by
To be nicer with the words code use
self._words.element_classes['char'](self._words, [i+1 for i in lw])
That should roughly be the same. And you can even keep a reference W = self._words.element_classes['char']
to save extra micro seconds.
comment:12 in reply to: ↑ 11 Changed 3 years ago by
Thank you Vincent!
However, it turns out that "char" is probably incorrect:
sage: list(LyndonWords(1000,1)) OverflowError: value too large to convert to unsigned char
So it should be

src/sage/combinat/lyndon_word.py
diff git a/src/sage/combinat/lyndon_word.py b/src/sage/combinat/lyndon_word.py index 880dd4139c..8c0f49cfdd 100644
a b class LyndonWords_nk(UniqueRepresentation, Parent): 457 457 sage: sum(1 for lw in LyndonWords(11, 6)) # long time 458 458 295020 459 459 """ 460 W = self._words._element_classes['list'] 460 461 for lw in generate_lyndon_words(self._n, self._k): 461 yield self._words([i+1 for i in lw], check=False)462 yield W(self._words, [i+1 for i in lw]) 462 463 463 464 def StandardBracketedLyndonWords(n, k): 464 465 """
This doesn't seem to be essentially slower than "char"...
comment:13 Changed 3 years ago by
@Travis, a tiny thing:

src/sage/algebras/lie_algebras/free_lie_algebra.py
diff git a/src/sage/algebras/lie_algebras/free_lie_algebra.py b/src/sage/algebras/lie_algebras/free_lie_algebra.py index 1d0665d225..49fcf058c9 100644
a b class FreeLieAlgebra(Parent, UniqueRepresentation): 809 809 n = len(self._indices) 810 810 ret = [] 811 811 for lw in generate_lyndon_words(n, k): 812 b = self._standard_bracket(tuple( [names[i] for i in lw]))812 b = self._standard_bracket(tuple(names[i] for i in lw)) 813 813 ret.append(self.element_class(self, {b: one})) 814 814 return tuple(ret)
comment:14 Changed 3 years ago by
Another one:

src/sage/algebras/lie_algebras/free_lie_algebra.py
diff git a/src/sage/algebras/lie_algebras/free_lie_algebra.py b/src/sage/algebras/lie_algebras/free_lie_algebra.py index 1d0665d225..1ef05a7f4f 100644
a b class FreeLieAlgebra(Parent, UniqueRepresentation): 774 774 [x0, [[x0, x1], x1]], 775 775 [x0, [x0, [x1, x2]]], 776 776 [x0, [[x0, x2], x1]], 777 [[x0, x1], [x0, x2]],778 777 [x0, [[x0, x2], x2]], 778 [[x0, x1], [x0, x2]], 779 779 [[[x0, x1], x1], x1], 780 780 [x0, [x1, [x1, x2]]], 781 781 [[x0, [x1, x2]], x1], 782 [[[x0, x2], x1], x1],783 782 [x0, [[x1, x2], x2]], 783 [[[x0, x2], x1], x1], 784 784 [[x0, x2], [x1, x2]], 785 785 [[[x0, x2], x2], x1], 786 786 [[[x0, x2], x2], x2],
comment:15 Changed 3 years ago by
 Reviewers set to Martin Rubey
comment:16 Changed 3 years ago by
 Commit changed from 8cc9d0e9cda0a33477caf88a93be69d48bff600e to 7b25ddc646833c95d3d92c0fec399e3df6a7370f
Branch pushed to git repo; I updated commit sha1. New commits:
7b25ddc  Using faster generator for LyndonWords and fixing doctest in free_lie_algebra.py.

comment:17 Changed 3 years ago by
The change in comment:13 is actually slower in both Python and Cython. It has something to do with how Python constructs these things internally (but I agree that it is counterintuitive).
With comment:12:
sage: LW = LyndonWords(11,6) sage: %time for x in LW: pass CPU times: user 200 ms, sys: 21.3 ms, total: 222 ms Wall time: 201 ms sage: LW = LyndonWords(11,7) sage: %time for x in LW: pass CPU times: user 1.85 s, sys: 8.12 ms, total: 1.85 s Wall time: 1.84 s
vs old
sage: LW = LyndonWords(11,6) sage: %time for x in LW: pass CPU times: user 475 ms, sys: 42.2 ms, total: 518 ms Wall time: 465 ms sage: LW = LyndonWords(11,7) sage: %time for x in LW: pass CPU times: user 4.25 s, sys: 17.8 ms, total: 4.27 s Wall time: 4.24 s
comment:18 Changed 3 years ago by
One more doctest, a pyflakes thing, and a corner case:
sum(1 for lw in LyndonWords(1, 1000))
is stupid, but (with this patch) it kills my computer by filling up the memory really quickly. Without this patch, it hits the recursion limit, whereas it gives 0
(which is correct) up to and including length 988.

src/sage/combinat/combinat_cython.pyx
diff git a/src/sage/combinat/combinat_cython.pyx b/src/sage/combinat/combinat_cython.pyx index 1a53431a5a..227e5d8900 100644
a b def generate_lyndon_words(Py_ssize_t n, Py_ssize_t k): 168 168 169 169 sage: list(generate_lyndon_words(5, 0)) 170 170 [] 171 172 sage: list(generate_lyndon_words(1, 1000)) 173 [] 171 174 """ 172 175 cdef Py_ssize_t i, j 173 176 if k == 0: … … def generate_lyndon_words(Py_ssize_t n, Py_ssize_t k): 176 179 for i in range(n): 177 180 yield [i] 178 181 return 182 if n == 1: 183 return 179 184 180 185 cdef list a = [0] * (k+1) 181 186 i = k 
src/sage/combinat/lyndon_word.py
diff git a/src/sage/combinat/lyndon_word.py b/src/sage/combinat/lyndon_word.py index 8c0f49cfdd..306f7a3cf9 100644
a b from sage.arith.all import factorial, divisors, gcd, moebius 23 23 from sage.misc.all import prod 24 24 25 25 from . import necklace 26 from sage.combinat.integer_vector import IntegerVectors, integer_vectors_nk_fast_iter27 26 from sage.combinat.words.words import FiniteWords 28 27 from sage.combinat.combinat_cython import generate_lyndon_words 29 28 … … class LyndonWords_nk(UniqueRepresentation, Parent): 456 455 457 456 sage: sum(1 for lw in LyndonWords(11, 6)) # long time 458 457 295020 458 459 sage: sum(1 for lw in LyndonWords(1000, 1)) 460 1000 461 462 sage: sum(1 for lw in LyndonWords(1, 1000)) 463 0 459 464 """ 460 465 W = self._words._element_classes['list'] 461 466 for lw in generate_lyndon_words(self._n, self._k):
comment:19 Changed 3 years ago by
 Commit changed from 7b25ddc646833c95d3d92c0fec399e3df6a7370f to b9d8ee495e9425e34c56361b694899f6aad226da
Branch pushed to git repo; I updated commit sha1. New commits:
b9d8ee4  Corner case, pyflakes, and some extra doctests.

comment:20 Changed 3 years ago by
Fixed and added. I also added more doctests for the other corner case of LyndonWords(1,1)
.
comment:21 Changed 3 years ago by
Good catch! Funny that the # long
disappeared :)
I have a few hopefully final questions/requests:
 since it is very likely that more and more fundamental iterators are cythonized in a similar way (eg. #25864 would probably make a lot of sense), I think it would be good to create and stick to a good naming scheme. It might be a pain to do that later. In short, I think
lyndon_word_iterator
would be better.
 there is a typo in the docstring of the iterator, I propose the following instead:

src/sage/combinat/combinat_cython.pyx
diff git a/src/sage/combinat/combinat_cython.pyx b/src/sage/combinat/combinat_cython.pyx index a3e98d3f94..84e707ce7a 100644
a b def _stirling_number2(n, k): 144 144 145 145 def generate_lyndon_words(Py_ssize_t n, Py_ssize_t k): 146 146 """ 147 Generate all Lyndon words with of length ``k`` with ``n`` letters.147 Generate the Lyndon words of fixed length `k` over the alphabet `{0, 1, ..., n1}`. 148 148 149 149 The resulting Lyndon words will be words represented as lists 150 150 whose alphabet is ``range(n)``. 151 151 152 152 ALGORITHM: 153 153 154 The FKM algorithmfrom *Combinatorial Generation* by Ruskey.154 The iterative FKM Algorithm 7.2 from *Combinatorial Generation* by Ruskey. 155 155 156 156 EXAMPLES::

 in the documentation of the module
lyndon_word.py
,k
andn
are swapped in the docstringsclass LyndonWords_nk
and__init__
just thereafter. Should this be fixed in a separate ticket  I think it's easier to do it now.
Otherwise, I think it is ready to go.
comment:22 Changed 3 years ago by
You can make and push changes too. :) I am going to bed now, but can do it in the morning.
comment:23 Changed 3 years ago by
 Commit changed from b9d8ee495e9425e34c56361b694899f6aad226da to 2b3f9d4ae0ef57562e87109ce08db3355dee1795
Branch pushed to git repo; I updated commit sha1. New commits:
2b3f9d4  fix doc and rename iterator

comment:24 Changed 3 years ago by
 Commit changed from 2b3f9d4ae0ef57562e87109ce08db3355dee1795 to 0b4c68b2ba3c7e270ac8d4137c137695132fc588
Branch pushed to git repo; I updated commit sha1. New commits:
0b4c68b  A few more doc tweaks and making Ruskey a proper reference.

comment:25 Changed 3 years ago by
Thank you. I made a few other small changes: I make Ruskey into an actual reference and the 1line description of lyndon_word_iterator
just use n
as code because the alphabet description is given in the main body (and I prefer to use code format as much as possible in the 1line descriptions for methods).
If my changes are good, then positive review.
comment:26 Changed 3 years ago by
 Status changed from needs_review to positive_review
comment:27 Changed 3 years ago by
Thank you.
comment:28 Changed 3 years ago by
 Branch changed from public/combinat/fast_lyndon_iter26111 to 0b4c68b2ba3c7e270ac8d4137c137695132fc588
 Resolution set to fixed
 Status changed from positive_review to closed
New commits:
Implementing "fast" iterator for Lyndon words of a fixed length and alphabet size.
Using new iterator in the free Lie algebra.