UniFI logo

SKKU AORC 2017

abstract
A suite of tools to interact with the OEIS

In [6]:
__AUTHOR__ = ('Massimo Nocentini',
              'massimo.nocentini@unifi.it',
              'https://github.com/massimo-nocentini/')

__ADVISOR__ = ('prof. Donatella Merlini',
               'donatella.merlini@unifi.it')

__INSTITUTE__ = ('Dipartimento di Informatica, Statistica, Applicazioni',
                 'University of Florence, Italy')

talk_URL()
http://massimo-nocentini.github.io/PhD/skku-aorc-2017/oeistools.html#/

toc

  • intro & motivations
  • programs suite
    • crawler
    • pretty printer
    • grapher
  • console, API and Jupyter notebook interfaces
  • further works

intro

  • beware of shadows: I'm a computer scientist (a programmer, precisely)
  • pretty technical talk, no math today
  • source code at https://github.com/massimo-nocentini/oeis-tools
    • be free to download and explore
    • issues, enhancements and bug spot are welcome
    • contribute with pull requests according to the GitHub flow
  • programs tested on Unix machines (sorry Win users!)

motivations

  • necessity to search the OEIS offline
  • necessity to work in the console and with notebooks
  • desire to have an unified working env
    • use a general purpose prog lang, namely Python
    • integrate OEIS with a symbolic algebra system
    • automate searches

crawler

  • it fetches sequences recursively
    • union of cross refs sections defines the fringe to fetch
  • state of the art implementation
    • requires Python 3.6, namely the latest release
    • no threads, no race conditions, no data sync, ...
    • lies on async/await Python primitives only
    • only 300 lines of pure Python code
  • cache a portion of the OEIS to speed up repeated lookups
  • restart from the cache you have fetched already
    • it is addictive!! have your local collection of sequences
  • scalability by degree of parallelism, avoiding bottlenecks
    • too fast, dude! do not be banned by the OEIS crew ;)

help & usage

In [3]:
!python3.6 ../../src/crawling.py --help
usage: crawling.py [-h] [--clear-cache] [--restart] [--workers WORKERS]
                   [--log-level {DEBUG,INFO,WARNING,ERROR,CRITICAL}]
                   [--cache-dir CACHE_DIR] [--progress-mark PROGRESS_MARK]
                   [S [S ...]]

OEIS Crawler.

positional arguments:
  S                     Sequence to fetch, given in the form Axxxxxx

optional arguments:
  -h, --help            show this help message and exit
  --clear-cache         Clear cache of sequences, according to --cache-dir
  --restart             Build fringe from cached sequences (defaults to False)
  --workers WORKERS     Degree of parallelism (defaults to 10)
  --log-level {DEBUG,INFO,WARNING,ERROR,CRITICAL}
                        Logger verbosity (defaults to ERROR)
  --cache-dir CACHE_DIR
                        Cache directory (defaults to ./fetched/)
  --progress-mark PROGRESS_MARK
                        Symbol for fetched event (defaults to ●)

fetching

command to download two sequences, the Fibonacci and Catalan numbers respectively:

In [9]:
!python3.6 ../../src/crawling.py A000045 A000108 \
    --cache-dir=../../src/fetched/
●●●●●●●●●●●^C

fetched 12 new sequences:
{'A000108', 'A002420', 'A003046', 'A000081', 'A167893', 'A033184', 'A032443', 'A038003', 'A001791', 'A120303', 'A000045', 'A014138'}

restarting

if the cache contains sequences, we restart from the set of their cross refs, recursively:

In [10]:
!python3.6 ../../src/crawling.py \
    --restart \
    --cache-dir=../../src/fetched/
●●●●●●●●●●●●●●●●●^C

fetched 18 new sequences:
{'A003519', 'A091867', 'A036355', 'A068231', 'A047072', 'A038575', 'A051575', 'A105317', 'A071679', 'A099731', 'A000744', 'A059365', 'A104597', 'A137697', 'A120274', 'A000169', 'A001699', 'A100257'}

cache summary & raw json

the following shows cache status and its content:

In [4]:
!python3.6 ../../src/crawling.py --cache-dir=../../src/fetched/
474 sequences in cache ../../src/fetched/
1837 sequences in fringe for restarting
In [17]:
!ls ../../src/fetched/
A000045.json A001699.json A014138.json A038575.json A071679.json A105317.json
A000081.json A001791.json A032443.json A047072.json A091867.json A120274.json
A000108.json A002420.json A033184.json A051575.json A099731.json A120303.json
A000169.json A003046.json A036355.json A059365.json A100257.json A137697.json
A000744.json A003519.json A038003.json A068231.json A104597.json A167893.json

Finally, to see raw json content of a fetched sequence

In [ ]:
!cat <SEQ_JSON_FILE> | python3.6 -m json.tool | less

pprinter

  • a proxy for searching in the OEIS
  • it shows exactly the same content you get from http://oeis.org
  • tabular representation of data sections
    • one and two dims matrix notation
  • filtering capabilities on most result's sections
  • takes advantage of caching, builts by the crawler
  • both console and notebook interfaces

help & usage

In [5]:
!python3.6 ../../src/pprinting.py --help
usage: pprinting.py [-h]
                    (--id ID | --seq SEQ | --query QUERY | --most-recents M)
                    [--force-fetch] [--cache-dir CACHE_DIR] [--tables-only]
                    [--start-index S] [--max-results R] [--data-only]
                    [--upper-limit U] [--comment-filter C]
                    [--formula-filter F] [--xrefs-filter X] [--link-filter L]
                    [--cite-filter R] [--console-width W]

OEIS Pretty Printer.

optional arguments:
  -h, --help            show this help message and exit
  --id ID               Sequence id, given in the form Axxxxxx
  --seq SEQ             Literal sequence, ordered '[...]' or presence '{...}'
  --query QUERY         Open query for plain search, in the form '...'
  --most-recents M      Print the most recent sequences ranking by M in ACCESS
                        or MODIFY, looking into --cache-dir, at most --max-
                        results (defaults to None)
  --force-fetch         Bypass cache fetching again, according to --cache-dir
                        (defaults to False)
  --cache-dir CACHE_DIR
                        Cache directory (defaults to ./fetched/)
  --tables-only         Print matrix sequences only (defaults to False)
  --start-index S       Start from result at rank position S (defaults to 0)
  --max-results R       Pretty print the former R <= 10 results (defaults to
                        10)
  --data-only           Show only data repr and preamble (defaults to False)
  --upper-limit U       Upper limit for data repr: U is a dict '{"list":i,
                        "table":(r, c)}' where i, r and c are ints (defaults
                        to i=15, r=10 and c=10), respectively)
  --comment-filter C    Apply filter C to comments, where C is Python `lambda`
                        predicate 'lambda i,c: ...' referring to i-th comment
                        c
  --formula-filter F    Apply filter F to formulae, where F is Python `lambda`
                        predicate 'lambda i,f: ...' referring to i-th formula
                        f
  --xrefs-filter X      Apply filter X to cross refs, where X is Python
                        `lambda` predicate 'lambda i,x: ...' referring to i-th
                        xref x
  --link-filter L       Apply filter L to links, where L is Python `lambda`
                        predicate 'lambda i,l: ...' referring to i-th link l
  --cite-filter R       Apply filter R to citation, where R is Python `lambda`
                        predicate 'lambda i,r: ...' referring to i-th citation
                        r
  --console-width W     Console columns (defaults to 72)

pprint using cache

the following command pretty prints Fibonacci numbers content with some filters

In [6]:
!python3.6 ../../src/pprinting.py \
    --id A000045 \
    --cache-dir=../../src/fetched/ \
    --comment-filter 'lambda i,c: "Barry" in c' \
    --formula-filter 'lambda i,f: i < 5'
 A000045 - Fibonacci numbers: F(n) = F(n-1) + F(n-2) with F(0) = 0 and
  F(1) = 1.

by _N. J. A. Sloane_, 1964

_Keywords_: `nonn,core,nice,easy,hear,changed`

_Data_:
[0  1  1  2  3  5  8  13  21  34  55  89  144  233  377]

_Comments_:
    ● Fib(n+2) = Sum_{k=0..n} binomial(floor((n+k)/2),k), row sums of
      A046854. - _Paul Barry_, Mar 11 2003

_Formulae_:
    ● G.f.: x / (1 - x - x^2).
    ● G.f.: Sum_{n>=0} x^n * Product_{k=1..n} (k + x)/(1 + k*x). - _Paul
      D. Hanna_, Oct 26 2013
    ● F(n) = ((1+sqrt(5))^n-(1-sqrt(5))^n)/(2^n*sqrt(5)).
    ● Alternatively, F(n) =
      ((1/2+sqrt(5)/2)^n-(1/2-sqrt(5)/2)^n)/sqrt(5).
    ● F(n) = F(n-1) + F(n-2) = -(-1)^n F(-n).

_Cross references_:
    ● Cf. A039834 (signed Fibonacci numbers), A001690 (complement),
      A000213, A000288, A000322, A000383, A060455, A030186, A020695,
      A020701, A071679, A099731, A100492, A094216, A094638, A000108,
      A101399, A101400, A001611, A000071, A157725, A001911, A157726,
      A006327, A157727, A157728, A157729, A167616, A059929, A144152,
      A152063, A114690, A003893, A000032, A060441, A000930, A003269,
      A000957, A057078, A007317, A091867, A104597, A249548, A262342,
      A001060, A022095.
    ● First row of arrays A103323, A234357. Second row of arrays
      A099390, A048887, and A092921 (k-generalized Fibonacci numbers).
    ● a(n) = A094718(4, n). a(n) = A101220(0, j, n).
    ● a(n) = A090888(0, n+1) = A118654(0, n+1) = A118654(1, n-1) =
      A109754(0, n) = A109754(1, n-1), for n > 0.
    ● Fibonacci-Pascal triangles: A027926, A036355, A037027, A074829,
      A105809, A109906, A111006, A114197, A162741, A228074.
    ● Boustrophedon transforms: A000738, A000744.
    ● Powers: A103323, A105317, A254719.
    ● Numbers of prime factors: A022307 and A038575.
    ● Cf. A163733.

pprint most recent

the following command pretty prints:

  • the first 3 results from fetched sequences
  • ranking them according to the most recent access time
  • reporting data only, limiting upto 10 elements for plain seqs
In [7]:
!python3.6 ../../src/pprinting.py \
    --cache-dir=../../src/fetched/ \
    --most-recent ACCESS \
    --data-only \
    --max-results 3 \
    --upper-limit '{"list":10}'
 A000007 - The characteristic function of 0: a(n) = 0^n.

by _N. J. A. Sloane_

_Keywords_: `core,nonn,mult,cons,easy`

_Data_:
[1  0  0  0  0  0  0  0  0  0]

________________________________________________________________________

 A000012 - The simplest sequence of positive numbers: the all 1's
  sequence.

by _N. J. A. Sloane_, May 16 1994

_Keywords_: `nonn,core,easy,mult,cofr,cons,tabl`

_Data_:
⎡1  0  0  0  0  0  0  0  0  0⎤
⎢                            ⎥
⎢1  1  0  0  0  0  0  0  0  0⎥
⎢                            ⎥
⎢1  1  1  0  0  0  0  0  0  0⎥
⎢                            ⎥
⎢1  1  1  1  0  0  0  0  0  0⎥
⎢                            ⎥
⎢1  1  1  1  1  0  0  0  0  0⎥
⎢                            ⎥
⎢1  1  1  1  1  1  0  0  0  0⎥
⎢                            ⎥
⎢1  1  1  1  1  1  1  0  0  0⎥
⎢                            ⎥
⎢1  1  1  1  1  1  1  1  0  0⎥
⎢                            ⎥
⎢1  1  1  1  1  1  1  1  1  0⎥
⎢                            ⎥
⎣1  1  1  1  1  1  1  1  1  1⎦

________________________________________________________________________

 A000032 - Lucas numbers (beginning at 2): L(n) = L(n-1) + L(n-2). (Cf.
  A000204.)

by _N. J. A. Sloane_, May 24 1994

_Keywords_: `nonn,nice,easy,core,changed`

_Data_:
[2  1  3  4  7  11  18  29  47  76]

tabled datas

the following command pretty prints only two dims sequences

In [8]:
!python3.6 ../../src/pprinting.py \
    --query 'pascal triangle' \
    --cache-dir=../../src/fetched/ \
    --tables-only \
    --data-only \
    --max-results 2
 A007318 - Pascal's triangle read by rows: C(n,k) = binomial(n,k) =
  n!/(k!*(n-k)!), 0 <= k <= n.

by _N. J. A. Sloane_ and _Mira Bernstein_, Apr 28 1994

_Keywords_: `nonn,tabl,nice,easy,core,look,hear,changed`

_Data_:
⎡1  0  0   0    0    0   0   0   0  0⎤
⎢                                    ⎥
⎢1  1  0   0    0    0   0   0   0  0⎥
⎢                                    ⎥
⎢1  2  1   0    0    0   0   0   0  0⎥
⎢                                    ⎥
⎢1  3  3   1    0    0   0   0   0  0⎥
⎢                                    ⎥
⎢1  4  6   4    1    0   0   0   0  0⎥
⎢                                    ⎥
⎢1  5  10  10   5    1   0   0   0  0⎥
⎢                                    ⎥
⎢1  6  15  20  15    6   1   0   0  0⎥
⎢                                    ⎥
⎢1  7  21  35  35   21   7   1   0  0⎥
⎢                                    ⎥
⎢1  8  28  56  70   56   28  8   1  0⎥
⎢                                    ⎥
⎣1  9  36  84  126  126  84  36  9  1⎦

________________________________________________________________________

 A047999 - Sierpiński's [Sierpinski's] triangle (or gasket): triangle,
  read by rows, formed by reading Pascal's triangle mod 2.

by _N. J. A. Sloane_

_Keywords_: `nonn,tabl,easy,nice`

_Data_:
⎡1  0  0  0  0  0  0  0  0  0⎤
⎢                            ⎥
⎢1  1  0  0  0  0  0  0  0  0⎥
⎢                            ⎥
⎢1  0  1  0  0  0  0  0  0  0⎥
⎢                            ⎥
⎢1  1  1  1  0  0  0  0  0  0⎥
⎢                            ⎥
⎢1  0  0  0  1  0  0  0  0  0⎥
⎢                            ⎥
⎢1  1  0  0  1  1  0  0  0  0⎥
⎢                            ⎥
⎢1  0  1  0  1  0  1  0  0  0⎥
⎢                            ⎥
⎢1  1  1  1  1  1  1  1  0  0⎥
⎢                            ⎥
⎢1  0  0  0  0  0  0  0  1  0⎥
⎢                            ⎥
⎣1  1  0  0  0  0  0  0  1  1⎦

Unix trick

running your pretty printer command using the pipe

In [ ]:
!<pprinter command> | less

allows us to use navigation and searching facilities for free ;)

Moreover, any redirection and filtering are allowed:

In [ ]:
!<pprinter command> > my-seq.txt

notebook interface

  • uncover under the hood API
  • takes advantage of inline markup language
  • since this talk is a notebook too, we show it in place
In [12]:
import pprinting

effectively request A000045 to the OEIS server

In [15]:
result = pprinting.search(id='A000045',
                          cache_info={'cache_dir':'../../src/fetched'})

from now on, it is possible to filter and pprint without any other delay

In [16]:
result(comment=lambda i,c: i < 4,
       formula=lambda i,f: i < 4)
Out[16]:

Results for query: https://oeis.org/search?fmt=json&start=0&q=id%3AA000045


A000045: Fibonacci numbers: F(n) = F(n-1) + F(n-2) with F(0) = 0 and F(1) = 1.

by N. J. A. Sloane, 1964

Keywords: nonn,core,nice,easy,hear,changed

Data:

$$ \begin{array}{c|ccccccccccccccc } n & 0 & 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 & 9 & 10 & 11 & 12 & 13 & 14 \\ \hline A000045(n) & 0 & 1 & 1 & 2 & 3 & 5 & 8 & 13 & 21 & 34 & 55 & 89 & 144 & 233 & 377 \end{array} $$

Comments:

  • Also sometimes called Lamé's sequence.
  • F(n+2) = number of binary sequences of length n that have no consecutive 0's.
  • F(n+2) = number of subsets of {1,2,...,n} that contain no consecutive integers.
  • F(n+1) = number of tilings of a 2 X n rectangle by 2 X 1 dominoes.

Formulae:

  • G.f.: x / (1 - x - x^2).
  • G.f.: Sum{n>=0} x^n * Product{k=1..n} (k + x)/(1 + k*x). - Paul D. Hanna, Oct 26 2013
  • F(n) = ((1+sqrt(5))^n-(1-sqrt(5))^n)/(2^n*sqrt(5)).
  • Alternatively, F(n) = ((1/2+sqrt(5)/2)^n-(1/2-sqrt(5)/2)^n)/sqrt(5).

Cross references:

grapher

  • let $a, b$ be OEIS seqs, so $a \sim b$ if $b\in fringe(a)$
    • $fringe(a)$ is the set of seqs in $a$'s cross ref section
  • works on your cached sequences
  • represents rel $\sim$ as both digraphs and undirected graphs
  • builds on top of networkx module
    • different layouts
    • API allows filtering of both vertices and edges
    • WIP: computing measures and spectral characteristics

help & usage

In [8]:
!python3.6 ../../src/graphing.py --help
usage: graphing.py [-h] [--directed] [--cache-dir CACHE_DIR]
                   [--graphs-dir GRAPHS_DIR] [--dpi DPI] [--layout LAYOUT]
                   F

OEIS grapher.

positional arguments:
  F                     Save image in file F.

optional arguments:
  -h, --help            show this help message and exit
  --directed            Draw directed edges
  --cache-dir CACHE_DIR
                        Cache directory (defaults to ./fetched/)
  --graphs-dir GRAPHS_DIR
                        Graphs directory (defaults to ./graphs/)
  --dpi DPI             Resolution in DPI (defaults to 600)
  --layout LAYOUT       Graph layout, choose from: {RANDOM, CIRCULAR, SHELL,
                        FRUCHTERMAN-REINGOLD, SPRING, SPECTRAL} (defaults to
                        SHELL)

repr $\sim$ from the current cache

In [14]:
!python3.6 ../../src/graphing.py \
    --cache-dir=../../src/fetched/ \
    --graphs-dir=./ \
    --layout SPRING \
    graph.png 

import the figure graph.png in place

In [15]:
from IPython.display import Image
Image("./graph.png")
Out[15]:

WIP and further work

  • better source code doc, typeset with ReadTheDocs (preview)
  • for the crawler
    • add recursion depth limit
    • better network exceptions handling
    • restarting policies if you get banned
  • for the pprinter:
    • parse and reify formulae as SymPy objs (very hard)
  • for the crawler:
    • study graph mining techniques
    • repr rel $\sim$ according to sections other than cross refs
    • add repr capabilities to spot properties (colors, labels, ...)
  • publish the package at https://pypi.python.org/pypi

고맙습니다

In [16]:
import this
The Zen of Python, by Tim Peters

Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.
Flat is better than nested.
Sparse is better than dense.
Readability counts.
Special cases aren't special enough to break the rules.
Although practicality beats purity.
Errors should never pass silently.
Unless explicitly silenced.
In the face of ambiguity, refuse the temptation to guess.
There should be one-- and preferably only one --obvious way to do it.
Although that way may not be obvious at first unless you're Dutch.
Now is better than never.
Although never is often better than *right* now.
If the implementation is hard to explain, it's a bad idea.
If the implementation is easy to explain, it may be a good idea.
Namespaces are one honking great idea -- let's do more of those!