Tutorials

Tutorial 1: a simple project

1.1 Development environment - principles

Warning

Micc was designed to support HPC developers and, consequently, with Linux systems in mind. We provide support for Linux (Ubuntu 19.10, CentOS 7.7) and macOS. Due to a lack of human resources, Windows is currently not supported.

For Python development, we highly recommend setting up your development environment as described in My Python Development Environment by Jacob Kaplan-Moss. We will assume that this is the case for all tutorials here. In particular:

  • We are using pyenv to manage different Python versions on our system.
  • We use pipx to install applications like Micc and CMake system-wide, each in its own virtual environment.
  • Poetry is used to set up virtual environments for the projects we are working on, to manage their dependencies, and to publish them to PyPI.
  • Micc is used to set up the project structure, which is the basis of everything described in the tutorials below.
  • For Micc projects with binary extensions, the necessary compilers must be installed on the system.
  • As an IDE for Python/Fortran/C++ development we recommend:
    • Eclipse IDE for Scientific Computing with the PyDev plugin. This is an old-time favorite of mine, although the learning curve is a bit steep and the documentation could be better. Today, PyDev is beginning to lag behind for Python, but Eclipse is still very good for Fortran and C++.
    • PyCharm Community Edition. I only tried this one recently and was soon convinced for Python development (I have not gone back to Eclipse since). I do not yet have enough experience with it for Fortran and C++ to make recommendations.

1.2 Setting up your Development environment - step by step

  1. Install pyenv: see Managing Multiple Python Versions With pyenv for common install instructions on macOS and Linux.

  2. Install your favourite Python versions. E.g.:

    > pyenv install 3.8.0
    
  3. Install poetry. The recommended way for this is:

    > curl -sSL https://raw.githubusercontent.com/python-poetry/poetry/master/get-poetry.py | python
    

    This approach gives you a single system-wide Poetry installation, which automatically picks up the current Python version as set by pyenv. Note that as of Poetry 1.0.0, Poetry also detects conda virtual environments.

    Alternatively, you can install poetry using pip:

    > pyenv local 3.8.0
    > pip install poetry
    

    This approach does not pick up the current Python version, but always uses the Python it was installed with, i.e. 3.8.0. It therefore requires you to install Poetry in every Python version you want to use it with. When done, unset the local pyenv version:

    > pyenv local --unset
    
  4. Configure your poetry installation:

    > poetry config virtualenvs.in-project true

    This ensures that running poetry install in a project directory creates the project's virtual environment in the project's own root directory, rather than somewhere in the Poetry configuration directories, where it is less accessible. If you have several Poetry installations, they all use the same configuration.
    
  5. Install pipx:

    > python -m pip install --user pipx
    > python -m pipx ensurepath
    

    Note

    This uses the Python version returned by pyenv version. Micc works fine with Python 3.7 and 3.8.

  6. Install micc with pipx:

    > pipx install et-micc
      installed package et-micc 0.10.8, Python 3.8.0
      These apps are now globally available
        - micc
    done!
    

    Note

    This will use the Python version with which pipx was installed.

  7. If you want to develop binary extensions in C++ with micc, make sure CMake and make are installed and on your system PATH. You can download CMake directly from cmake.org. Alternatively, CMake is also available as a Python package, which can be installed with pipx:

    > pipx install cmake
    installed package cmake 3.15.3, Python 3.8.0
      These apps are now globally available
        - cmake
        - cpack
        - ctest
    done!
    
  8. To upgrade to a newer version of a tool that you installed with pipx, use the upgrade command, e.g.:

    > pipx upgrade et-micc
    et-micc is already at latest version 0.10.8 (location: /Users/etijskens/.local/pipx/venvs/et-micc)
    

You should be good to go now.

To set up a project foo for Python 3.8.0, proceed like this:

> micc -p path/to/foo create
> cd path/to/foo
> pyenv local 3.8.0    # make python 3.8.0 the default python for this project directory
> poetry install
...                    # all dependencies are installed
> source .venv/bin/activate
(.venv) > python --version
Python 3.8.0

The last command verifies that project foo’s virtual environment is indeed based on Python 3.8.0.

If we later decide that we need Python 3.7.9 rather than 3.8.0, we must:

  • deactivate the virtual environment,
  • delete it,
  • delete poetry.lock,
  • repeat the above procedure, this time for python 3.7.9.

Here is how it goes:

(.venv) > deactivate
> rm -rf .venv
> rm poetry.lock
> pyenv local 3.7.9
> which python
/Users/etijskens/.pyenv/shims/python
> python --version
Python 3.7.9
> poetry install
...                    # all dependencies are installed
> source .venv/bin/activate
(.venv) > python --version
Python 3.7.9
(.venv) > which python
/path/to/foo/.venv/bin/python

1.1 Getting started with micc

The first thing we need to start a new project is a project name. Ideally, this project name is

  • descriptive
  • unique
  • short

Although one might think of even more requirements, satisfying these three is already hard enough. E.g. my_nifty_module may well be unique, but it is neither descriptive nor short. On the other hand, dot_product is descriptive and reasonably short, but probably not unique. Even my_dot_product is probably not unique and, in addition, confusing to any user who might want to adopt your my_dot_product. A unique name, or at least one that has not been taken before, becomes really important when you want to publish your code for others to use. The standard place to publish Python code is the Python Package Index (PyPI), where you find hundreds of thousands of projects ready to be used. Even if you have only a few colleagues who may want to use your code, you make their life easier when you publish your my_nifty_module on PyPI, as they will only need to type:

> pip install my_nifty_module

(At the time of writing, the name my_nifty_module is not taken, but please choose a better name.) Micc will help you publish your work on PyPI with as little effort as possible.
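When checking whether a name is still free, keep in mind that PyPI compares project names after normalizing them as described in PEP 503: case is ignored and runs of -, _ and . are treated as equivalent, so my_nifty_module, My-Nifty-Module and my.nifty.module all denote the same project. A minimal sketch of that rule, plus a lookup against PyPI's JSON API; the helper names normalize and is_taken_on_pypi are hypothetical, and the lookup needs network access:

```python
import re
import urllib.error
import urllib.request


def normalize(name: str) -> str:
    """Normalize a project name following PEP 503."""
    return re.sub(r"[-_.]+", "-", name).lower()


def is_taken_on_pypi(name: str) -> bool:
    """Return True if PyPI knows a project with this (normalized) name.

    Requires network access; HTTP errors other than 404 are re-raised.
    """
    url = f"https://pypi.org/pypi/{normalize(name)}/json"
    try:
        with urllib.request.urlopen(url, timeout=10):
            return True
    except urllib.error.HTTPError as err:
        if err.code == 404:  # unknown project: the name is still free
            return False
        raise


# Three spellings of the same PyPI project name.
for candidate in ["my_nifty_module", "My-Nifty-Module", "my.nifty.module"]:
    print(candidate, "->", normalize(candidate))
```

All three candidates normalize to my-nifty-module, which is why choosing a name that differs only in case or separators from an existing project does not make it unique.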

So, let us call the project ET-dot. ET are my initials, which helps to make the name unique; the name remains descriptive and is certainly short. First, cd into a directory that you want to use as a workspace for storing your Python projects (I am using ~/software/dev/workspace). Then ask micc to create a project, like this:

> cd ~/software/dev/workspace
> micc -p ET-dot create

The -p option (short for --project-path) tells micc where the project must be created. Here, we request a project directory ET-dot in the current working directory, ~/software/dev/workspace. This creates a project directory containing, among quite a bit of other stuff, a Python module et_dot.py.

Let’s take a look at the output of the micc create command:

> micc -p ET-dot create

[INFO]           [ Creating project (ET-dot):
[INFO]               Python module (et_dot): structure = (ET-dot/et_dot.py)
[INFO]               [ Creating git repository
[WARNING]                    > git push -u origin master
[WARNING]                    (stderr)
                             remote: Repository not found.
                             fatal: repository 'https://github.com/etijskens/ET-dot/' not found
[INFO]               ] done.
[INFO]           ] done.
>

The first line:

[INFO]           [ Creating project (ET-dot):

tells us that micc indeed created a Python project in project directory ET-dot. The second line:

[INFO]               Python module (et_dot): structure = (ET-dot/et_dot.py)

explains that inside our project directory micc created a Python module et_dot.py. Note that the name of the module is perhaps not exactly what you expected: it is named et_dot.py, rather than ET-dot.py. The reason why micc renamed the module is that our project name ET-dot does not comply with the PEP8 module naming rules. To make it compliant, micc replaced all capitals with lowercase, and all spaces ' ' and dashes '-' with underscores '_'. Had we chosen a PEP8-compliant name for the project directory, the project directory and the module name would have been the same.
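The renaming rule just described fits in a one-line Python function. This is only an illustration of the rule as stated above, not micc's actual implementation, and the name module_name_for is hypothetical:

```python
def module_name_for(project_name: str) -> str:
    """Derive a PEP8-style module name from a project name.

    Illustrative only: lowercase everything, then replace spaces and
    dashes with underscores, as described in the text.
    """
    return project_name.lower().replace(" ", "_").replace("-", "_")


print(module_name_for("ET-dot"))      # et_dot
print(module_name_for("My Project"))  # my_project
```

A name that is already compliant, such as et_dot, passes through unchanged, which is why a PEP8-compliant project name yields identical directory and module names.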

Finally, the lines

[INFO]               [ Creating git repository
[WARNING]                    > git push -u origin master
[WARNING]                    (stderr)
                             remote: Repository not found.
                             fatal: repository 'https://github.com/etijskens/ET-dot/' not found
[INFO]               ] done.

tell us that micc created a git repository. (The warning merely reports that the initial git push failed because the remote repository https://github.com/etijskens/ET-dot does not exist yet.) Git is a version control system that solves many practical problems related to the process of software development, regardless of whether you are the only developer or an entire team is working on it from different places in the world. You find more information about how micc uses git in Tutorial 4.

1.1.1 Modules and packages

A Python module is the simplest Python project we can create. It is meant for rather small projects that fit in a single file. More complex projects have a package structure, that is, a directory with the same name as the module, i.e. et_dot, containing an __init__.py file. The __init__.py file marks the directory as a Python package and contains the statements that are executed when the package is imported. The module structure is the default structure. When creating a project, you can opt for a package structure by appending the flag -p or --package to the micc create command:

> micc -p ET-dot create --package

[INFO]           [ Creating project (ET-dot):
[INFO]               Python package (et_dot): structure = (ET-dot/et_dot/__init__.py)
...
[INFO]           ] done.

Alternatively, you can easily convert a module structure project to a package structure project at any time:

> micc -p ET-dot convert-to-package
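To see that the statements in __init__.py are executed when the package is imported, here is a self-contained sketch that builds a throw-away package in a temporary directory and imports it. The package name demo_pkg is hypothetical; in our project it would be et_dot:

```python
import importlib
import sys
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    pkg_dir = Path(tmp) / "demo_pkg"  # hypothetical package name
    pkg_dir.mkdir()
    # __init__.py marks demo_pkg as a package; its body runs on import.
    (pkg_dir / "__init__.py").write_text(
        'print("executing demo_pkg/__init__.py")\n'
        '__version__ = "0.0.0"\n'
        "def hello(who='world'):\n"
        "    return 'Hello ' + who\n"
    )
    sys.path.insert(0, tmp)  # make the temporary directory importable
    try:
        importlib.invalidate_caches()
        demo_pkg = importlib.import_module("demo_pkg")  # prints the message
        print(demo_pkg.__version__, demo_pkg.hello())
    finally:
        sys.path.remove(tmp)
```

The message is printed exactly once, at import time; importing the package again in the same session reuses the cached module and does not re-run __init__.py.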

1.1.2 The project path in micc

The project path (-p path) is a parameter accepted by all micc commands. Its default value is the current directory. So, once the project is created, it is convenient to cd into it, so that you can leave out the -p option:

> micc -p ET-dot create
...
> micc -p ET-dot info
Project ET-dot located at /Users/etijskens/software/dev/workspace/ET-dot
  package: et_dot
  version: 0.0.0
  structure: et_dot.py (Python module)

> cd ET-dot
> micc info
Project ET-dot located at /Users/etijskens/software/dev/workspace/ET-dot
  package: et_dot
  version: 0.0.0
  structure: et_dot.py (Python module)

The micc info command shows information about a project.

This is more practical, as you do not have to type -p ET-dot for every micc command. This approach even works with the micc create command: if you create an empty directory and cd into it, you can just run micc create, like this:

> mkdir ET-dot
> cd ET-dot
> micc create
[INFO]           [ Creating project (ET-dot):
[INFO]               Python package (et_dot): structure = (ET-dot/et_dot/__init__.py)
...
[INFO]           ] done.

Warning

Micc refuses to create a new project in a non-empty directory.

Note

In the rest of the tutorial we assume that the current working directory is the project directory.

1.1.3 Managing the Python version

Your operating system typically comes with a Python version that is used for OS tasks. It is obviously good practice to isolate your system Python from your own developments: wrecking the system Python can indeed give you headaches. In addition, the system Python is often still 2.7.x, which reaches its end of life in 2020. Using a more recent Python version, or even several different Python versions, may be very useful when you are working on many different projects. This is offered conveniently by pyenv (at least on macOS and Linux, but unfortunately not on Windows); see 1.2 Setting up your Development environment - step by step for installation instructions. On my work laptop I usually keep the latest release of each recent minor Python version, along with the Python version that came with the OS. At the time of writing that was:

> pyenv versions
  system
  3.6.9
  3.7.5
* 3.8.0 (set by /Users/etijskens/.pyenv/version)

The asterisk marks the default Python. You can set the default Python version with pyenv global <version>. It is good practice not to make the system Python the default. That way, you cannot accidentally wreck your system Python.

Since 3.8.0 is the default Python, launching Python without any special measures gives you Python 3.8.0. If you want to develop the ET-dot project with another version, e.g. 3.7.5, you must set a local Python version in the project directory:

> cd ET-dot
> pyenv local 3.7.5
> pyenv version
3.7.5 (set by /Users/etijskens/software/dev/ET-dot/.python-version)
> pyenv versions
  system
  3.6.9
* 3.7.5 (set by /Users/etijskens/software/dev/ET-dot/.python-version)
  3.8.0

Now, if you launch Python in the project directory (or in any of its subdirectories that do not have their own .python-version), it will be Python 3.7.5. In all other directories where pyenv local was not run, it will still be the default Python 3.8.0.
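The lookup that pyenv performs can be modelled as: search the current directory and then each parent for a .python-version file, and fall back to the global version if none is found. The sketch below mimics only that file search; it is a simplified model, not pyenv's code (pyenv also honours the PYENV_VERSION environment variable, which takes precedence), and the function name local_python_version is hypothetical:

```python
import tempfile
from pathlib import Path
from typing import Optional


def local_python_version(start: Path) -> Optional[str]:
    """Return the version in the nearest .python-version file, or None.

    Looks in `start`, then in each of its parent directories (simplified
    model of pyenv's local-version lookup).
    """
    start = start.resolve()
    for directory in [start, *start.parents]:
        version_file = directory / ".python-version"
        if version_file.is_file():
            # The first non-empty line holds the version name, e.g. "3.7.5".
            for line in version_file.read_text().splitlines():
                if line.strip():
                    return line.strip()
    return None  # no local version: pyenv would use the global one


# Demo: a project with a local version and a subdirectory without one.
with tempfile.TemporaryDirectory() as tmp:
    project = Path(tmp) / "ET-dot"
    (project / "tests").mkdir(parents=True)
    (project / ".python-version").write_text("3.7.5\n")
    found = local_python_version(project / "tests")
    print(found)  # 3.7.5
```

The tests subdirectory has no .python-version of its own, so the search continues upwards and finds the project's file, which is exactly the behaviour described above.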

1.1.3 Virtual environments

For a more detailed introduction to virtual environments see Python Virtual Environments: A Primer.

When you are developing or using several Python projects, it can become difficult for a single Python environment to satisfy all the dependency requirements of these projects simultaneously. Dependency conflicts can easily arise. Python promotes and facilitates code reuse, and as a consequence Python tools typically depend on tens to hundreds of other modules. If toolA and toolB both need moduleC, but each requires a different version of it, there is a conflict, because it is impossible to install two versions of the same module in one Python environment. The solution the Python community has come up with is the virtual environment, which isolates the dependencies of a single project in a single environment.

1.1.3.1 Creating virtual environments

Since version 3.3, Python comes with a venv module for the creation of virtual environments:

> python -m venv my_virtual_environment

This creates a directory my_virtual_environment containing a complete and isolated Python environment. The virtual environment can be activated as follows:

> source my_virtual_environment/bin/activate
(my_virtual_environment) >

Activating a virtual environment modifies the command prompt to constantly remind you that you are working in a virtual environment. The virtual environment is based on the current Python, preferably set by pyenv. New packages you install are installed in the virtual environment only. The virtual environment can be deactivated by running

(my_virtual_environment) > deactivate
>
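
The same can be done from Python itself through the standard-library venv module, and a running interpreter can tell whether it belongs to a virtual environment by comparing sys.prefix with sys.base_prefix. A minimal sketch; the directory name is arbitrary and with_pip=False only keeps the example fast:

```python
import sys
import tempfile
import venv
from pathlib import Path


def in_virtual_environment() -> bool:
    """True when the running interpreter belongs to a virtual environment."""
    return sys.prefix != sys.base_prefix


print("running inside a virtual environment:", in_virtual_environment())

with tempfile.TemporaryDirectory() as tmp:
    env_dir = Path(tmp) / "my_virtual_environment"
    # Roughly equivalent to: python -m venv --without-pip my_virtual_environment
    venv.create(env_dir, with_pip=False)
    # Every virtual environment has a pyvenv.cfg file at its root.
    has_cfg = (env_dir / "pyvenv.cfg").is_file()
    print("pyvenv.cfg created:", has_cfg)
```

Running the first check before and after source my_virtual_environment/bin/activate is a quick way to confirm which environment your python command actually uses.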

1.1.3.2 Creating virtual environments with Poetry

Poetry uses the above mechanism to manage virtual environments on a per-project basis. With its install command, it installs all the dependencies of the project, as specified in the pyproject.toml file. Since our project does not have a virtual environment yet, Poetry creates one, named .venv, and installs all dependencies in it. We first choose the Python version to use for the project:

> pyenv local 3.7.5
> poetry install
Creating virtualenv et-dot in /Users/etijskens/software/dev/ET-dot/.venv
Updating dependencies
Resolving dependencies... (0.8s)

Writing lock file


Package operations: 10 installs, 0 updates, 0 removals

  - Installing pyparsing (2.4.5)
  - Installing six (1.13.0)
  - Installing atomicwrites (1.3.0)
  - Installing attrs (19.3.0)
  - Installing more-itertools (7.2.0)
  - Installing packaging (19.2)
  - Installing pluggy (0.13.1)
  - Installing py (1.8.0)
  - Installing wcwidth (0.1.7)
  - Installing pytest (4.6.6)
  - Installing ET-dot (0.0.0)

The installed packages are pytest, which we require for testing our code, and its dependencies. The last package is ET-dot itself, which is installed in so-called development mode. This means that any changes in the source code are immediately visible in the virtual environment. Adding and removing dependencies is easily achieved by running poetry add some_module and poetry remove some_other_module. Consult the Poetry documentation for details.

If the virtual environment already exists, or if some virtual environment is activated (not necessarily that of the project itself - be warned), that virtual environment is reused and all installations pertain to that virtual environment.

To use the just created virtual environment of our project, we must activate it:

> source .venv/bin/activate
(.venv) > which python
/Users/etijskens/software/dev/ET-dot/.venv/bin/python
(.venv) > python --version
Python 3.7.5

The location of the virtual environment’s Python and its version are as expected.

Note

Whenever you see a command prompt like (.venv) > the local virtual environment of the project has been activated. If you want to try yourself, you must activate it too.

To deactivate the virtual environment, just run deactivate:

(.venv) > deactivate
> which python
/Users/etijskens/.pyenv/shims/python

The (.venv) notice disappears, and the active python is no longer that in the virtual environment.

If something is wrong with a virtual environment, you can simply delete it:

> rm -rf .venv

and create a new one. Sometimes it is necessary to delete the poetry.lock as well:

> rm poetry.lock

1.1.4 Modules and scripts

Note that micc always creates fully functional examples, complete with test code and documentation generation, so that you can inspect the files and see how things are supposed to work. E.g. here is the ET-dot/et_dot.py module:

# -*- coding: utf-8 -*-
"""
Package et_dot
==============

A 'hello world' example.
"""
__version__ = "0.0.0"


def hello(who='world'):
    """'Hello world' method."""
    result = "Hello " + who
    return result

The module can be used right away. Open an interactive Python session and enter the following commands:

> cd path/to/ET-dot
> source .venv/bin/activate
(.venv) > python
Python 3.8.0 (default, Nov 25 2019, 20:09:24)
[Clang 11.0.0 (clang-1100.0.33.12)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import et_dot
>>> et_dot.hello()
'Hello world'
>>> et_dot.hello("student")
'Hello student'
>>>

Productivity tip

Using an interactive Python session to verify that a module does what you expect is a bit cumbersome. A quicker way is to modify the module so that it can also behave as a script. Add the following lines at the end of ET-dot/et_dot.py:

if __name__ == "__main__":
    print(hello())
    print(hello("student"))

and execute it on the command line:

(.venv) > python et_dot.py
Hello world
Hello student

The body of the if statement is only executed if the file is executed as a script. When the file is imported, it is ignored.
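If you want to convince yourself, the standard-library runpy module can execute the same file once as a script and once under an ordinary module name. In this sketch the file demo_script.py is a hypothetical stand-in written to a temporary directory:

```python
import runpy
import tempfile
from pathlib import Path

SOURCE = '''
def hello(who="world"):
    return "Hello " + who

ran_as_script = False
if __name__ == "__main__":
    ran_as_script = True
'''

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "demo_script.py"
    path.write_text(SOURCE)
    # run_name="__main__" mimics `python demo_script.py`.
    as_script = runpy.run_path(str(path), run_name="__main__")
    # Any other run_name mimics an import: the if-body is skipped.
    as_import = runpy.run_path(str(path), run_name="demo_script")

print(as_script["ran_as_script"], as_import["ran_as_script"])  # True False
```

run_path returns the module's global namespace, so the value of ran_as_script shows directly whether the if-body was executed.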

While working on a single-file project, it is sometimes handy to put your tests in the body of if __name__ == "__main__":, as below:

if __name__ == "__main__":
    assert hello() == "Hello world"
    assert hello("student") == "Hello student"
    print("-*# success #*-")

If all assertions hold, the last line prints a message telling you that all tests passed; otherwise an AssertionError is raised. When you now execute the script, you should see:

(.venv) > python et_dot.py
-*# success #*-

When you develop your code in an IDE like Eclipse+PyDev or PyCharm, you can even execute the file without leaving your editor and switching to a terminal. You can quickly code, test and debug in a single window.

While this is a very productive way of developing, it is a bit on the quick and dirty side. If the module code and the tests become more involved, the file will soon become cluttered with test code, and a more scalable way to organise your tests is needed. Micc has already taken care of this.

1.1.5 Testing your code

When micc creates a new project, or when you add components to an existing project, it immediately adds a test script for each component in the tests directory. The test script for the et_dot module is in file ET-dot/tests/test_et_dot.py. Let’s take a look at the relevant section:

# -*- coding: utf-8 -*-
"""Tests for et_dot package."""

import et_dot

def test_hello_noargs():
    """Test for et_dot.hello()."""
    s = et_dot.hello()
    assert s=="Hello world"

def test_hello_me():
    """Test for et_dot.hello('me')."""
    s = et_dot.hello('me')
    assert s=="Hello me"

Tests like this are very useful to ensure that during development the changes to your code do not break things. There are many Python tools for unit testing and test driven development. Here, we use Pytest:

> pytest
=============================== test session starts ===============================
platform darwin -- Python 3.7.4, pytest-4.6.5, py-1.8.0, pluggy-0.13.0
rootdir: /Users/etijskens/software/dev/workspace/foo
collected 2 items

tests/test_foo.py ..                                                        [100%]

============================ 2 passed in 0.05 seconds =============================

The output shows some information about the environment in which the tests are run, the root directory (i.e. the project directory), and the number of tests collected (2). Pytest looks for tests in all test_*.py or *_test.py files in the current directory and its subdirectories. It collects test-prefixed functions outside classes, and test-prefixed methods inside Test-prefixed classes, as tests to be executed.
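These default naming conventions can be summarised in a few lines of Python. This is a simplified model of pytest's defaults (the python_files, python_classes and python_functions settings), not pytest's own collector; the real collector also skips Test classes that define an __init__ method, and all three patterns can be reconfigured, e.g. in pytest.ini:

```python
from fnmatch import fnmatch

# pytest's default file patterns (setting: python_files).
TEST_FILE_PATTERNS = ("test_*.py", "*_test.py")


def is_test_file(filename: str) -> bool:
    return any(fnmatch(filename, pattern) for pattern in TEST_FILE_PATTERNS)


def is_test_function(name: str) -> bool:
    # Functions and methods whose name starts with "test" (python_functions).
    return name.startswith("test")


def is_test_class(name: str) -> bool:
    # Classes whose name starts with "Test" (python_classes).
    return name.startswith("Test")


for filename in ["test_et_dot.py", "et_dot_test.py", "et_dot.py"]:
    print(filename, is_test_file(filename))
```

With these rules, tests/test_et_dot.py is collected, while the module et_dot.py itself is not, even though it lives in the same project.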

If a test fails, you get a detailed report to help you find the cause of the error and fix it.

1.1.6 Debugging test code

When the report provided by pytest does not give a clue about the cause of the failing test, you must debug it: execute the failing test step by step to find out what goes wrong where. From the viewpoint of pytest, the files in the tests directory are modules: pytest imports them, collects the test functions, and executes them. Micc makes every test module executable using the technique described in 1.1.4 Modules and scripts. At the end of every test file you will find some extra code:

if __name__ == "__main__":
    the_test_you_want_to_debug = test_hello_noargs

    print("__main__ running", the_test_you_want_to_debug)
    the_test_you_want_to_debug()
    print('-*# finished #*-')

On the first line of the if __name__ == "__main__": body, the variable the_test_you_want_to_debug is set to one of the test functions in our test file test_et_dot.py, here test_hello_noargs. The variable the_test_you_want_to_debug is now just another name for the very same function object as test_hello_noargs and behaves exactly the same (see Functions are first class objects). The next statement prints a message telling you which test function __main__ is running, after which the test function is called through the the_test_you_want_to_debug variable. Finally, another message is printed to let you know that the script finished. Here is the output you get when running this test file as a script:

(.venv) > python tests/test_et_dot.py
__main__ running <function test_hello_noargs at 0x1037337a0>
-*# finished #*-

The test itself does not produce any output. Now you can use your favourite Python debugger to execute this script, step into the test_hello_noargs test function, and from there into et_dot.hello to examine whether everything goes as expected. Thus, to debug a failing test, you assign it to the the_test_you_want_to_debug variable and debug the script.
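A small variation, sketched here under the assumption that you prefer not to edit the file for every failing test, takes the test name from the command line and looks it up in the module's globals(), again relying on functions being first class objects. The helper select_test is hypothetical and not something micc generates, and hello stands in for et_dot.hello so that the sketch runs on its own:

```python
import sys


def hello(who="world"):
    # Stand-in for et_dot.hello so that this sketch runs on its own.
    return "Hello " + who


def test_hello_noargs():
    assert hello() == "Hello world"


def test_hello_me():
    assert hello("me") == "Hello me"


def select_test(argv, namespace, default="test_hello_noargs"):
    """Return the test function named by argv[1], or `default` if absent."""
    name = argv[1] if len(argv) > 1 else default
    return namespace[name]


if __name__ == "__main__":
    # e.g. `python tests/test_et_dot.py test_hello_me`
    the_test_you_want_to_debug = select_test(sys.argv, globals())
    print("__main__ running", the_test_you_want_to_debug)
    the_test_you_want_to_debug()
    print('-*# finished #*-')
```

Most debuggers let you pass script arguments in their launch configuration, so switching to another failing test then only means changing that argument.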

Note

As test code is also code, it can contain bugs. It happens quite often that the tested code is correct, but the test is flawed.

1.1.7 Generating documentation

Documentation is extracted from the source code using Sphinx. It is almost completely generated automatically from the doc-strings in your code. Doc-strings are the text between triple double quote pairs in the examples above, e.g. """This is a doc-string.""". Important doc-strings are:

  • module doc-strings: at the beginning of the module. Provides an overview of what the module is for.
  • class doc-strings: right after the class statement; explains what the class is for. (Usually, the documentation of the __init__ method is put here as well, as dunder methods (starting and ending with a double underscore) are not automatically considered by Sphinx.)
  • method doc-strings: right after a def statement.

According to PEP 287, the recommended format for Python doc-strings is reStructuredText. E.g. a typical method doc-string looks like this:

def hello_world(who='world'):
    """Short (one line) description of the hello_world method.

    A detailed and longer description of the hello_world method.
    blablabla...

    :param str who: an explanation of the who parameter. You should
        mention its default value.
    :returns: a description of what hello_world returns (if relevant).
    :raises: which exceptions are raised under what conditions.
    """

Here, you can find some more examples.

Thus, if you take good care writing doc-strings, helpful documentation follows automatically.
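For a class, the same conventions apply, and, as mentioned above, the parameters of __init__ are documented in the class doc-string. Here is a hypothetical example (the class Greeter is not part of et_dot), together with a check that the doc-strings are available at runtime, which is where Sphinx's autodoc extension reads them from when it imports your module:

```python
import inspect


class Greeter:
    """Produce greetings for a fixed audience.

    :param str who: whom to greet, defaults to ``'world'``.
    """

    def __init__(self, who="world"):
        self.who = who

    def greet(self, greeting="Hello"):
        """Return a greeting for the audience given at construction.

        :param str greeting: the greeting word, defaults to ``'Hello'``.
        :returns: the greeting as a string.
        """
        return greeting + " " + self.who


print(inspect.getdoc(Greeter).splitlines()[0])  # Produce greetings for a fixed audience.
print(Greeter("student").greet())               # Hello student
```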

Micc sets up all the necessary components for documentation generation in the subdirectory et-dot/docs/. There, you find a Makefile that provides a simple interface to Sphinx. Here is the workflow to build the documentation:

> cd path/to/et-dot
> source .venv/bin/activate
(.venv) > cd docs
(.venv) > make <documentation_format>

Let’s explain the steps:

  1. cd into the project directory:

    > cd path/to/et-dot
    >
    
  2. Activate the project’s virtual environment:

    > source .venv/bin/activate
    (.venv) >
    
  3. cd into the docs subdirectory:

    (.venv) > cd docs
    (.venv) >
    

    Here, you will find the Makefile that does the work:

    (.venv) > ls -l
    total 80
    -rw-r--r--  1 etijskens  staff  1871 Dec 10 11:24 Makefile
    ...
    

To see a list of possible documentation formats, just run make without arguments:

(.venv) > make
Sphinx v2.2.2
Please use `make target' where target is one of
  html        to make standalone HTML files
  dirhtml     to make HTML files named index.html in directories
  singlehtml  to make a single large HTML file
  pickle      to make pickle files
  json        to make JSON files
  htmlhelp    to make HTML files and an HTML help project
  qthelp      to make HTML files and a qthelp project
  devhelp     to make HTML files and a Devhelp project
  epub        to make an epub
  latex       to make LaTeX files, you can set PAPER=a4 or PAPER=letter
  latexpdf    to make LaTeX and PDF files (default pdflatex)
  latexpdfja  to make LaTeX files and run them through platex/dvipdfmx
  text        to make text files
  man         to make manual pages
  texinfo     to make Texinfo files
  info        to make Texinfo files and run them through makeinfo
  gettext     to make PO message catalogs
  changes     to make an overview of all changed/added/deprecated items
  xml         to make Docutils-native XML files
  pseudoxml   to make pseudoxml-XML files for display purposes
  linkcheck   to check all external links for integrity
  doctest     to run all doctests embedded in the documentation (if enabled)
  coverage    to run coverage check of the documentation (if enabled)
(.venv) >

  4. To build documentation in html format, enter:

    (.venv) > make html
    ...
    (.venv) >
    

    This will generate documentation in et-dot/docs/_build/html. Note that it is essential that this command executes in the project’s virtual environment. You can view the documentation in your favorite browser:

    (.venv) > open _build/html/index.html

    Here is a screenshot:

    [Screenshot: the generated HTML documentation]

    If you expand the API tab on the left, you get to see the et_dot module documentation, as generated from the doc-strings:

    [Screenshot: the et_dot API documentation]
  5. To build documentation in .pdf format, enter:

    (.venv) > make latexpdf
    

    This will generate documentation in et-dot/docs/_build/latex/et-dot.pdf. Note that it is essential that this command executes in the project’s virtual environment. You can view it in your favorite pdf viewer:

    (.venv) > open _build/latex/et-dot.pdf
    (.venv) >
    

Note

When you build documentation by running the docs/Makefile, it verifies that the correct virtual environment is activated and that the required Python modules are installed in that environment. If not, they are first installed using pip install. These components do not become dependencies of the project. If needed, you can add dependencies using the poetry add command.

The boilerplate code for documentation generation is in the docs directory, just as if it were generated by hand using sphinx-quickstart. (In fact, it was generated using sphinx-quickstart, but then turned into a Cookiecutter template.) Modifying those files is not recommended, and only rarely needed. Then there are a number of .rst files with capitalized names in the project directory:

  • README.rst is assumed to contain an overview of the project,
  • API.rst describes the classes and methods of the project in detail,
  • APPS.rst describes command line interfaces or apps added to your project,
  • AUTHORS.rst lists the contributors to the project, and
  • HISTORY.rst should describe the changes that were made to the code.

The .rst extension stands for reStructuredText, a simple and concise approach to text formatting.

If you add components to your project through micc, care is taken that the .rst files in the project directory and the docs directory are modified as necessary, so that Sphinx is able to find the doc-strings. Even for command line interfaces (CLIs, or console scripts) based on click, the documentation is generated neatly from the help strings of options and the doc-strings of the commands.

1.1.8 The license file

The project directory contains a LICENCE file, a text file describing the licence applicable to your project. You can choose between

  • MIT license (default),
  • BSD license,
  • ISC license,
  • Apache Software License 2.0,
  • GNU General Public License v3 and
  • Not open source.

MIT license is a very liberal license and the default option. If you’re unsure which license to choose, you can use resources such as GitHub’s Choose a License.

You can select the license file when you create the project:

> cd some_empty_dir
> micc create --license BSD

Of course, the project depends in no way on the license file, so it can be replaced manually at any time by the license you desire.

1.1.9 The pyproject.toml file

The file pyproject.toml (located in the project directory) is the modern way to describe the build system requirements of the project (PEP 518). Although most of this file’s content is generated automatically by micc and Poetry, some understanding of it is useful; consult https://poetry.eustace.io/docs/pyproject/.

The pyproject.toml file is rather human-readable:

> cat pyproject.toml
[tool.poetry]
name = "ET-dot"
version = "1.0.0"
description = "<Enter a one-sentence description of this project here.>"
authors = ["Engelbert Tijskens <engelbert.tijskens@uantwerpen.be>"]
license = "MIT"

readme = 'README.rst'

repository = "https://github.com/etijskens/ET-dot"
homepage = "https://github.com/etijskens/ET-dot"

keywords = ['packaging', 'poetry']

[tool.poetry.dependencies]
python = "^3.7"
et-micc-build = "^0.10.10"

[tool.poetry.dev-dependencies]
pytest = "^4.4.2"

[tool.poetry.scripts]

[build-system]
requires = ["poetry>=0.12"]
build-backend = "poetry.masonry.api"
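The version constraints in the dependency sections use Poetry's caret notation: ^3.7 allows any version >=3.7.0 and <4.0.0, and ^0.10.10 allows >=0.10.10 and <0.11.0, because the leftmost non-zero component of the version must not change. The sketch below mimics that rule for plain numeric versions with one to three components (no pre-release tags); caret_range() and satisfies() are hypothetical helpers for illustration, not part of Poetry or micc.

```python
def caret_range(constraint):
    """Return (lower, upper) bounds, as 3-tuples, for a caret constraint like '^0.10.10'.

    Sketch of Poetry's documented caret rule: allowed versions are >= lower and < upper.
    """
    parts = [int(p) for p in constraint.lstrip("^").split(".")]
    lower = tuple(parts + [0] * (3 - len(parts)))
    # The leftmost non-zero component (and any zeros before it) is fixed;
    # if all given components are zero, the last given component is fixed.
    idx = next((i for i, p in enumerate(parts) if p != 0), len(parts) - 1)
    upper = parts[:idx] + [parts[idx] + 1]
    upper = tuple(upper + [0] * (3 - len(upper)))
    return lower, upper


def satisfies(version, constraint):
    """True if a plain numeric version string such as '4.6.7' satisfies the constraint."""
    v = tuple(int(p) for p in version.split("."))
    v = v + (0,) * (3 - len(v))
    lower, upper = caret_range(constraint)
    return lower <= v < upper


for c in ["^3.7", "^0.10.10", "^4.4.2"]:
    print(c, caret_range(c))
# ^3.7 ((3, 7, 0), (4, 0, 0))
# ^0.10.10 ((0, 10, 10), (0, 11, 0))
# ^4.4.2 ((4, 4, 2), (5, 0, 0))
```

For example, satisfies("4.6.7", "^4.4.2") is True, which is consistent with poetry installing pytest 4.6.7 for the constraint ^4.4.2, whereas satisfies("0.11.0", "^0.10.10") is False.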

1.1.10 The log file micc.log

The project directory also contains a log file micc.log. All micc commands that modify the state of the project leave a trace in this file, so you can look up what happened to your project and when. Should you think that the log file has become too big, or just useless, you can delete it manually, or add the --clear-log flag before any micc subcommand to remove it. If the subcommand alters the state of the project, the log file will afterwards only contain the log messages from that last subcommand.

> ll micc.log
-rw-r--r--  1 etijskens  staff  34 Oct 10 20:37 micc.log

> micc --clear-log info
Project bar located at /Users/etijskens/software/dev/workspace/bar
  package: bar
  version: 0.0.0
  structure: bar.py (Python module)

> ll micc.log
ls: micc.log: No such file or directory

1.1.11 Adjusting micc to your needs

Micc is based on a series of additive Cookiecutter templates which generate the boilerplate code. If you like, you can tweak these templates in the site-packages/et_micc/templates directory of your micc installation. If you installed micc with pipx, that is typically something like:

~/.local/pipx/venvs/et-micc/lib/pythonX.Y/site-packages/et_micc,

where pythonX.Y is the Python version you installed micc with.
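If you are unsure where that directory is on your system, you can let Python find it. The sketch below uses a hypothetical helper templates_dir(); it assumes the templates live in a templates subdirectory of the installed et_micc package, as described above, and it must be run with the Python interpreter of micc's own virtual environment (with pipx typically ~/.local/pipx/venvs/et-micc/bin/python). With any other interpreter it will most likely return None.

```python
import importlib.util
import os


def templates_dir(package="et_micc"):
    """Return the path of <package>/templates, or None if the package is not importable.

    Hypothetical helper: it assumes the templates sit in a 'templates'
    subdirectory next to the package's __init__.py.
    """
    spec = importlib.util.find_spec(package)
    if spec is None or spec.origin is None:
        return None
    return os.path.join(os.path.dirname(spec.origin), "templates")


print(templates_dir())
```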

1.2 Your first project

Let’s start with a simple problem: a Python module that computes the dot product of two arrays. Admittedly, this is not a very rewarding goal, as there are already many Python packages, e.g. Numpy, that solve this problem in an elegant and efficient way. However, because the dot product is such a simple concept in linear algebra, it allows us to illustrate the usefulness of Python as a language for High Performance Computing, as well as the capabilities of Micc.

If you haven’t carried out the steps in 1.1 Getting started with micc, set up a new project (you are of course encouraged to change the project name so as to make it unique):

> micc -p ET-dot create --package
[INFO]           [ Creating project (ET-dot):
[INFO]               Python package (et_dot): structure = (ET-dot/et_dot/__init__.py)
[INFO]               [ Creating git repository
[WARNING]                    > git push -u origin master
[WARNING]                    (stderr)
                             remote: Repository not found.
                             fatal: repository 'https://github.com/etijskens/ET-dot/' not found
[INFO]               ] done.
[WARNING]            Run 'poetry install' in the project directory to create a virtual environment and install its dependencies.
[INFO]           ] done.
> cd ET-dot

Next, we create a virtual environment for the project and activate it:

> poetry install
Creating virtualenv et-dot in /Users/etijskens/software/dev/workspace/tmp/ET-dot/.venv
Updating dependencies
Resolving dependencies... (0.8s)

Writing lock file


Package operations: 10 installs, 0 updates, 0 removals

  - Installing pyparsing (2.4.5)
  - Installing six (1.13.0)
  - Installing atomicwrites (1.3.0)
  - Installing attrs (19.3.0)
  - Installing more-itertools (8.0.2)
  - Installing packaging (19.2)
  - Installing pluggy (0.13.1)
  - Installing py (1.8.0)
  - Installing wcwidth (0.1.7)
  - Installing pytest (4.6.7)
  - Installing ET-dot (0.0.0)
> source .venv/bin/activate
(.venv) >

Open module file et_dot.py in your favourite editor and change it as follows:

# -*- coding: utf-8 -*-
"""
Package et_dot
==============
Python module for computing the dot product of two arrays.
"""
__version__ = "0.0.0"

def dot(a,b):
    """Compute the dot product of *a* and *b*.

    :param a: a 1D array.
    :param b: a 1D array of the same length as *a*.
    :returns: the dot product of *a* and *b*.
    :raises: ArithmeticError if ``len(a)!=len(b)``.
    """
    n = len(a)
    if len(b)!=n:
        raise ArithmeticError("dot(a,b) requires len(a)==len(b).")
    d = 0
    for i in range(n):
        d += a[i]*b[i]
    return d

We defined a dot() function with an informative doc-string that describes the parameters, the return value and the kind of exceptions it may raise.

We could use the dot() function in a script as follows:

from et_dot import dot

a = [1,2,3]
b = [4.1,4.2,4.3]
a_dot_b = dot(a,b)

Note

This dot product implementation is naive for many reasons:

  • Python is very slow at executing loops, as compared to Fortran or C++.
  • The objects we are passing in are plain Python lists. A list is a very powerful data structure, with array-like properties, but it is not exactly an array. A list is in fact an array of pointers to Python objects, and therefore list elements can reference anything, not just a numeric value as we would expect from an array. With elements being pointers, looping over the array elements implies non-contiguous memory access, another source of inefficiency.
  • The dot product is a subject of Linear Algebra. Many excellent libraries have been designed for this purpose. Numpy should be your starting point because it is well integrated with many other Python packages. There is also Eigen, a C++ library for linear algebra that is neatly exposed to Python by pybind11.
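The difference between a list of references and a contiguous block of numbers can be observed directly with the standard library's array module. We use it here purely for illustration; the tutorial itself moves on to Numpy for real work.

```python
from array import array

values = [1.0, 2.0, 3.0]

# A list stores references to Python objects, so it can mix types freely:
mixed = [1.0, 'two', None]

# array('d') stores the numbers themselves as contiguous C doubles:
packed = array('d', values)
view = memoryview(packed)
print(view.contiguous, view.itemsize, view.format)   # True 8 d

# A list exposes no such buffer of raw numbers:
try:
    memoryview(values)
except TypeError as exc:
    print('list has no numeric buffer:', exc)

# array('d') refuses anything that is not a number:
try:
    array('d', mixed)
except TypeError as exc:
    print('rejected:', exc)
```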

In order to verify that our implementation of the dot product is correct, we write a test. For this we open the file tests/test_et_dot.py. Remove the original tests, and add a new one:

import et_dot

def test_dot_aa():
    a = [1,2,3]
    expected = 14
    result = et_dot.dot(a,a)
    assert result==expected

Save the file and run the test. Pytest will show a line for every test source file. On each such line a . will appear for every successful test, and an F for a failing test.

(.venv) > pytest
=============================== test session starts ===============================
platform darwin -- Python 3.7.4, pytest-4.6.5, py-1.8.0, pluggy-0.13.0
rootdir: /Users/etijskens/software/dev/workspace/ET-dot
collected 1 item

tests/test_et_dot.py .                                                      [100%]

============================ 1 passed in 0.08 seconds =============================
(.venv) >

Note

If the project’s virtual environment is not activated, the command pytest will generally not be found.

Great! Our test succeeded. Let’s increment the project’s version (-p is short for --patch, and requests incrementing the patch component of the version string):

(.venv) > micc version -p
[INFO]           (ET-dot)> micc version (0.0.0) -> (0.0.1)

Obviously, our test tests only one particular case. A clever way of testing is to focus on properties. From mathematics we know that the dot product is commutative. Let’s add a test for that.

import random

def test_dot_commutative():
    # create two arrays of length 10 with random float numbers:
    a = []
    b = []
    for _ in range(10):
        a.append(random.random())
        b.append(random.random())
    # do the test
    ab = et_dot.dot(a,b)
    ba = et_dot.dot(b,a)
    assert ab==ba

You can easily verify that this test works too. We increment the version string again:

(.venv) > micc version -p
[INFO]           (ET-dot)> micc version (0.0.1) -> (0.0.2)

There is, however, a risk in using arrays of random numbers. Maybe we were just lucky and got random numbers that satisfy the test by accident. Also, the test is no longer reproducible. The next time we run pytest we will get other random numbers, and maybe the test will fail. That would represent a serious problem: since we cannot reproduce the failing test, we have no way of finding out what went wrong. For random numbers we can fix the seed at the beginning of the test. Random number generators are deterministic, so fixing the seed makes the code reproducible. To increase coverage we put a loop around the test.

def test_dot_commutative_2():
    # Fix the seed for the random number generator of module random.
    random.seed(0)
    # choose array size
    n = 10
    # create two arrays of length n filled with zeros:
    a = n * [0]
    b = n * [0]
    # repetition loop:
    for r in range(1000):
        # fill a and b with random float numbers:
        for i in range(n):
            a[i] = random.random()
            b[i] = random.random()
        # do the test
        ab = et_dot.dot(a,b)
        ba = et_dot.dot(b,a)
        assert ab==ba
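random.seed() resets the single generator that is shared by all code running in the process. An alternative, which the tests in this tutorial do not use, is to give each test its own random.Random instance, so that the numbers it draws depend only on its own seed and not on what other code did with the global generator. The sketch below shows both; random_list() is a hypothetical helper.

```python
import random


def random_list(n, rng):
    """Return n pseudo-random floats in [0, 1) drawn from the generator rng."""
    return [rng.random() for _ in range(n)]


# Re-seeding the module-level generator reproduces exactly the same numbers:
random.seed(0)
first = [random.random() for _ in range(5)]
random.seed(0)
second = [random.random() for _ in range(5)]
print(first == second)          # True

# A private generator seeded inside the test ignores draws made elsewhere:
a = random_list(10, random.Random(0))
random.random()                 # unrelated draw from the global generator
b = random_list(10, random.Random(0))
print(a == b)                   # True
```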

Again the test works. Another property of the dot product is that the dot product with a zero vector is zero.

def test_dot_zero():
    # Fix the seed for the random number generator of module random.
    random.seed(0)
    # choose array size
    n = 10
    # create two arrays of length n filled with zeros:
    a = n * [0]
    zero = n * [0]
    # repetition loop (the underscore is a placeholder for a variable that we do not use):
    for _ in range(1000):
        # fill a with random float numbers:
        for i in range(n):
            a[i] = random.random()
        # do the test
        azero = et_dot.dot(a,zero)
        assert azero==0

This test works too. Furthermore, the dot product with a vector of ones is the sum of the elements of the other vector:

def test_dot_one():
    # Fix the seed for the random number generator of module random.
    random.seed(0)
    # choose array size
    n = 10
    # create an array of zeros and an array of ones, both of length n:
    a = n * [0]
    one = n * [1.0]
    # repetition loop (the underscore is a placeholder for a variable that we do not use):
    for _ in range(1000):
        # fill a with random float numbers:
        for i in range(n):
            a[i] = random.random()
        # do the test
        aone = et_dot.dot(a,one)
        expected = sum(a)
        assert aone==expected

Success again. We are getting quite confident in the correctness of our implementation. Here is another test:

def test_dot_one_2():
    a1 = 1.0e16
    a   = [a1 ,1.0,-a1]
    one = [1.0,1.0,1.0]
    expected = 1.0
    result = et_dot.dot(a,one)
    assert result==expected

Clearly, it is a special case of the test above: the expected result is the sum of the elements in a, that is 1.0. Yet, unexpectedly, it fails. Fortunately, pytest produces a readable report about the failure:

> pytest
================================= test session starts ==================================
platform darwin -- Python 3.7.4, pytest-4.6.5, py-1.8.0, pluggy-0.13.0
rootdir: /Users/etijskens/software/dev/workspace/ET-dot
collected 6 items

tests/test_et_dot.py .....F                                                      [100%]

======================================= FAILURES =======================================
____________________________________ test_dot_one_2 ____________________________________

    def test_dot_one_2():
        a1 = 1.0e16
        a   = [a1 , 1.0, -a1]
        one = [1.0, 1.0, 1.0]
        expected = 1.0
        result = et_dot.dot(a,one)
>       assert result==expected
E       assert 0.0 == 1.0

tests/test_et_dot.py:91: AssertionError
========================== 1 failed, 5 passed in 0.17 seconds ==========================
>

Mathematically, our expectations about the outcome of the test are certainly correct. Yet, pytest tells us it found that the result is 0.0 rather than 1.0. What could possibly be wrong? Well, our mathematical expectations are based on our - false - assumption that the elements of a are real numbers, most of which in decimal representation are characterised by an infinite number of digits. Computer memory being finite, however, Python (and for that matter all other programming languages) uses a finite number of bits to approximate real numbers. These numbers are called floating point numbers and their arithmetic is called floating point arithmetic. Floating point arithmetic has quite different properties from real number arithmetic. A floating point number in Python uses 64 bits, which yields approximately 15 significant decimal digits. Observe the consequences of this in the Python statements below:

>>> 1.0 + 1e16
1e+16
>>> 1e16 + 1.0 == 1e16
True
>>> 1.0 + 1e16 == 1e16
True
>>> 1e16 + 1.0 - 1e16
0.0
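These observations can be made quantitative with a few standard-library tools. The sketch below assumes Python 3.9 or later for math.ulp(), whereas the tutorial itself uses Python 3.7. Near 1e16 consecutive floating point numbers are 2.0 apart, so 1e16 + 1.0 lies exactly halfway between two representable numbers and is rounded (to even) back to 1e16.

```python
import math
import sys

# Decimal digits a float can always represent, and machine epsilon:
print(sys.float_info.dig)       # 15
print(sys.float_info.epsilon)   # 2.220446049250313e-16

# Spacing between consecutive floats near 1e16 (math.ulp needs Python >= 3.9):
print(math.ulp(1e16))           # 2.0

values = [1e16, 1.0, -1e16]
print(sum(values))              # 0.0: the 1.0 is lost in the first addition
print(1e16 - 1e16 + 1.0)        # 1.0: another order of the terms, another result
print(math.fsum(values))        # 1.0: fsum keeps track of the lost low-order bits
```

The last two lines show that floating point addition is not associative, and that math.fsum() can recover the exact result of this particular sum.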

There are several lessons to be learned from this:

  • The test does not fail because our code is wrong, but because our mind is used to reasoning with real number arithmetic rather than floating point arithmetic. As the latter is subject to round-off errors, tests sometimes fail unexpectedly. Note that for comparing floating point numbers the standard library provides the math.isclose() function.
  • Another silent assumption by which we can be misled is in the random numbers. In fact, random.random() generates pseudo-random numbers in the interval [0,1[, which is quite a bit smaller than ]-inf,+inf[. No matter how often we run the test, the failing special case above will never be encountered, which may lead to unwarranted confidence in the code.

So, how do we cope with the failing test? Here is a way using math.isclose():

import math

def test_dot_one_2():
    a1 = 1.0e16
    a   = [a1 , 1.0, -a1]
    one = [1.0, 1.0, 1.0]
    expected = 1.0
    result = et_dot.dot(a,one)
    # assert result==expected
    assert math.isclose(result, expected, abs_tol=10.0)

This is a reasonable solution if we accept that, when dealing with numbers as big as 1e16, an absolute difference of 10 is negligible.
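To see why the relaxed test passes while the exact comparison fails, here is a short sketch of how math.isclose() decides. Its defaults are rel_tol=1e-09 and abs_tol=0.0, and it returns True when abs(a-b) <= max(rel_tol * max(abs(a), abs(b)), abs_tol). A purely relative tolerance is therefore very strict for results close to zero, which is why the test above needs abs_tol.

```python
import math

result, expected = 0.0, 1.0

# With the defaults, only a relative difference of 1e-09 is tolerated:
print(math.isclose(result, expected))                # False
# An absolute tolerance of 10.0 accepts |0.0 - 1.0| = 1.0:
print(math.isclose(result, expected, abs_tol=10.0))  # True

# For large numbers the relative tolerance is generous in absolute terms:
print(math.isclose(1e16, 1e16 + 2.0))                # True: relative difference 2e-16
```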

Another aspect that should be tested is the behavior of the code in exceptional circumstances. Does it indeed raise ArithmeticError if the arguments are not of the same length? Here is a test:

import pytest

def test_dot_unequal_length():
    a = [1,2]
    b = [1,2,3]
    with pytest.raises(ArithmeticError):
        et_dot.dot(a,b)

Here, pytest.raises() is a context manager that verifies that ArithmeticError is raised when its body is executed.

Note

For a detailed explanation of context managers, see https://jeffknupp.com/blog/2016/03/07/python-with-context-managers//
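To demystify pytest.raises() a little, here is a hypothetical, much simplified context manager built with contextlib.contextmanager that behaves similarly in the two basic cases: it swallows the expected exception, and it fails when no exception is raised. It is not how pytest implements raises(), which offers more, e.g. access to the caught exception. The dot() function is copied from et_dot to keep the sketch self-contained.

```python
from contextlib import contextmanager


@contextmanager
def expect_raises(expected_exception):
    """Hypothetical, simplified stand-in for pytest.raises (not pytest's implementation)."""
    try:
        yield                      # the body of the with-statement runs here
    except expected_exception:
        return                     # expected exception occurred: suppress it
    raise AssertionError(f"{expected_exception.__name__} was not raised")


def dot(a, b):
    """Copy of the tutorial's et_dot.dot, repeated to keep this sketch self-contained."""
    n = len(a)
    if len(b) != n:
        raise ArithmeticError("dot(a,b) requires len(a)==len(b).")
    d = 0
    for i in range(n):
        d += a[i] * b[i]
    return d


# Passes silently: dot raises ArithmeticError for arrays of unequal length.
with expect_raises(ArithmeticError):
    dot([1, 2], [1, 2, 3])

# Fails: arrays of equal length do not raise, so AssertionError escapes the with-block.
try:
    with expect_raises(ArithmeticError):
        dot([1, 2], [3, 4])
except AssertionError as exc:
    message = str(exc)
    print("test failed:", message)  # test failed: ArithmeticError was not raised
```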

Note that you can easily make et_dot.dot() raise other exceptions, e.g. TypeError by passing in arrays of non-numeric types:

>>> et_dot.dot([1,2],[1,'two'])
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/etijskens/software/dev/workspace/ET-dot/et_dot.py", line 23, in dot
    d += a[i]*b[i]
TypeError: unsupported operand type(s) for +=: 'int' and 'str'
>>>

Note that it is not the product a[i]*b[i] for i=1 that is wreaking havoc, but the addition of its result to d.
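The two steps of the loop body for i=1 can be replayed in isolation. Multiplying an int by a str is valid Python (it repeats the string), so the product succeeds; only the subsequent += fails, with the same message as in the traceback above.

```python
# Replay the loop body of dot([1,2],[1,'two']) step by step:
d = 1 * 1                  # i=0: d == 1
product = 2 * 'two'        # i=1: int * str is string repetition, no error
print(product)             # twotwo
try:
    d += product           # int += str is what raises the TypeError
except TypeError as exc:
    message = str(exc)
    print(message)         # unsupported operand type(s) for +=: 'int' and 'str'
```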

At this point you might notice that even for a very simple and well defined function as the dot product the amount of test code easily exceeds the amount of tested code by a factor of 5 or more. This is not at all uncommon. As the tested code here is an isolated piece of code, you will probably leave it alone as soon as it passes the tests and you are confident in the solution. If at some point, the dot() would fail you should write a test that reproduces the error and improve the solution so that it passes the test.

When constructing software for more complex problems, there will very soon be many interacting components and running the tests after modifying one of the components will help you assure that all components still play well together, and spot problems as soon as possible.

At this point we want to produce a git tag of the project:

(.venv) > micc tag
[INFO] Creating git tag v0.0.7 for project ET-dot
[INFO] Done.

The tag is a label for the current code base of our project.

1.3 Improving efficiency

There are times when a correct solution - i.e. a code that solves the problem correctly - is sufficient. Most of the time, however, the solution also needs to use resources (runtime, memory, …) efficiently. Especially in High Performance Computing, where compute tasks may run for several days and use hundreds of compute nodes, and resources are to be shared with many researchers, using the resources efficiently is of utmost importance.

However important efficiency may be, it is nevertheless a good strategy for developing a new piece of code to start out with a simple, even naive implementation in Python, neglecting all efficiency considerations and focussing on correctness. Python has a reputation of being an extremely productive programming language. Once you have proven the correctness of this first version, it can serve as a reference solution to verify the correctness of later efficiency improvements. In addition, the analysis of this version can highlight the sources of inefficiency and help you focus your attention on the parts that really need it.

1.3.1 Timing your code

The simplest way to probe the efficiency of your code is to time it: write a simple script and record how long it takes to execute. Let us first look at the structure of a Python script.

Here’s a script (using the above structure) that computes the dot product of two long arrays of random numbers.

"""file ET_dot/prof/run1.py"""
import random
from et_dot import dot

def random_array(n=1000):
    """Initialize an array with n random numbers in [0,1[."""
    # Below we use a list comprehension (a Python idiom for creating a list from an iterable object).
    a = [random.random() for i in range(n)]
    return a

if __name__=='__main__':
    a = random_array()
    b = random_array()
    print(dot(a,b))
    print('-*# done #*-')

We store this file, which we rather simply called run1.py, in a directory prof in the project directory, where we intend to keep all our profiling work. You can execute the script from the command line (with the project directory as the current working directory):

(.venv) > python ./prof/run1.py
251.08238559724717
-*# done #*-

Note

As our script does not fix the random number seed, every run has a different outcome.

We are now ready to time our script. Micc provides a practical context manager class et_micc.stopwatch.Stopwatch to time pieces of code.

"""file ET_dot/prof/run1.py"""
from et_micc.stopwatch import Stopwatch

...

if __name__=='__main__':
    with Stopwatch() as timer:
        a = random_array()
        b = random_array()
        print("init:",timer.timelapse(),'s')
        dot(a,b)
        print("dot :",timer.timelapse(),'s')
    print('-*# done #*-')

When the script is executed, the two print statements print the duration of the initialisation of a and b and of the computation of the dot product of a and b. Finally, upon exit, the Stopwatch prints the total time.

(.venv) > python ./prof/run1.py
init: 0.000281 s
dot : 0.000174 s
0.000465 s
-*# done #*-
>

Note that the initialization phase took longer than the computation. Random number generation is rather expensive. The last number is the total time spent inside the stopwatch body, and is printed automatically. If you like you can customise this message by setting the message parameter in the constructor of the stopwatch:

with Stopwatch(message="total") as timer:
   ...

which would have output:

(.venv) > python ./prof/run1.py
init: 0.000281 s
dot : 0.000174 s
total 0.000465 s
-*# done #*-
>
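The Stopwatch class itself is part of et_micc. To show what such a timing context manager involves, here is a hypothetical minimal stand-in built on time.perf_counter(). The class name SimpleStopwatch and its ndigits parameter are our own assumptions, and the real et_micc.stopwatch.Stopwatch may well be implemented differently.

```python
import time


class SimpleStopwatch:
    """Hypothetical minimal stand-in for et_micc's Stopwatch (not its actual code)."""

    def __init__(self, message=None, ndigits=6):
        self.message = message      # optional label printed before the total time
        self.ndigits = ndigits      # number of decimals kept in reported times

    def __enter__(self):
        self.start = self.last = time.perf_counter()
        return self

    def timelapse(self):
        """Return the time in seconds since the previous call (or since entering)."""
        now = time.perf_counter()
        lapse = now - self.last
        self.last = now
        return round(lapse, self.ndigits)

    def __exit__(self, exc_type, exc_value, traceback):
        self.total = round(time.perf_counter() - self.start, self.ndigits)
        if self.message:
            print(self.message, self.total, 's')
        else:
            print(self.total, 's')
        return False                # never suppress exceptions from the body


with SimpleStopwatch(message="total") as timer:
    squares = [x * x for x in range(10000)]
    lap = timer.timelapse()
    print("init:", lap, 's')
```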

1.3.2 Comparing to Numpy

As said earlier, our implementation of the dot product is rather naive. If you want to become a good programmer, you should understand that you are probably not the first researcher in need of a dot product implementation. For most linear algebra problems, Numpy provides very efficient implementations. Below, the run1.py script is extended with timing results for the Numpy equivalent of our code.

"""file ET_dot/prof/run1.py"""
import numpy as np

...

if __name__=='__main__':
    with Stopwatch() as timer:
        a = random_array()
        b = random_array()
        print("et init:",timer.timelapse(),'s')
        dot(a,b)
        print("et dot :",timer.timelapse(),'s')

    with Stopwatch() as timer:
        a = np.random.rand(1000)
        b = np.random.rand(1000)
        print("np init:",timer.timelapse(),'s')
        np.dot(a,b)
        print("np dot :",timer.timelapse(),'s')

    print('-*# done #*-')

When you run this code, you will get a ModuleNotFoundError for Numpy, as it is not yet a dependency of our ET-dot project and Numpy is not yet installed in our virtual environment. If you do not want Numpy to become a dependency of ET-dot, just install it in the virtual environment:

(.venv) > pip install numpy
Collecting numpy
Installing collected packages: numpy
Successfully installed numpy-1.17.4
(.venv) >

If, on the other hand, you want Numpy to become a dependency of ET-dot, and have it always automatically installed together with ET-dot, you must run:

(.venv) > poetry add numpy
Using version ^1.17.4 for numpy

Updating dependencies
Resolving dependencies... (0.2s)

Writing lock file


Package operations: 1 install, 0 updates, 0 removals

  - Installing numpy (1.17.4)

(.venv) >

Here are the results of the modified script:

(.venv) > python ./prof/run1.py
et init: 0.000252 s
et dot : 0.000219 s
0.000489 s
np init: 7.8e-05 s
np dot : 3.2e-05 s
0.00012 s
-*# done #*-
>

Obviously, Numpy does significantly better than our naive dot product implementation, both for the initialization (about 3.2 times faster) and for the dot product (about 6.8 times faster). The reasons for this improvement are:

  • Numpy arrays are contiguous data structures of floating point numbers, unlike Python’s list. Contiguous memory access is far more efficient.
  • The loop over Numpy arrays is implemented in a low-level programming language. This allows it to make full use of the processor's hardware features, such as vectorization and fused multiply-add (FMA).
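Single measurements lasting a fraction of a millisecond, like the ones above, are easily disturbed by other activity on the machine. The standard library's timeit module repeats a measurement many times. The sketch below keeps the fastest of several repetitions, which the timeit documentation suggests as the most meaningful figure; it copies the naive dot() (without the length check) to stay self-contained, uses the array length 1000 of run1.py, and the values of number and repeat are arbitrary choices.

```python
import random
import timeit


def dot(a, b):
    """Copy of the tutorial's naive et_dot.dot, without the length check."""
    d = 0
    for i in range(len(a)):
        d += a[i] * b[i]
    return d


a = [random.random() for _ in range(1000)]
b = [random.random() for _ in range(1000)]

# Each measurement calls dot(a, b) 200 times; the measurement is repeated 5 times.
timings = timeit.repeat(lambda: dot(a, b), number=200, repeat=5)
best_per_call = min(timings) / 200
print(f"dot: {best_per_call:.2e} s per call")
```

Timing lambda: np.dot(x, y) on two Numpy arrays x and y in the same way gives the corresponding Numpy figure.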

Tutorial 2: Binary extensions

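Binary extensions are shared libraries, compiled from e.g. Fortran or C++ source code, that can be imported in Python just like any Python module.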

Suppose for a moment that Numpy did not have a dot product implementation and that the implementation provided in Tutorial-1 is way too slow to be practical for your research project. Consequently, you are forced to accelerate your dot product code in some way or another. There are several approaches for this. Here are a number of interesting links covering them:

Most of these approaches do not require special support from Micc to get you going, and we encourage you to try out the High Performance Python series 1-3 for the ET-dot project. Two of the approaches discussed involve rewriting your code in Modern Fortran or C++ and generating a shared library that can be imported in Python just like any Python module. Such shared libraries are called binary extension modules. This approach is by far the most scalable and flexible of all current acceleration strategies, as these languages are designed to squeeze the maximum of performance out of a CPU. However, figuring out how to make this work is a bit of a challenge, especially in the case of C++.

Micc automates the task of generating the binary extensions from source code in Fortran and C++. It is as simple as this:

Add some binary extension modules to your project:

> micc add foo --f2py   # add a binary extension written in Fortran
> micc add bar --cpp    # add a binary extension written in C++

Put your own code in the source code files and execute:

(.venv) > micc-build

Mind that the virtual environment must be activated to execute micc-build (see 1.1.3 Virtual environments). Now you can import modules foo and bar in your project and use their subroutines and functions.

2.0 Binary extensions in Micc projects

Micc provides boilerplate code for binary extensions as well as some practical wrappers around top-notch tools for building binary extensions from Fortran and C++. Fortran code is compiled into a Python module using f2py (which comes with Numpy). For C++ we use Pybind11 and CMake.

2.0.1 Choosing between Fortran and C++ for binary extension modules

Here are a number of arguments that you may wish to take into account for choosing the programming language for your binary extension modules:

  • Fortran is a simpler language than C++
  • It is easier to write efficient code in Fortran than in C++
  • C++ is a much more expressive language
  • C++ comes with a huge standard library, providing lots of data structures and algorithms that are hard to match in Fortran. If the standard library is not enough, there are also the highly recommended Boost libraries and many other domain specific libraries. There are also domain specific libraries in Fortran, but their number is smaller by at least an order of magnitude.
  • With Pybind11 you can expose almost anything from the C++ side to Python, not just functions.
  • Modern Fortran is (imho) not as well documented as C++. Useful places to look for language features and idioms are:

In short, C++ provides much more possibilities, but it is not for the novice.

2.0.2 Converting a module structure to a package structure

Module structure projects are meant for small projects consisting of a single module file, here et_dot.py in the project directory. For more involved projects a package structure is more appropriate. Package structure projects can contain additional python modules, binary extension modules written in Fortran or C++, as well as command line interfaces (CLIs). In a package structure, the project directory has a subdirectory with the package name, here et_dot, that contains an __init__.py file, which has the same content as the et_dot.py file in the module structure.

Since we started out with a module project ET-dot, its module structure (ET-dot/et_dot.py) must be converted to a package structure (ET-dot/et_dot/__init__.py) before we can add a f2py (Fortran) binary extension module to it.

> micc convert-to-package
Converting simple Python project ET-dot to general Python project.
[WARNING]        Pre-existing files in /Users/etijskens/software/dev/workspace that would be overwritten:
[WARNING]          /Users/etijskens/software/dev/workspace/ET-dot/docs/index.rst
   Aborting because 'overwrite==False'.
     Rerun the command with the '--backup' flag to first backup these files (*.bak).
     Rerun the command with the '--overwrite' flag to overwrite these files without backup.
   Aborting.
[CRITICAL]       Exiting (-3) ...
[WARNING]        It is normally ok to overwrite 'index.rst' as you are not supposed
                 to edit the '.rst' files in '/Users/etijskens/software/dev/workspace/ET-dot/docs.'
                 If in doubt: rerun the command with the '--backup' flag,
                   otherwise: rerun the command with the '--overwrite' flag,

Without extra options the command fails because it wants to replace the file ET-dot/docs/index.rst, which we do not allow, because the user may have modified that file (although the files in the ET-dot/docs directory are in fact not meant to be edited by the user). If the user has not edited ET-dot/docs/index.rst, he can safely rerun the command with the --overwrite flag. Otherwise he must use the --backup flag to keep a backup of the original ET-dot/docs/index.rst. That way he can inspect the original file and transfer his changes to the new file.

> micc convert-to-package --overwrite
Converting simple Python project ET-dot to general Python project.
[WARNING]        '--overwrite' specified: pre-existing files in /Users/etijskens/software/dev/workspace will be overwritten WITHOUT backup:
[WARNING]        overwriting /Users/etijskens/software/dev/workspace/ET-dot/docs/index.rst


2.1 Building binary extensions from Fortran

Binary extension modules based on Fortran are called f2py modules because these modules are built with the f2py tool, which is part of Numpy. Since our project ET-dot now has a package structure, we are ready to add a f2py module. Let us call this module dotf, where the f stands for Fortran:

> micc add dotf --f2py
[INFO]           [ Adding f2py module dotf to project ET-dot.
[INFO]               - Fortran source in       ET-dot/et_dot/f2py_dotf/dotf.f90.
[INFO]               - Python test code in     ET-dot/tests/test_f2py_dotf.py.
[INFO]               - module documentation in ET-dot/et_dot/f2py_dotf/dotf.rst (in restructuredText format).
[WARNING]            Dependencies added. Run \'poetry update\' to update the project\'s virtual environment.
[INFO]           ] done.

The output tells us where to enter the Fortran source code, the test code and the documentation. Enter the Fortran implementation of the dot product below in the Fortran source file ET-dot/et_dot/f2py_dotf/dotf.f90 (using your favourite editor or an IDE):

function dotf(a,b,n)
  ! Compute the dot product of a and b
  !
    implicit none
  !-------------------------------------------------------------------------------------------------
    integer*4              , intent(in)    :: n
    real*8   , dimension(n), intent(in)    :: a,b
    real*8                                 :: dotf
  !-------------------------------------------------------------------------------------------------
  ! declare local variables
    integer*4 :: i
  !-------------------------------------------------------------------------------------------------
    dotf = 0.
    do i=1,n
        dotf = dotf + a(i) * b(i)
    end do
end function dotf

The output of the micc add dotf --f2py command above also shows a warning:

[WARNING]            Dependencies added. Run `poetry update` to update the project's virtual environment.

Micc is telling you that it added some dependencies to your project. In order to be able to build the binary extension dotf these dependencies must be installed in the virtual environment of our project by running poetry update.

> poetry update
Updating dependencies
Resolving dependencies... (2.5s)

Writing lock file


Package operations: 40 installs, 0 updates, 0 removals

  - Installing certifi (2019.11.28)
  - Installing chardet (3.0.4)
  - Installing idna (2.8)
  - Installing markupsafe (1.1.1)
  - Installing python-dateutil (2.8.1)
  - Installing pytz (2019.3)
  - Installing urllib3 (1.25.7)
  - Installing alabaster (0.7.12)
  - Installing arrow (0.15.4)
  - Installing babel (2.7.0)
  - Installing docutils (0.15.2)
  - Installing imagesize (1.1.0)
  - Installing jinja2 (2.10.3)
  - Installing pygments (2.5.2)
  - Installing requests (2.22.0)
  - Installing snowballstemmer (2.0.0)
  - Installing sphinxcontrib-applehelp (1.0.1)
  - Installing sphinxcontrib-devhelp (1.0.1)
  - Installing sphinxcontrib-htmlhelp (1.0.2)
  - Installing sphinxcontrib-jsmath (1.0.1)
  - Installing sphinxcontrib-qthelp (1.0.2)
  - Installing sphinxcontrib-serializinghtml (1.1.3)
  - Installing binaryornot (0.4.4)
  - Installing click (7.0)
  - Installing future (0.18.2)
  - Installing jinja2-time (0.2.0)
  - Installing pbr (5.4.4)
  - Installing poyo (0.5.0)
  - Installing sphinx (2.2.2)
  - Installing whichcraft (0.6.1)
  - Installing cookiecutter (1.6.0)
  - Installing semantic-version (2.8.3)
  - Installing sphinx-click (2.3.1)
  - Installing sphinx-rtd-theme (0.4.3)
  - Installing tomlkit (0.5.8)
  - Installing walkdir (0.4.1)
  - Installing et-micc (0.10.10)
  - Installing numpy (1.17.4)
  - Installing pybind11 (2.4.3)
  - Installing et-micc-build (0.10.10)

Note from the last lines in the output that micc-build, a companion of Micc that encapsulates the machinery that does the hard work of building the binary extensions, depends on pybind11, Numpy, and on micc itself. As a consequence, micc is now also installed in the project’s virtual environment. Therefore, when the project’s virtual environment is activated, the active micc is the one in the project’s virtual environment:

> source .venv/bin/activate
(.venv) > which micc
path/to/ET-dot/.venv/bin/micc
(.venv) >

We might want to increment the minor component of the version string by now:

(.venv) > micc version -m
[INFO]           (ET-dot)> micc version (0.0.7) -> (0.1.0)

The binary extension module can now be built:

(.venv) > micc-build
[INFO] [ Building f2py module dotf in directory '/Users/etijskens/software/dev/workspace/ET-dot/et_dot/f2py_dotf/build_'
...
[DEBUG]          >>> shutil.copyfile( 'dotf.cpython-37m-darwin.so', '/Users/etijskens/software/dev/workspace/ET-dot/et_dot/dotf.cpython-37m-darwin.so' )
[INFO] ] done.
[INFO] Check /Users/etijskens/software/dev/workspace/ET-dot/micc-build-f2py_dotf.log for details.
[INFO] Binary extensions built successfully:
[INFO] - ET-dot/et_dot/dotf.cpython-37m-darwin.so
(.venv) >

This command produces a lot of output, most of which is rather uninteresting - except in the case of errors. At the end is a summary of all binary extensions that have been built, or failed to build. If the source file does not have any syntax errors, you will see a file like dotf.cpython-37m-darwin.so in directory ET-dot/et_dot:

(.venv) > ls -l et_dot
total 8
-rw-r--r--  1 etijskens  staff  720 Dec 13 11:04 __init__.py
drwxr-xr-x  6 etijskens  staff  192 Dec 13 11:12 f2py_dotf/
lrwxr-xr-x  1 etijskens  staff   92 Dec 13 11:12 dotf.cpython-37m-darwin.so@ -> path/to/ET-dot/et_dot/f2py_dotf/dotf.cpython-37m-darwin.so

Note

The extension of the module dotf.cpython-37m-darwin.so will depend on the Python version you are using, and on your operating system.

Now that our binary extension is built, we can test it. Here is some test code. Enter it in file ET-dot/tests/test_f2py_dotf.py:

# import the binary extension and rename the module locally as f90
import et_dot.dotf as f90
import numpy as np

def test_dotf_aa():
    a = np.array([0,1,2,3,4],dtype=np.float64)
    expected = np.dot(a,a)
    a_dotf_a = f90.dotf(a,a)
    assert a_dotf_a==expected

The astute reader will notice the magic that is happening here: a is a numpy array, which is passed as is to the et_dot.dotf.dotf() function in our binary extension. An invisible wrapper function checks the types of the numpy arrays, retrieves pointers to their memory, and feeds those pointers into our Fortran function, whose result is stored in the Python variable a_dotf_a. If you look carefully at the output of micc-build, you will see information about the wrappers that f2py constructed.

Passing Numpy arrays directly to Fortran routines is extremely productive. Many useful Python packages use numpy for arrays, vectors, matrices, linear algebra, etc. Being able to pass Numpy arrays directly into your own number crunching routines relieves you from converting between array types. In addition, you can do the memory management and the initialization of your arrays in Python.

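As you can see, we test the outcome of dotf against the outcome of numpy.dot(). We trust that outcome, but beware that this test may be susceptible to round-off error. The two implementations may accumulate round-off differently, for instance because they add the terms in a different order, so their results may differ slightly. For the small integer-valued array used here, every intermediate sum is exactly representable, so the comparison is exact.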
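For less trivial input, a tolerance-based comparison is more robust than `==`. The sketch below is a hypothetical variant of the test. To keep it self-contained, it uses a pure-Python loop, `python_dot`, as a stand-in for `f90.dotf`. The stand-in sums left to right, which need not match the order used by `numpy.dot`. The relative tolerance of 1e-12 is an assumption that you may need to adapt to your own data.

```python
import math

import numpy as np


def python_dot(a, b):
    """Hypothetical stand-in for a compiled dot product: plain left-to-right summation."""
    d = 0.0
    for x, y in zip(a, b):
        d += x * y
    return d


def test_dot_close():
    rng = np.random.default_rng(42)
    a = rng.random(1000)          # float64 by default
    b = rng.random(1000)
    expected = np.dot(a, b)
    result = python_dot(a, b)
    # Compare with a relative tolerance instead of ==, because the two
    # implementations may accumulate round-off error differently.
    assert math.isclose(result, expected, rel_tol=1e-12)
```

In the real test file, you would call `f90.dotf(a, b)` instead of `python_dot(a, b)`.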

Here is the outcome of pytest:

> pytest
================================ test session starts =================================
platform darwin -- Python 3.7.4, pytest-4.6.5, py-1.8.0, pluggy-0.13.0
rootdir: /Users/etijskens/software/dev/workspace/ET-dot
collected 8 items

tests/test_et_dot.py .......                                                   [ 87%]
tests/test_f2py_dotf.py .                                                      [100%]

============================== 8 passed in 0.16 seconds ==============================
>

All our tests passed. Of course, we can extend the tests in the same way as we did for the naive Python implementation in the previous tutorial. We leave that as an exercise to the reader.

Increment the version string and create a tag:

(.venv) > micc version -p -t
[INFO]           (ET-dot)> micc version (0.1.0) -> (0.1.1)
[INFO]           Creating git tag v0.1.1 for project ET-dot
[INFO]           Done.

Note

If you put your subroutines and functions inside a Fortran module, as in:

MODULE my_f90_module
  implicit none
  contains
    function dot(a,b)
      ...
    end function dot
END MODULE my_f90_module

then the binary extension module will expose the Fortran module my_f90_module as an attribute, which in turn exposes the function/subroutine names. Note that f2py uses lower-case names, because Fortran is case insensitive:

>>> import et_dot
>>> a = [1.,2.,3.]
>>> b = [2.,2.,2.]
>>> et_dot.dot(a,b)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: module 'et_dot' has no attribute 'dot'
>>> et_dot.my_f90_module.dot(a,b)
12.0

If you are bothered by having to type et_dot.my_f90_module. every time, use this trick:

>>> import et_dot
>>> f90 = et_dot.my_f90_module
>>> f90.dot(a,b)
12.0
>>> fdot = et_dot.my_f90_module.dot
>>> fdot(a,b)
12.0

2.2 Building binary extensions from C++

Note

To add binary extension modules to a project, it must have a package structure. To check, you may run the micc info command:

> micc info
Project ET-dot located at /Users/etijskens/software/dev/workspace/ET-dot
  package: et_dot
  version: 0.0.0
  structure: et_dot/__init__.py (Python package)
  contents:
    f2py module f2py_dotf/dotf.f90

Binary extension modules based on C++ are called cpp modules. This time we will call the module dotc, where the c stands for C++.

> micc add dotc --cpp
[INFO]           [ Adding cpp module dotc to project ET-dot.
[INFO]               - C++ source in           ET-dot/et_dot/cpp_dotc/dotc.cpp.
[INFO]               - module documentation in ET-dot/et_dot/cpp_dotc/dotc.rst (in restructuredText format).
[INFO]               - Python test code in     ET-dot/tests/test_cpp_dotc.py.
[WARNING]            Dependencies added. Run 'poetry update' to update the project's virtual environment.
[INFO]           ] done.

The output tells you where to add the C++ source code, the test code and the documentation. First, take care of the warning:

(.venv) > poetry update
Updating dependencies
Resolving dependencies... (1.7s)

No dependencies to install or update

There is nothing to install, because micc-build was already installed when we added the Fortran module dotf (see 2.1 Building binary extensions from Fortran).

We will be using pybind11 to create Python wrappers for C++ functions. Pybind11 is by far the most practical choice for this (see https://channel9.msdn.com/Events/CPP/CppCon-2016/CppCon-2016-Introduction-to-C-python-extensions-and-embedding-Python-in-C-Apps for a good overview of this topic). It has a lot of ‘automagical’ features, and it is a header-only C++ library, which effectively prevents installation problems. Boost.Python offers very similar features, but it is not header-only, and its library depends on the Python version you want to use. As a result, you need a different library for every Python version you want to use.

Increment the minor component of the version string:

(.venv) > micc version -m
[INFO]           (ET-dot)> micc version (0.1.1) -> (0.2.0)

Enter this code in the C++ source file ET-dot/et_dot/cpp_dotc/dotc.cpp:

#include <pybind11/pybind11.h>
#include <pybind11/numpy.h>
#include <stdexcept>

double
dotc( pybind11::array_t<double> a
    , pybind11::array_t<double> b
    )
{
    auto bufa = a.request()
       , bufb = b.request()
       ;
 // verify dimensions and shape:
    if( bufa.ndim != 1 || bufb.ndim != 1 ) {
        throw std::runtime_error("Number of dimensions must be one");
    }
    if( (bufa.shape[0] != bufb.shape[0]) ) {
        throw std::runtime_error("Input shapes must match");
    }
 // provide access to raw memory
 // because the Numpy arrays are mutable by default, py::array_t is mutable too.
 // Below we declare the raw C++ arrays for a and b as const to make their intent clear.
    double const *ptra = static_cast<double const *>(bufa.ptr);
    double const *ptrb = static_cast<double const *>(bufb.ptr);

    double d = 0.0;
    for (pybind11::ssize_t i = 0; i < bufa.shape[0]; i++)
        d += ptra[i] * ptrb[i];

    return d;
}

// describe what goes in the module
PYBIND11_MODULE(dotc, m)
{// optional module docstring:
    m.doc() = "pybind11 dotc plugin";
 // list the functions you want to expose:
 // m.def("exposed_name", function_pointer, "doc-string for the exposed function");
    m.def("dotc", &dotc, "The dot product of two arrays 'a' and 'b'.");
}

Obviously the C++ source code is more involved than its Fortran equivalent in the previous section. This is because f2py is a program performing clever introspection into the Fortran source code, whereas pybind11 is nothing but a C++ template library. As such it is not capable of introspection and the user is obliged to use pybind11 for accessing the arguments passed in by Python.

Build the module. Because we do not want to rebuild the dotf module, we add -m dotc to the command line to indicate that only module dotc must be built:

(.venv)> micc build -m dotc
 [INFO] [ Building cpp module 'dotc':
 [DEBUG]          [ > cmake -D PYTHON_EXECUTABLE=/Users/etijskens/software/dev/workspace/tmp/ET-dot/.venv/bin/python -D pybind11_DIR=/Users/etijskens/software/dev/workspace/tmp/ET-dot/.venv/lib/python3.7/site-packages/et_micc_build/cmake_tools -D CMAKE_BUILD_TYPE=RELEASE ..
 [DEBUG]              (stdout)
                        -- The CXX compiler identification is AppleClang 11.0.0.11000033
                        -- Check for working CXX compiler: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/c++
                        -- Check for working CXX compiler: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/c++ -- works
                        -- Detecting CXX compiler ABI info
                        -- Detecting CXX compiler ABI info - done
                        -- Detecting CXX compile features
                        -- Detecting CXX compile features - done
                        -- Found PythonInterp: /Users/etijskens/software/dev/workspace/tmp/ET-dot/.venv/bin/python (found version "3.7.5")
                        -- Found PythonLibs: /Users/etijskens/.pyenv/versions/3.7.5/lib/libpython3.7m.a
                        -- Performing Test HAS_CPP14_FLAG
                        -- Performing Test HAS_CPP14_FLAG - Success
                        -- Performing Test HAS_FLTO
                        -- Performing Test HAS_FLTO - Success
                        -- LTO enabled
                        -- Configuring done
                        -- Generating done
                        -- Build files have been written to: /Users/etijskens/software/dev/workspace/tmp/ET-dot/et_dot/cpp_dotc/_cmake_build
 [DEBUG]          ] done.
 [DEBUG]          [ > make
 [DEBUG]              (stdout)
                        Scanning dependencies of target dotc
                        [ 50%] Building CXX object CMakeFiles/dotc.dir/dotc.cpp.o
                        [100%] Linking CXX shared module dotc.cpython-37m-darwin.so
                        [100%] Built target dotc
 [DEBUG]          ] done.
 [DEBUG]          >>> os.remove(/Users/etijskens/software/dev/workspace/tmp/ET-dot/et_dot/cpp_dotc/dotc.cpython-37m-darwin.so)
 [DEBUG]          >>> shutil.copyfile( '/Users/etijskens/software/dev/workspace/tmp/ET-dot/et_dot/cpp_dotc/_cmake_build/dotc.cpython-37m-darwin.so', '/Users/etijskens/software/dev/workspace/tmp/ET-dot/et_dot/cpp_dotc/dotc.cpython-37m-darwin.so' )
 [DEBUG]          [ > ln -sf /Users/etijskens/software/dev/workspace/tmp/ET-dot/et_dot/cpp_dotc/dotc.cpython-37m-darwin.so /Users/etijskens/software/dev/workspace/tmp/ET-dot/et_dot/cpp_dotc/dotc.cpython-37m-darwin.so
 [DEBUG]          ] done.
 [INFO] ] done.
 [INFO]           Binary extensions built successfully:
 [INFO]           - /Users/etijskens/software/dev/workspace/tmp/ET-dot/et_dot/dotc.cpython-37m-darwin.so
 (.venv)   >

The output shows that first CMake is called, then make, and finally the binary extension is installed with a soft link.

As usual, the build command produces a lot of output, most of which is rather uninteresting, except in the case of errors. If the source file does not have any syntax errors, and the build did not experience any problems, you will see a file like dotc.cpython-37m-darwin.so in directory ET-dot/et_dot:

(.venv) > ls -l et_dot
total 8
-rw-r--r--  1 etijskens  staff  1339 Dec 13 14:40 __init__.py
drwxr-xr-x  4 etijskens  staff   128 Dec 13 14:29 __pycache__/
drwxr-xr-x  7 etijskens  staff   224 Dec 13 14:43 cpp_dotc/
lrwxr-xr-x  1 etijskens  staff    93 Dec 13 14:43 dotc.cpython-37m-darwin.so@ -> /Users/etijskens/software/dev/workspace/tmp/ET-dot/et_dot/cpp_dotc/dotc.cpython-37m-darwin.so
lrwxr-xr-x  1 etijskens  staff    94 Dec 13 14:27 dotf.cpython-37m-darwin.so@ -> /Users/etijskens/software/dev/workspace/tmp/ET-dot/et_dot/f2py_dotf/dotf.cpython-37m-darwin.so
drwxr-xr-x  6 etijskens  staff   192 Dec 13 14:43 f2py_dotf/
(.venv) >

Note

The extension of the module dotc.cpython-37m-darwin.so will depend on the Python version you are using, and on the operating system.

Increment the version string:

(.venv) > micc version -p
[INFO]           (ET-dot)> micc version (0.2.0) -> (0.2.1)

Here is the test code. It is almost exactly the same as that for the f2py module dotf, except for the module name. Enter the test code in ET-dot/tests/test_cpp_dotc.py:

# import our binary extension
import et_dot.dotc as cpp
import numpy as np

def test_dotc_aa():
    a = np.array([0,1,2,3,4],dtype=np.float64)
    expected = np.dot(a,a)
    a_dotc_a = cpp.dotc(a,a)
    assert a_dotc_a==expected

The conversion from Numpy arrays to C++ arrays is less magical here, because the user must provide the code that converts the Python variables to C++. This has the advantage of showing the mechanics of the conversion more clearly. However, it also leaves more room for mistakes, and it may seem more complicated to beginners.

Finally, run pytest:

> pytest
================================ test session starts =================================
platform darwin -- Python 3.7.4, pytest-4.6.5, py-1.8.0, pluggy-0.13.0
rootdir: /Users/etijskens/software/dev/workspace/ET-dot
collected 9 items

tests/test_cpp_dotc.py .                                                       [ 11%]
tests/test_et_dot.py .......                                                   [ 88%]
tests/test_f2py_dotf.py .                                                      [100%]

============================== 9 passed in 0.28 seconds ==============================

All our tests passed.

Increment the version string and tag:

(.venv) > micc version -m -t
[INFO] Creating git tag v0.3.0 for project ET-dot
[INFO] Done.

2.3 Intermediate topics

2.3.1 Binary extension modules and data types

An important point of attention when writing binary extension modules, and a common source of problems, is that the data types of the variables passed in from Python must match the data types of the Fortran or C++ routines.

Here is a table with the most relevant numeric data types in Python, Fortran and C++.

kind               Numpy/Python   Fortran      C++
unsigned integer   uint32         N/A          std::uint32_t (unsigned int)
unsigned integer   uint64         N/A          std::uint64_t (unsigned long long int)
signed integer     int32          integer*4    std::int32_t (int)
signed integer     int64          integer*8    std::int64_t (long long int)
floating point     float32        real*4       float
floating point     float64        real*8       double
complex            complex64      complex*8    std::complex<float>
complex            complex128     complex*16   std::complex<double>
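You can check the sizes in the table from Python. Numpy reports the size in bytes of each dtype through `itemsize`, and the standard `ctypes` module reports the sizes of C types on the current platform. The sketch below is our own illustration and is not part of Micc. It shows, for instance, that `complex64` occupies 8 bytes, which is why it matches Fortran `complex*8`. It also shows that the width of C's `long` is platform dependent: typically 8 bytes on 64-bit Linux and macOS, and 4 bytes on Windows.

```python
import ctypes

import numpy as np

# Size in bytes of the Numpy dtypes listed in the table above.
sizes = {name: np.dtype(name).itemsize
         for name in ("uint32", "uint64", "int32", "int64",
                      "float32", "float64", "complex64", "complex128")}

for name, nbytes in sizes.items():
    print(f"{name:>10}: {nbytes} bytes")

# The width of C's 'long' depends on the platform, which is why fixed-width
# types such as std::int32_t and std::int64_t are the safer choice in C++.
print("sizeof(long)      =", ctypes.sizeof(ctypes.c_long))
print("sizeof(int32_t)   =", ctypes.sizeof(ctypes.c_int32))
print("sizeof(long long) =", ctypes.sizeof(ctypes.c_longlong))
```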

2.3.2 F2py

F2py is very flexible with respect to data types. Between the Python call and the Fortran routine sits a wrapper function that translates the function call. If the wrapper detects that the data types on the Python side and the Fortran side are different, it is allowed to copy/convert the variable. This can happen when passing the variable to the Fortran routine, and also when passing the result back from the Fortran routine to the Python caller. When the input/output variables are large arrays, such copy/conversion operations can have a detrimental effect on performance, which is highly undesirable in HPC. Micc runs f2py with the -DF2PY_REPORT_ON_ARRAY_COPY=1 option. This causes your code to produce a warning every time the wrapper decides to copy an array. Basically, this warning means that you have to modify your Python data structure to have the same data type as the Fortran source code, or vice versa.
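A data type mismatch is not the only trigger. F2py also copies multi-dimensional input arrays that are not stored in Fortran (column-major) order, and Numpy creates arrays in C (row-major) order by default. A simple way to avoid such copies is to give the arrays the right data type and memory order before calling the Fortran routine.

The helper `as_fortran_float64` below is a hypothetical sketch and is not part of Micc or f2py. It assumes that the Fortran routine expects real*8 arrays, and it uses only standard Numpy calls to tell whether a conversion copy was needed:

```python
import numpy as np


def as_fortran_float64(x):
    """Return x as a Fortran-ordered float64 array, together with a flag that
    tells whether a copy had to be made (hypothetical helper)."""
    y = np.asfortranarray(x, dtype=np.float64)
    return y, not np.shares_memory(x, y)


a = np.arange(5, dtype=np.float64)   # 1D and already float64: no copy needed
_, copied_a = as_fortran_float64(a)

i = np.arange(5)                     # integer dtype: must be converted
_, copied_i = as_fortran_float64(i)

m = np.zeros((3, 4))                 # 2D in C order (the Numpy default): copied
mf, copied_m = as_fortran_float64(m)

print(copied_a, copied_i, copied_m, mf.flags['F_CONTIGUOUS'])
```

Arrays that are created with the right layout from the start, e.g. `np.zeros((3, 4), order='F')`, need no conversion at all.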

2.3.3 Returning large data structures

The result of a Fortran function or a C++ function is always copied back to the Python variable that will hold it. Because copying large data structures is detrimental to performance, this should be avoided. The solution is to write Fortran functions or subroutines and C++ functions that accept the result variable as an argument and modify it in place, so that the copy operation is avoided. Consider this example of a Fortran subroutine that computes the sum of two arrays:

subroutine add(a,b,sumab,n)
  ! Compute the sum of arrays a and b and overwrite array sumab with the result
    implicit none

    integer*4              , intent(in)    :: n
    real*8   , dimension(n), intent(in)    :: a,b
    real*8   , dimension(n), intent(inout) :: sumab

  ! declare local variables
    integer*4 :: i

    do i=1,n
        sumab(i) = a(i) + b(i)
    end do
end subroutine add

The crucial issue here is that the result array sumab has intent(inout). If you qualify the intent of sumab as intent(in), you will not be able to overwrite it. Surprisingly, qualifying it with intent(out) will force f2py to treat it as a left-hand-side variable, which implies copying the result on return.
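On the Python side, the in-place pattern means that you allocate the result array once and pass it in. The sketch below illustrates the idea with Numpy's own `out=` argument of `numpy.add`, which also writes into a preallocated array. The comment `ext.add(a, b, sumab)` stands for a hypothetical call to a compiled routine such as the `add` subroutine above.

```python
import numpy as np

n = 1_000_000
a = np.ones(n)                 # float64 by default, matching real*8
b = np.full(n, 2.0)

# Allocating version: 'a + b' creates a new result array on every call.
c = a + b

# In-place version: allocate the result once and let the routine overwrite it.
sumab = np.empty_like(a)
result = np.add(a, b, out=sumab)   # with the extension: ext.add(a, b, sumab)

assert result is sumab             # no new array was created
assert np.array_equal(sumab, c)
```

Reusing `sumab` across many calls avoids both the allocation and the copy on return.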

The code below does exactly the same, but uses a function whose return value is not the result of the computation but an error code:

function add(a,b,sumab,n)
  ! Compute the sum of arrays a and b and overwrite array sumab with the result
    implicit none

    integer*4              , intent(in)    :: n
    integer*4                              :: add   ! function result: no intent attribute
    real*8   , dimension(n), intent(in)    :: a,b
    real*8   , dimension(n), intent(inout) :: sumab

  ! declare local variables
    integer*4 :: i

    do i=1,n
        sumab(i) = a(i) + b(i)
    end do

    add = ... ! set return value, e.g. an error code.

end function add

The same can be accomplished in C++:

#include <pybind11/pybind11.h>
#include <pybind11/numpy.h>
#include <stdexcept>

namespace py = pybind11;

void
add ( py::array_t<double> a
    , py::array_t<double> b
    , py::array_t<double> sumab
    )
{// request buffer description of the arguments
    auto buf_a = a.request()
       , buf_b = b.request()
       , buf_sumab = sumab.request()
       ;
    if( buf_a.ndim != 1
     || buf_b.ndim != 1
     || buf_sumab.ndim != 1 )
    {
        throw std::runtime_error("Number of dimensions must be one");
    }

    if( (buf_a.shape[0] != buf_b.shape[0])
     || (buf_a.shape[0] != buf_sumab.shape[0]) )
    {
        throw std::runtime_error("Input shapes must match");
    }
 // because the Numpy arrays are mutable by default, py::array_t is mutable too.
 // Below we declare the raw C++ arrays for a and b as const to make their intent clear.
    double const *ptr_a     = static_cast<double const *>(buf_a.ptr);
    double const *ptr_b     = static_cast<double const *>(buf_b.ptr);
    double       *ptr_sumab = static_cast<double       *>(buf_sumab.ptr);

    for (py::ssize_t i = 0; i < buf_a.shape[0]; i++)
        ptr_sumab[i] = ptr_a[i] + ptr_b[i];
}


PYBIND11_MODULE({{ cookiecutter.module_name }}, m)
{// optional module doc-string
    m.doc() = "pybind11 {{ cookiecutter.module_name }} plugin"; // optional module docstring
 // list the functions you want to expose:
 // m.def("exposed_name", function_pointer, "doc-string for the exposed function");
    m.def("add", &add, "A function which adds two arrays 'a' and 'b' and stores the result in the third, 'sumab'.");
}

Here, care must be taken not to cast buf_sumab.ptr to a pointer to const, because the result is written through that pointer.
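There is a related pitfall on the Python side. As far as we understand pybind11's default conversion rules (the `forcecast` flag), `py::array_t<double>` converts an argument of another dtype into a temporary double array. In that case, the results written through `ptr_sumab` end up in the temporary rather than in the caller's array. In addition, the C++ code above indexes the raw pointers as if the data were contiguous.

A defensive option, sketched below with plain Numpy, is to check the output array before calling the C++ `add` function. The helper `check_output_array` and the call `ext.add` are hypothetical:

```python
import numpy as np


def check_output_array(out, n):
    """Raise if 'out' cannot safely be overwritten in place by a routine that
    expects a contiguous 1D array of n doubles (hypothetical helper)."""
    if not isinstance(out, np.ndarray):
        raise TypeError("output must be a numpy array")
    if out.dtype != np.float64:
        raise TypeError(f"output must have dtype float64, not {out.dtype}")
    if out.ndim != 1 or out.shape[0] != n:
        raise ValueError(f"output must be a 1D array of length {n}")
    if not out.flags.c_contiguous:
        raise ValueError("output must be contiguous")
    if not out.flags.writeable:
        raise ValueError("output must be writeable")


a = np.arange(4.0)
b = np.ones(4)
sumab = np.empty(4)
check_output_array(sumab, a.shape[0])      # passes
# ext.add(a, b, sumab)                     # hypothetical call to the cpp module

for bad in (np.empty(4, dtype=np.float32), np.empty(8)[::2]):
    try:
        check_output_array(bad, 4)
    except (TypeError, ValueError) as err:
        print("rejected:", err)
```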

2.4 Specifying compiler options for binary extension modules

[ Advanced Topic ] As we have seen, binary extension modules can be programmed in Fortran and C++. Micc provides convenient wrappers to build such modules. Fortran source code is transformed to a python module using f2py, and C++ source using Pybind11 and CMake. Obviously, in both cases there is a compiler under the hood doing the hard work. By default these tools use the compiler they find on the path, but you may as well specify your favorite compiler.

2.4.1 Building a single module only

If you want to build a single binary extension module rather than all binary extension modules in the project, add the -m|--module option:

> micc build -m my_module

This will only build module my_module.

2.4.2 Performing a clean build

To perform a clean build, add the --clean flag to the micc build command:

> micc build --clean

This will remove the previous build directory as well as the binary extension module.

2.4.3 Controlling the build of f2py modules

To specify the Fortran compiler, e.g. the GNU fortran compiler:

Note that this is exactly how you would specify it when using f2py directly. You can specify the Fortran compiler options you want using the --f90flags option:

In addition f2py (and micc build for that matter) provides two extra options --opt for specifying optimization flags, and --arch for specifying architecture dependent optimization flags. These flags can be turned off by adding --noopt and --noarch, respectively. This can be convenient when exploring compile options. Finally, the --debug flag adds debug information during the compilation.

micc build also provides a --build-type option, which accepts release and debug as values (case insensitive). Specifying debug is equivalent to --debug --noopt --noarch.

Note

ALL f2py modules are built with the same options. To specify separate options for a particular module use the -m|--module option.

Note

Although there are some commonalities between the compiler options of the various compilers, you will most probably have to change the compiler options when you change the compiler.

2.4.4 Controlling the build of cpp modules

The C++ compiler, e.g. the Intel C++ compiler, is specified as:

Here, the value of --cxx-compiler is transferred to the CMake variable CMAKE_CXX_COMPILER.

CMake provides default build options for four build types:

  • CMAKE_CXX_FLAGS_DEBUG: -g
  • CMAKE_CXX_FLAGS_MINSIZEREL: -Os -DNDEBUG
  • CMAKE_CXX_FLAGS_RELEASE: -O3 -DNDEBUG
  • CMAKE_CXX_FLAGS_RELWITHDEBINFO: -O2 -g -DNDEBUG

You can override these values by specifying --build-type (to select the build type) and --cxx-flags (to set the corresponding flags). The flags of the selected build type are combined with the CMake variable CMAKE_CXX_FLAGS, which is empty by default. This variable can be overwritten by using the --cxx-flags-all option.
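To make the combination concrete: when CMake compiles a C++ source for a given build type, the flags in CMAKE_CXX_FLAGS come first, followed by those in CMAKE_CXX_FLAGS_<BUILD_TYPE>. The Python function below is only a model of that rule, using the default values listed above. It is not part of Micc or CMake. Its parameter names merely mirror the --cxx-flags and --cxx-flags-all options, and the flag -march=native in the usage lines is just an illustration.

```python
# Default per-build-type flags of CMake for GCC/Clang, as listed above.
DEFAULT_BUILD_TYPE_FLAGS = {
    "DEBUG": "-g",
    "MINSIZEREL": "-Os -DNDEBUG",
    "RELEASE": "-O3 -DNDEBUG",
    "RELWITHDEBINFO": "-O2 -g -DNDEBUG",
}


def effective_cxx_flags(build_type, cxx_flags=None, cxx_flags_all=""):
    """Model how CMAKE_CXX_FLAGS and CMAKE_CXX_FLAGS_<BUILD_TYPE> are combined.

    build_type    : one of the four CMake build types (case insensitive)
    cxx_flags     : replacement for CMAKE_CXX_FLAGS_<BUILD_TYPE> (None keeps the default)
    cxx_flags_all : value of CMAKE_CXX_FLAGS (empty by default)
    """
    key = build_type.upper()
    per_type = DEFAULT_BUILD_TYPE_FLAGS[key] if cxx_flags is None else cxx_flags
    # CMAKE_CXX_FLAGS comes first, then the build-type specific flags.
    return " ".join(part for part in (cxx_flags_all, per_type) if part)


print(effective_cxx_flags("Release"))                                 # -O3 -DNDEBUG
print(effective_cxx_flags("release", cxx_flags_all="-march=native"))  # -march=native -O3 -DNDEBUG
print(effective_cxx_flags("Debug", cxx_flags="-g -O0"))               # -g -O0
```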

Note

ALL cpp modules are built with the same options. To specify separate options for a particular module use the -m|--module option.

Note

CMake selects reasonable options for the four build types above, taking into account the chosen compiler. For tweaking, however, you will most probably have to change the compiler options when you change the compiler.

2.4.5 Save and load build options to/from file

With the --save option you can save the current build options to a file in .json format. This acts on a per project basis. E.g.:

will save the <my build options> to the file build.json in every binary module directory (the .json extension is added if omitted). You can restrict this to a single module with the --module option (see above). The saved options can be reused in a later build as:

2.5 Documenting binary extension modules

For Python modules, the documentation is automatically extracted from the doc-strings in the module. However, this does not seem a good option for documenting binary extension modules. Ideally, the source files ET-dot/et_dot/f2py_dotf/dotf.f90 and ET-dot/et_dot/cpp_dotc/dotc.cpp should document the Fortran functions and subroutines, and the C++ functions, respectively, rather than the Python interface. Yet from the perspective of ET-dot being a Python project, users are only interested in the documentation of the Python interface to those functions and subroutines. Therefore, micc requires you to document the Python interface in separate .rst files:

  • ET-dot/et_dot/f2py_dotf/dotf.rst
  • ET-dot/et_dot/cpp_dotc/dotc.rst

Here are the contents of ET-dot/et_dot/f2py_dotf/dotf.rst:

Module et_dot.dotf
******************

Module :py:mod:`dotf` built from fortran code in :file:`f2py_dotf/dotf.f90`.

.. function:: dotf(a,b)
   :module: et_dot.dotf

   Compute the dot product of *a* and *b* (in Fortran.)

   :param a: 1D Numpy array with ``dtype=numpy.float64``
   :param b: 1D Numpy array with ``dtype=numpy.float64``
   :returns: the dot product of *a* and *b*
   :rtype: ``numpy.float64``

and for ET-dot/et_dot/cpp_dotc/dotc.rst:

Module et_dot.dotc
******************

Module :py:mod:`dotc` built from C++ code in :file:`cpp_dotc/dotc.cpp`.

.. function:: dotc(a,b)
   :module: et_dot.dotc

   Compute the dot product of *a* and *b* (in C++.)

   :param a: 1D Numpy array with ``dtype=numpy.float64``
   :param b: 1D Numpy array with ``dtype=numpy.float64``
   :returns: the dot product of *a* and *b*
   :rtype: ``numpy.float64``

Note that the documentation must be entirely in .rst format (see restructuredText).

Build the documentation:

(.venv) > cd docs && make html
Already installed: click
Already installed: sphinx-click
Already installed: sphinx
Already installed: sphinx-rtd-theme
Running Sphinx v2.2.2
making output directory... done
WARNING: html_static_path entry '_static' does not exist
building [mo]: targets for 0 po files that are out of date
building [html]: targets for 7 source files that are out of date
updating environment: [new config] 7 added, 0 changed, 0 removed
reading sources... [100%] readme
looking for now-outdated files... none found
pickling environment... done
checking consistency... /Users/etijskens/software/dev/workspace/tmp/ET-dot/docs/apps.rst: WARNING: document isn't included in any toctree
done
preparing documents... done
writing output... [100%] readme
generating indices...  genindex py-modindexdone
highlighting module code... [100%] et_dot.dotc
writing additional pages...  search/Users/etijskens/software/dev/workspace/tmp/ET-dot/.venv/lib/python3.7/site-packages/sphinx_rtd_theme/search.html:20: RemovedInSphinx30Warning: To modify script_files in the theme is deprecated. Please insert a <script> tag directly in your theme instead.
  {{ super() }}
done
copying static files... ... done
copying extra files... done
dumping search index in English (code: en)... done
dumping object inventory... done
build succeeded, 2 warnings.

The HTML pages are in _build/html.

The documentation is built using make. The Makefile checks that the necessary components sphinx, click, sphinx-click and sphinx-rtd-theme are installed.

You can view the result in your favorite browser:

(.venv) > open _build/html/index.html

The filepath is made evident from the last output line above. This is what the result looks like (html):

_images/img3.png

Increment the version string:

(.venv) > micc version -M -t
[ERROR] Not a project directory (/Users/etijskens/software/dev/workspace/tmp/ET-dot/docs).
(.venv) > cd ..
(.venv) > micc version -M -t
[INFO]           (ET-dot)> micc version (0.3.0) -> (1.0.0)
[INFO]           Creating git tag v1.0.0 for project ET-dot
[INFO]           Done.

Note that we first got an error because we are still in the docs directory, and not in the project root directory.

Tutorial 3: Adding Python components

3.1 Adding a Python module

Just as one can add binary extension modules to a package, one can add Python modules.

> micc add foo --py

[INFO]           [ Adding python module foo.py to project ET-dot.
[INFO]               - python source in    ET-dot/et_dot/foo.py.
[INFO]               - Python test code in ET-dot/tests/test_foo.py.
[INFO]           ] done.

This adds a Python sub-module to the package, and a test script. The documentation for the sub-module is extracted from doc-strings of the functions and classes in the sub-module.

As with micc create, the default structure is that of a simple module, i.e. ET-dot/et_dot/foo.py. If you want a package, you can add the --package flag.

3.1.1 Testing the module

When you add a module foo, Micc automatically adds a test script for the new module: tests/test_foo.py. Add your tests for module foo to this file.

3.1.2 Documenting the module

When adding a module foo, Micc automatically adds documentation entries in API.rst. Calling micc docs will automatically extract documentation from the doc-strings in your new module.

3.2 Adding a Python Command Line Interface

Command Line Interfaces are Python scripts that you want to be installed as executable programs when a user installs your package.

As an example, assume that we often need to read two arrays from file and compute their dot product, and that we want to execute this operation as:

> dot-files file1 file2
dot(file1,file2) = 123.456
>

Micc supports two kinds of CLIs based on click, a very practical tool for building Python CLIs. The first kind is for CLIs that execute a single task. The second kind is for a command with sub-commands, like git or micc itself. The single-task case is the default, so we can create it as follows:

> micc app dot-files
[INFO]           [ Adding CLI dot-files without sub-commands to project ET-dot.
[INFO]               - Python source file ET-dot/et_dot/cli_dot-files.py.
[INFO]               - Python test code   ET-dot/tests/test_cli_dot-files.py.
[INFO]           ] done.

For a CLI with sub-commands one should add the flag --sub-commands.

The source code ET-dot/et_dot/cli_dot_files.py should be modified as:

# -*- coding: utf-8 -*-
"""Command line interface dot-files (no sub-commands)."""

import sys

import click
import numpy as np

from et_dot.dotf import dotf

@click.command()
@click.argument('file1')
@click.argument('file2')
@click.option('-v', '--verbosity', count=True
             , help="The verbosity of the CLI."
             , default=1
             )
def main(file1,file2,verbosity):
    """Command line interface dot-files.

    Compute the dot product of the arrays read from FILE1 and FILE2.
    """
    a = np.genfromtxt(file1, dtype=np.float64, delimiter=',')
    b = np.genfromtxt(file2, dtype=np.float64, delimiter=',')
    ab = dotf(a,b)
    if verbosity>1:
        print(f"dot-files({file1},{file2}) = {ab}")
    else:
        print(ab)

if __name__ == "__main__":
    sys.exit(main())  # pragma: no cover

Here’s how to use it from the command line (without installing):

> source .venv/bin/activate
(.venv) > cat file1.txt
1,2,3,4,5
(.venv) > cat file2.txt
2,2,2,2,2
(.venv) > python et_dot/cli_dot_files.py file1.txt file2.txt
30.0
(.venv) > python et_dot/cli_dot_files.py file1.txt file2.txt -vv
dot-files(file1.txt,file2.txt) = 30.0
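The CLI expects each file to contain a single line of comma-separated numbers. The snippet below shows what `np.genfromtxt` returns for such input. To stay self-contained, it uses `io.StringIO` in place of the real files and `numpy.dot` in place of `dotf`.

```python
import io

import numpy as np

# In-memory stand-ins for file1.txt and file2.txt from the session above.
file1 = io.StringIO("1,2,3,4,5\n")
file2 = io.StringIO("2,2,2,2,2\n")

a = np.genfromtxt(file1, dtype=np.float64, delimiter=',')
b = np.genfromtxt(file2, dtype=np.float64, delimiter=',')

print(a.shape, a.dtype)   # (5,) float64: a 1D array, as dotf requires
print(np.dot(a, b))       # 30.0, the same value the CLI printed
```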

3.2.1 Testing the application

When you add an application like dot-files, Micc automatically adds a test script tests/test_cli_dot_files.py where you can add your tests. Testing CLIs is a bit more complex than testing modules, but Click provides some tools for Testing click applications. Here is the test code:

#!/usr/bin/env python
# -*- coding: utf-8 -*-

from click.testing import CliRunner

from et_dot.cli_dot_files import main

def test_main():
    runner = CliRunner()
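    # file1.txt and file2.txt are read from the directory in which pytest is run
    result = runner.invoke(main, ['file1.txt','file2.txt'])
    print(result.output)
    assert result.exit_code == 0
    ab = float(result.output.strip())
    assert ab==30.0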

Finally, we run pytest:

> pytest
================================= test session starts =================================
platform darwin -- Python 3.7.4, pytest-4.6.5, py-1.8.0, pluggy-0.13.0
rootdir: /Users/etijskens/software/dev/workspace/ET-dot
collected 10 items

tests/test_cli_dot-files.py .                                                   [ 10%]
tests/test_cpp_dotc.py .                                                        [ 20%]
tests/test_et_dot.py .......                                                    [ 90%]
tests/test_f2py_dotf.py .                                                       [100%]

================================== 10 passed in 0.33 seconds ==========================

3.2.2 Documenting an application

When you add a CLI, Micc automatically adds documentation entries for it in APPS.rst. Calling micc docs automatically extracts documentation from three sources:

  • the doc-string of the command,
  • the :param ...: entries for the click.argument decorators in that doc-string,
  • the help parameters of the click.option decorators.

Tutorial 4: Version control and version management

Git support

When you create a new project, Micc immediately provides a local git repository for you and commits the initial files it set up. If you have a GitHub account, you can register your username in the preferences file ~/.micc/micc.json, using the github_username entry:

{
...
, "github_username"  : {"default":"etijskens"
                       ,"text"   :"your github username"
                       }
...
}

Micc cannot create a remote github repository for you, but if you registered your github username in the preferences file, it will add a remote origin at https://github.com/etijskens/<your_project_name>/, and try to push the files to the github repo. If you have created the remote repository before you create the project, the new project will be immediately pushed onto the remote origin. Otherwise, you get a warning that the remote repository does not yet exist. You can create the remote repository whenever you like and push your work onto the remote repository using the git CLI.
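The remote URL follows directly from the github_username entry. As an illustration, here is a minimal sketch that reads the preferences and builds the URL in the format shown above. The function names are our own, not Micc's, and we assume the file is valid JSON with the github_username entry structured as in the excerpt (the real file contains more entries, which are simply ignored here).

```python
import json
from pathlib import Path

# Location of the Micc preferences file, as mentioned above
PREFERENCES_FILE = Path.home() / ".micc" / "micc.json"


def load_preferences(path: Path = PREFERENCES_FILE) -> dict:
    """Read the preferences file (hypothetical helper)."""
    with path.open() as f:
        return json.load(f)


def github_username(preferences: dict) -> str:
    """Extract the default GitHub username from the preferences dict."""
    return preferences["github_username"]["default"]


def remote_origin_url(preferences: dict, project_name: str) -> str:
    """Hypothetical helper: the remote origin URL for a project."""
    return f"https://github.com/{github_username(preferences)}/{project_name}/"


if __name__ == "__main__":
    prefs = json.loads('{"github_username": {"default": "etijskens", '
                       '"text": "your github username"}}')
    print(remote_origin_url(prefs, "ET-dot"))
```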

Version management

Version numbers are practical, even for a small software project used only by yourself. For larger projects, certainly when other users start using them, they become indispensable. When assigning version numbers to a project, we highly recommend following the guidelines of Semantic Versioning 2.0. Such a version number has the form Major.minor.patch. According to semantic versioning you should increment the:

  • Major version when you make incompatible API changes,
  • minor version when you add functionality in a backwards compatible manner, and
  • patch version when you make backwards compatible bug fixes.

Micc sets a version number of 0.0.0 when it creates a project, and you can bump the version number at any time with the micc version command.

> micc info
Project ET-dot located at /Users/etijskens/software/dev/workspace/ET-dot
  package: et_dot
  version: 0.0.0
  structure: et_dot/__init__.py (Python package)
  contents:
    application cli_dot_files.py
    C++ module  cpp_dotc/dotc.cpp
    f2py module f2py_dotf/dotf.f90

To bump the patch component:

> micc version
Project (ET-dot version (0.0.0)
> micc version --patch
[INFO]           bumping version (0.0.0) -> (0.0.1)

Again, this time with the short form of --patch, -p, and with verbose output (-vv):

> micc -vv version -p
[DEBUG] start = 2019-10-16 13:18:16.995416
[INFO]           bumping version (0.0.1) -> (0.0.2)
[DEBUG]          . Updated (/Users/etijskens/software/dev/workspace/ET-dot/pyproject.toml)
[DEBUG]          . Updated (/Users/etijskens/software/dev/workspace/ET-dot/et_dot/__init__.py)
[DEBUG] stop  = 2019-10-16 13:18:17.261962
[DEBUG] spent = 0:00:00.266546

Here, you can see that micc updated the version number in ET-dot/pyproject.toml and ET-dot/et_dot/__init__.py.

To bump the minor component use the --minor or -m flag:

> micc version -m
[INFO]           bumping version (0.0.2) -> (0.1.0)

As you can see, the patch component is reset to 0.

To bump the major component use the --major or -M flag:

> micc version -M
[INFO]           bumping version (0.1.0) -> (1.0.0)

As you can see, the minor component (as well as the patch component) is reset to 0.
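The bumping rules illustrated above are easy to express in code. The sketch below is a minimal, hypothetical function of our own, not Micc's implementation. It assumes a plain Major.minor.patch string without pre-release or build suffixes, and unlike micc version it does not update pyproject.toml or __init__.py.

```python
def bump(version: str, part: str = "patch") -> str:
    """Return `version` with the given component incremented.

    Incrementing a component resets all components to its right to 0,
    as in the micc version examples above.
    """
    major, minor, patch = (int(c) for c in version.split("."))
    if part == "major":
        return f"{major + 1}.0.0"
    if part == "minor":
        return f"{major}.{minor + 1}.0"
    if part == "patch":
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown version component: {part!r}")
```

For instance, bump("0.0.2", "minor") gives "0.1.0" and bump("0.1.0", "major") gives "1.0.0", matching the output of micc version -m and micc version -M above.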

The micc version command also has a --tag flag that creates a git tag (see https://git-scm.com/book/en/v2/Git-Basics-Tagging) and tries to push it to the remote origin:

> micc -vv version -p --tag
[DEBUG] start = 2019-10-16 13:37:25.026161
[INFO]           bumping version (1.0.1) -> (1.0.2)
[DEBUG]          . Updated (/Users/etijskens/software/dev/workspace/ET-dot/pyproject.toml)
[DEBUG]          . Updated (/Users/etijskens/software/dev/workspace/ET-dot/et_dot/__init__.py)
[INFO]           Creating git tag v1.0.2 for project ET-dot
[DEBUG]          Running 'git tag -a v1.0.2 -m "tag version 1.0.2"'
[DEBUG]
[DEBUG]          Pushing tag v1.0.2 for project ET-dot
[DEBUG]          Running 'git push origin v1.0.2'
[DEBUG]          remote: Repository not found.
                   fatal: repository 'https://github.com/etijskens/ET-dot/' not found
[INFO]           Done.
[DEBUG] stop  = 2019-10-16 13:37:26.101959
[DEBUG] spent = 0:00:01.075798

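If you have created a remote GitHub repository for your project and registered your GitHub username in the preferences file, the tag is pushed to the remote origin. In the example above, the remote repository did not exist yet, hence the "Repository not found" message.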
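Judging from the log above, --tag boils down to two git commands. If you prefer to tag manually, the sketch below runs the same commands with subprocess. It is our own minimal sketch, not Micc's code. Note one difference: with check=True it stops at the first failing command, whereas micc merely logs the failed push and finishes with "Done.".

```python
import subprocess
from typing import List


def tag_commands(version: str) -> List[List[str]]:
    """The git commands shown in the log of `micc -vv version -p --tag`."""
    tag = f"v{version}"
    return [
        ["git", "tag", "-a", tag, "-m", f"tag version {version}"],
        ["git", "push", "origin", tag],
    ]


def tag_and_push(version: str, cwd: str = ".") -> None:
    """Create an annotated tag and push it to the remote origin."""
    for cmd in tag_commands(version):
        # check=True raises CalledProcessError, e.g. if the remote does not exist
        subprocess.run(cmd, cwd=cwd, check=True)


if __name__ == "__main__":
    for cmd in tag_commands("1.0.2"):
        print(" ".join(cmd))
```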

Tutorial 5 - Publishing your code

Publishing your code is an easy way to make your code available to other users.

5.1 Publishing to the Python Package Index

For this we rely on poetry. If you do not have a PyPI account, create one. Then run poetry publish --build in your project directory, e.g. et-foo.

Note

It is crucial that your project name is not already taken on PyPI. For this reason, we recommend that you

  1. check whether your project name is still available before you create a project that you might want to publish, and
  2. publish your project immediately after creating it, so as to reserve the name.

Now everyone can install the package in their current Python environment with:

> pip install et-foo
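Coming back to the note on project names: checking whether a name is taken can be automated with PyPI's JSON API, which answers https://pypi.org/pypi/<name>/json with HTTP 404 for unknown projects. The sketch below is a minimal helper of our own built on that assumption. It normalizes the name as described in PEP 503, so et_foo and ET-Foo count as the same name. A 404 is not an absolute guarantee that an upload will be accepted, since PyPI may also reject names that are too similar to existing ones, so treat the result as a first check only.

```python
import re
import urllib.error
import urllib.request


def normalize(name: str) -> str:
    """Normalize a project name as described in PEP 503."""
    return re.sub(r"[-_.]+", "-", name).lower()


def is_taken_on_pypi(name: str, timeout: float = 10.0) -> bool:
    """Return True if PyPI already has a project with this (normalized) name."""
    url = f"https://pypi.org/pypi/{normalize(name)}/json"
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True  # HTTP 200: the project exists
    except urllib.error.HTTPError as error:
        if error.code == 404:
            return False  # no project with this name (yet)
        raise  # any other HTTP error: do not guess


if __name__ == "__main__":
    print(normalize("ET_foo"), is_taken_on_pypi("et-foo"))
```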

5.2 Publishing packages with binary extension modules

Packages with binary extension modules are published in exactly the same way as Python-only projects. When you pip install a Micc project, the package directory ends up in the site-packages directory of the Python environment in which you install it. The source code directories of the binary extension modules are installed with the package, but the binary extensions themselves are not. These must be compiled locally. Fortunately, that happens automatically, at least if the binary extensions were added to the package by Micc. When Micc adds a binary extension to a project, two things happen:

  • a dependency on micc-build is added to the project, and
  • in the top-level module <package_name>/__init__.py a try-except block is added that tries to import the binary extension and in case of failure (ModuleNotFoundError) will attempt to build it using the machinery provided by micc-build. This will usually succeed, provided the necessary compilers are available.

As an example, let us create a project Foo with a binary extension module bar written in C++:

> micc -p Foo create
> cd Foo
> micc add bar --cpp

This creates the following Foo/foo/__init__.py:

# -*- coding: utf-8 -*-
"""
Package foo
===========

Top-level package for foo.
"""

__version__ = "0.0.0"

try:
    import foo.bar
except ModuleNotFoundError as e:
    # Try to build this binary extension:
    from pathlib import Path
    import click
    from et_micc_build.cli_micc_build import auto_build_binary_extension
    msg = auto_build_binary_extension(Path(__file__).parent, 'bar')
    if not msg:
        import foo.bar
    else:
        click.secho(msg, fg='bright_red')

def hello(who='world'):
    ...

If the first import foo.bar fails, the except block imports the function auto_build_binary_extension() and calls it with two arguments: the path to the package directory Foo/foo and the name of the binary extension module, bar. If the build succeeds, msg is an empty string and foo.bar is imported after all; otherwise the error message msg is printed.

5.3 Providing auto_build_binary_extension() with custom build parameters

The auto-build above will normally use the default build options, corresponding to -O3, which optimizes for speed. As auto_build_binary_extension() is called automatically, there is no opportunity to pass build options to it. Instead, auto_build_binary_extension() looks for a file Foo/foo/cpp_bar/build_options.<platform>.json, where <platform> is Darwin on macOS, Linux on Linux and Windows on Windows. If this file exists, it should contain a dict with the build options to use.

Note

The build options files are OS specific:

  • On MacOSX : build_options.Darwin.json
  • On Linux : build_options.Linux.json
  • On Windows : build_options.Windows.json
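To make the lookup concrete, here is a minimal sketch of how a build tool could locate and read such a file. The function names are hypothetical and this is not the code of micc-build. It only assumes the naming scheme above, where platform.system() returns Darwin, Linux or Windows, and that a missing file means the default options are used.

```python
import json
import platform
import tempfile
from pathlib import Path
from typing import Optional


def build_options_file(module_srcdir: Path, system: Optional[str] = None) -> Path:
    """Path of the OS-specific build options file in a module's source dir."""
    system = system or platform.system()  # 'Darwin', 'Linux' or 'Windows'
    return Path(module_srcdir) / f"build_options.{system}.json"


def load_build_options(module_srcdir: Path, system: Optional[str] = None) -> dict:
    """Return the build options dict, or {} if there is no file for this OS."""
    path = build_options_file(module_srcdir, system)
    if path.is_file():
        with path.open() as f:
            return json.load(f)
    return {}  # no file: fall back to the default build options


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        srcdir = Path(tmp)  # stands in for Foo/foo/cpp_bar
        build_options_file(srcdir, "Linux").write_text('{"CMAKE_BUILD_TYPE": "DEBUG"}')
        print(load_build_options(srcdir, "Linux"))   # {'CMAKE_BUILD_TYPE': 'DEBUG'}
        print(load_build_options(srcdir, "Darwin"))  # {}
```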

5.3.1 f2py module build option specifications

All options available to the f2py command line application can be entered in the build options file. Pure flags such as --noopt, which are either present or absent but take no value, are entered in the dictionary with value None. Below are some examples of commonly used f2py flags.

import json
from pathlib import Path
import platform

# Example values, adapt these to your own project:
project_path = 'Foo'
package_name = 'foo'
module_name  = 'bar'

# Replace the descriptive strings below by actual values, e.g. '--opt': '-O3'
f2py = {
    '--f90exec' : 'f90 compiler executable',
    '--f90flags': 'f90 compiler flags',
    '--opt'     : 'f90 compiler optimization flags',
    '--arch'    : 'f90 compiler architecture specific compiler flags',
    '--noopt'   : None,  # neglect '--opt' contents
    '--noarch'  : None,  # neglect '--arch' contents
    '--debug'   : None,  # compile with debugging information
}
module_srcdir_path = Path(project_path) / package_name / f"f2py_{module_name}"
# The file name depends on the OS (platform.system()): Darwin, Linux or Windows
with (module_srcdir_path / f"build_options.{platform.system()}.json").open('w') as f:
    json.dump(f2py, f)

Note

The Python dictionary f2py is written to file in .json format, which is human readable. You can also create the file with a text editor.
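JSON has no None, so json.dump writes None as null, and json.load turns null back into None. A build tool must then translate the dict into f2py command line arguments. The sketch below shows one plausible translation, assuming f2py's --option=value syntax for options that take a value. The function is hypothetical and not necessarily how micc-build does it.

```python
import json
from typing import List


def f2py_args(options: dict) -> List[str]:
    """Translate a build options dict into f2py command line arguments."""
    args = []
    for flag, value in options.items():
        if value is None:
            args.append(flag)               # pure flag, e.g. --noopt
        else:
            args.append(f"{flag}={value}")  # e.g. --opt=-O3
    return args


if __name__ == "__main__":
    text = json.dumps({"--opt": "-O3", "--noarch": None})
    print(text)                         # {"--opt": "-O3", "--noarch": null}
    print(f2py_args(json.loads(text)))  # ['--opt=-O3', '--noarch']
```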

5.3.2 cpp module build option specifications

For cpp binary extension modules the build tool is CMake. Here, the entries of the build options dict consist of any CMake variable and its desired value.

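import json
from pathlib import Path
import platform

# Example values, adapt these to your own project:
project_path = 'Foo'
package_name = 'foo'
module_name  = 'bar'

cmake = {
    'CMAKE_BUILD_TYPE' : 'RELEASE',
    # ... other CMake variables and their values
}
module_srcdir_path = Path(project_path) / package_name / f"cpp_{module_name}"
# The file name depends on the OS (platform.system()): Darwin, Linux or Windows
with (module_srcdir_path / f"build_options.{platform.system()}.json").open('w') as f:
    json.dump(cmake, f)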
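CMake accepts cache variables on its command line as -D<name>=<value>, so a build options dict maps naturally onto such arguments. The translation below is a hypothetical sketch, not micc-build's code; mapping Python booleans to ON and OFF is our own choice.

```python
from typing import Dict, List, Union


def cmake_args(options: Dict[str, Union[str, int, bool]]) -> List[str]:
    """Translate a build options dict into CMake -D arguments."""
    args = []
    for name, value in options.items():
        if isinstance(value, bool):  # bool first: CMake expects ON/OFF
            value = "ON" if value else "OFF"
        args.append(f"-D{name}={value}")
    return args
```

For example, cmake_args({'CMAKE_BUILD_TYPE': 'RELEASE'}) returns ['-DCMAKE_BUILD_TYPE=RELEASE'], which would be passed to cmake when configuring the build.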

5.4 Publishing your documentation on readthedocs.org

Publishing your documentation on Readthedocs relieves the users of your code from having to build the documentation themselves. Making it happen is very easy. First, make sure the git repository of your code is published on GitHub. Second, create a Readthedocs account if you do not already have one. Then, go to your Readthedocs page, go to your projects and hit import project. Fill in the fields, and every time you push commits to GitHub, the documentation will be rebuilt automatically and published.

Note

Sphinx must be able to import your project in order to extract the documentation. If your code depends on Python packages outside the standard library, this import will fail and the documentation will not be built. You can avoid this by adding the necessary dependencies to <your-project>/docs/requirements.txt.

Tutorial 6 - Using conda Python and conda virtual environments

This tutorial is about using micc with conda virtual environments on your local machine.

Here are some reasons to use conda environments:

  • Anaconda is popular because it brings many of the tools used in data science and machine learning with just one install, so it’s great for having short and simple setup.
  • Some Python packages provided by conda are optimized for performance, e.g. Numpy is using Intel MKL (Math Kernel Library) for some of its functionality.
  • Then there is the Intel Python distribution which also uses conda. It provides highly performance optimized packages.

6.1 Miniconda

If you haven’t installed miniconda on your local machine, you can follow the instructions on the miniconda installation page.

Conda Python distributions have their own way of creating and managing virtual environments, but the principle is the same (see Conda tasks, https://conda.io/projects/conda/en/latest/user-guide/tasks/index.html).

Cd to our project:

> cd path/to/ET-dot

and create a virtual conda environment. We choose a different name, .cenv37, so that it can live next to the .venv environment created by poetry:

> conda create -p ./.cenv37 python=3.7

The name is of course arbitrary, but it resembles the .venv we got (by default) with poetry, and the 37 distinguishes it from environments for other Python versions. The location, specified with -p ./.cenv37, is arbitrary as well, but the project root directory is a familiar place for it and consistent with our earlier approach of using virtual environments created with poetry install. Alternatively, if you want to use the environment for other projects too, you might place it elsewhere.

This is the output generated:

Collecting package metadata (current_repodata.json): done
Solving environment: done

## Package Plan ##

  environment location: /Users/etijskens/software/dev/ET-dot/.cenv37

  added / updated specs:
    - python=3.7


The following packages will be downloaded:

    package                    |            build
    ---------------------------|-----------------
    certifi-2019.11.28         |           py37_0         156 KB
    libcxx-4.0.1               |       hcfea43d_1         947 KB
    libcxxabi-4.0.1            |       hcfea43d_1         350 KB
    libedit-3.1.20181209       |       hb402a30_0         136 KB
    libffi-3.2.1               |       h475c297_4          37 KB
    ncurses-6.1                |       h0a44026_1         732 KB
    openssl-1.1.1d             |       h1de35cc_3         3.4 MB
    pip-19.3.1                 |           py37_0         1.9 MB
    python-3.7.5               |       h359304d_0        18.1 MB
    readline-7.0               |       h1de35cc_5         316 KB
    setuptools-42.0.2          |           py37_0         645 KB
    sqlite-3.30.1              |       ha441bb4_0         2.4 MB
    tk-8.6.8                   |       ha441bb4_0         2.8 MB
    xz-5.2.4                   |       h1de35cc_4         239 KB
    zlib-1.2.11                |       h1de35cc_3          90 KB
    ------------------------------------------------------------
                                           Total:        32.1 MB

The following NEW packages will be INSTALLED:

  ca-certificates    pkgs/main/osx-64::ca-certificates-2019.11.27-0
  certifi            pkgs/main/osx-64::certifi-2019.11.28-py37_0
  libcxx             pkgs/main/osx-64::libcxx-4.0.1-hcfea43d_1
  libcxxabi          pkgs/main/osx-64::libcxxabi-4.0.1-hcfea43d_1
  libedit            pkgs/main/osx-64::libedit-3.1.20181209-hb402a30_0
  libffi             pkgs/main/osx-64::libffi-3.2.1-h475c297_4
  ncurses            pkgs/main/osx-64::ncurses-6.1-h0a44026_1
  openssl            pkgs/main/osx-64::openssl-1.1.1d-h1de35cc_3
  pip                pkgs/main/osx-64::pip-19.3.1-py37_0
  python             pkgs/main/osx-64::python-3.7.5-h359304d_0
  readline           pkgs/main/osx-64::readline-7.0-h1de35cc_5
  setuptools         pkgs/main/osx-64::setuptools-42.0.2-py37_0
  sqlite             pkgs/main/osx-64::sqlite-3.30.1-ha441bb4_0
  tk                 pkgs/main/osx-64::tk-8.6.8-ha441bb4_0
  wheel              pkgs/main/osx-64::wheel-0.33.6-py37_0
  xz                 pkgs/main/osx-64::xz-5.2.4-h1de35cc_4
  zlib               pkgs/main/osx-64::zlib-1.2.11-h1de35cc_3


Proceed ([y]/n)? y


Downloading and Extracting Packages
readline-7.0         | 316 KB    | ##################################### | 100%
libffi-3.2.1         | 37 KB     | ##################################### | 100%
pip-19.3.1           | 1.9 MB    | ##################################### | 100%
sqlite-3.30.1        | 2.4 MB    | ##################################### | 100%
zlib-1.2.11          | 90 KB     | ##################################### | 100%
libedit-3.1.20181209 | 136 KB    | ##################################### | 100%
xz-5.2.4             | 239 KB    | ##################################### | 100%
setuptools-42.0.2    | 645 KB    | ##################################### | 100%
libcxx-4.0.1         | 947 KB    | ##################################### | 100%
tk-8.6.8             | 2.8 MB    | ##################################### | 100%
python-3.7.5         | 18.1 MB   | ##################################### | 100%
certifi-2019.11.28   | 156 KB    | ##################################### | 100%
openssl-1.1.1d       | 3.4 MB    | ##################################### | 100%
ncurses-6.1          | 732 KB    | ##################################### | 100%
libcxxabi-4.0.1      | 350 KB    | ##################################### | 100%
Preparing transaction: done
Verifying transaction: done
Executing transaction: done
#
# To activate this environment, use
#
#     $ conda activate /Users/etijskens/software/dev/ET-dot/.cenv37
#
# To deactivate an active environment, use
#
#     $ conda deactivate

As the output suggests, we can now activate the environment:

> conda activate /Users/etijskens/software/dev/ET-dot/.cenv37
(/Users/etijskens/software/dev/ET-dot/.cenv37) >

Note

The command conda activate .cenv37/ would have worked too, but not conda activate .cenv37, as conda considers .cenv37 to be a named environment (an environment created with conda create --name <envname>) and looks it up in its default environments directory.
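The rule in this note can be summarized as: an argument containing a path separator is taken as a path, anything else as the name of an environment. The function below captures that rule. It is our own illustration of the behaviour described in the note, not code taken from conda.

```python
import re


def treated_as_path(env_name_or_prefix: str) -> bool:
    """True if the argument contains a path separator, i.e. is taken as a path."""
    return re.search(r"[\\/]", env_name_or_prefix) is not None
```

So treated_as_path(".cenv37/") is True, while treated_as_path(".cenv37") is False, which is why the latter is looked up as a named environment.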

Conda provides hundreds of popular packages, which are often better optimised than the general purpose packages on PyPI. You install them using conda install:

> conda install numpy
Collecting package metadata (current_repodata.json): done
Solving environment: done

## Package Plan ##

  environment location: /Users/etijskens/software/dev/workspace/ET-dot/.cenv37

  added / updated specs:
    - numpy


The following NEW packages will be INSTALLED:

  blas               pkgs/main/osx-64::blas-1.0-mkl
  intel-openmp       pkgs/main/osx-64::intel-openmp-2019.4-233
  libgfortran        pkgs/main/osx-64::libgfortran-3.0.1-h93005f0_2
  mkl                pkgs/main/osx-64::mkl-2019.4-233
  mkl-service        pkgs/main/osx-64::mkl-service-2.3.0-py37hfbe908c_0
  mkl_fft            pkgs/main/osx-64::mkl_fft-1.0.15-py37h5e564d8_0
  mkl_random         pkgs/main/osx-64::mkl_random-1.1.0-py37ha771720_0
  numpy              pkgs/main/osx-64::numpy-1.17.4-py37h890c691_0
  numpy-base         pkgs/main/osx-64::numpy-base-1.17.4-py37h6575580_0
  six                pkgs/main/osx-64::six-1.13.0-py37_0


Proceed ([y]/n)? y

Preparing transaction: done
Verifying transaction: done
Executing transaction: done

Clearly, this numpy comes with performance-optimized components from Intel, such as blas, intel-openmp and mkl. It is important to use conda install for such packages, as pip install or poetry install would install a different Numpy.

Finally, we run poetry install to install the remaining dependencies (we remove poetry.lock first, to allow poetry to choose the most recent versions):

(/Users/etijskens/software/dev/workspace/ET-dot/.cenv37) > rm poetry.lock
(/Users/etijskens/software/dev/workspace/ET-dot/.cenv37) > poetry install
Updating dependencies
Resolving dependencies... (2.4s)

Writing lock file


Package operations: 49 installs, 0 updates, 0 removals

  - Installing chardet (3.0.4)
  - Installing idna (2.8)
  - Installing markupsafe (1.1.1)
  - Installing pyparsing (2.4.5)
  - Installing python-dateutil (2.8.1)
  - Installing pytz (2019.3)
  - Installing urllib3 (1.25.7)
  - Installing alabaster (0.7.12)
  - Installing arrow (0.15.4)
  - Installing babel (2.7.0)
  - Installing docutils (0.15.2)
  - Installing imagesize (1.1.0)
  - Installing jinja2 (2.10.3)
  - Installing more-itertools (8.0.2)
  - Installing packaging (19.2)
  - Installing pygments (2.5.2)
  - Installing requests (2.22.0)
  - Installing snowballstemmer (2.0.0)
  - Installing sphinxcontrib-applehelp (1.0.1)
  - Installing sphinxcontrib-devhelp (1.0.1)
  - Installing sphinxcontrib-htmlhelp (1.0.2)
  - Installing sphinxcontrib-jsmath (1.0.1)
  - Installing sphinxcontrib-qthelp (1.0.2)
  - Installing sphinxcontrib-serializinghtml (1.1.3)
  - Installing binaryornot (0.4.4)
  - Installing click (7.0)
  - Installing future (0.18.2)
  - Installing jinja2-time (0.2.0)
  - Installing pbr (5.4.4)
  - Installing poyo (0.5.0)
  - Installing sphinx (2.3.0)
  - Installing whichcraft (0.6.1)
  - Installing zipp (0.6.0)
  - Installing cookiecutter (1.6.0)
  - Installing importlib-metadata (1.3.0)
  - Installing semantic-version (2.8.3)
  - Installing sphinx-click (2.3.1)
  - Installing sphinx-rtd-theme (0.4.3)
  - Installing tomlkit (0.5.8)
  - Installing walkdir (0.4.1)
  - Installing atomicwrites (1.3.0)
  - Installing attrs (19.3.0)
  - Installing et-micc (0.10.13)
  - Installing pluggy (0.13.1)
  - Installing py (1.8.0)
  - Installing pybind11 (2.4.3)
  - Installing wcwidth (0.1.7)
  - Installing et-micc-build (0.10.13)
  - Installing pytest (4.6.8)
  - Installing ET-dot (1.0.0)

Clearly, Numpy is not in the install list. The numpy we installed with conda is still available:

(/Users/etijskens/software/dev/workspace/ET-dot/.cenv37) > conda list
# packages in environment at /Users/etijskens/software/dev/workspace/ET-dot/.cenv37:
#
# Name                    Version                   Build  Channel
…
et-dot                    1.0.0                     dev_0    <develop>
et-micc                   0.10.13                  pypi_0    pypi
et-micc-build             0.10.13                  pypi_0    pypi
…
intel-openmp              2019.4                      233
…
libgfortran               3.0.1                h93005f0_2
…
mkl                       2019.4                      233
mkl-service               2.3.0            py37hfbe908c_0
mkl_fft                   1.0.15           py37h5e564d8_0
mkl_random                1.1.0            py37ha771720_0
…
numpy                     1.17.4           py37h890c691_0
numpy-base                1.17.4           py37h6575580_0
…

Notice the last column, Channel, which shows where each package comes from. The pypi entries were installed from PyPI by the poetry install command. The <develop> entry refers to our current project, ET-dot, which was installed in development mode, meaning that modifications to its .py files are immediately visible in the environment.
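To see at a glance which packages came from where, the listing can be grouped by channel. Below is a minimal sketch of our own that parses the plain-text output of conda list. It assumes whitespace-separated columns Name, Version, Build and Channel, and labels packages with an empty Channel column as "defaults" (our own label). For anything beyond a quick look, conda list --json is a more robust input.

```python
from typing import Dict, List


def packages_by_channel(conda_list_output: str) -> Dict[str, List[str]]:
    """Group package names from `conda list` output by their Channel column."""
    groups: Dict[str, List[str]] = {}
    for line in conda_list_output.splitlines():
        fields = line.split()
        # Skip comment lines, blank lines and elisions such as '…'
        if line.startswith("#") or len(fields) < 3:
            continue
        channel = fields[3] if len(fields) > 3 else "defaults"
        groups.setdefault(channel, []).append(fields[0])
    return groups


SAMPLE = """\
# Name                    Version                   Build  Channel
et-dot                    1.0.0                     dev_0    <develop>
et-micc                   0.10.13                  pypi_0    pypi
numpy                     1.17.4           py37h890c691_0
"""

if __name__ == "__main__":
    print(packages_by_channel(SAMPLE))
```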

Run pytest to verify that everything is working fine:

(/Users/etijskens/software/dev/workspace/ET-dot/.cenv37) > python -m pytest
========================================= test session starts ==========================================
platform darwin -- Python 3.7.5, pytest-4.6.8, py-1.8.0, pluggy-0.13.1
rootdir: /Users/etijskens/software/dev/workspace/ET-dot
collected 9 items

tests/test_cpp_dotc.py .                                                                         [ 11%]
tests/test_et_dot.py .......                                                                     [ 88%]
tests/test_f2py_dotf.py .                                                                        [100%]

=========================================== warnings summary ===========================================
.cenv37/lib/python3.7/site-packages/cookiecutter/repository.py:19
  /Users/etijskens/software/dev/workspace/ET-dot/.cenv37/lib/python3.7/site-packages/cookiecutter/repository.py:19: DeprecationWarning: Flags not at the start of the expression '\n(?x)\n((((git|hg)\\+)' (truncated)
    """)

-- Docs: https://docs.pytest.org/en/latest/warnings.html
================================ 9 passed, 1 warnings in 23.77 seconds =================================

This was all run in a fresh git clone of ET-dot, without the binary extensions. The absence of errors implies that the auto-build feature successfully built the binary extensions et_dot/dotf and et_dot/dotc.

Note

Poetry always uses pip for its installs, even in a conda environment. This may change in the future as Poetry evolves, but for the time being it is the user’s responsibility to conda install the packages they need from the conda ecosystem.

6.2 Intel distribution for Python

The Intel Python distribution is also based on conda. It contains many popular packages for high performance computing, data analytics, machine learning and artificial intelligence. The 2020 release announces:

  • Faster machine learning with scikit-learn key algorithms accelerated with Intel DAAL
  • Help address the needs of data scientists to harness Intel DAAL capabilities with a Python API using daal4py package improvements
  • Speed up pandas and NumPy with a compiler-based framework: High Performance Analytics Toolkit (HPAT)
  • Includes the latest TensorFlow and Caffe libraries that are optimized for Intel® architecture

To create a conda environment for the Intel distribution for Python follow these instructions:

Cd into your project root directory:

> cd path/to/ET-dot

and create the environment:

> conda create -p ./.idp -c intel intelpython3_core python=3
Collecting package metadata (current_repodata.json): done
Solving environment: done

## Package Plan ##

  environment location: /Users/etijskens/software/dev/workspace/ET-dot/.idp

  added / updated specs:
    - intelpython3_core
    - python=3


The following NEW packages will be INSTALLED:

  bzip2              intel/osx-64::bzip2-1.0.8-0
  certifi            intel/osx-64::certifi-2019.9.11-py37_0
  icc_rt             intel/osx-64::icc_rt-2020.0-intel_166
  intel-openmp       intel/osx-64::intel-openmp-2020.0-intel_166
  intelpython        intel/osx-64::intelpython-2020.0-1
  intelpython3_core  intel/osx-64::intelpython3_core-2020.0-0
  libffi             intel/osx-64::libffi-3.2.1-11
  mkl                intel/osx-64::mkl-2020.0-intel_166
  mkl-service        intel/osx-64::mkl-service-2.3.0-py37_0
  mkl_fft            intel/osx-64::mkl_fft-1.0.15-py37ha68da19_3
  mkl_random         intel/osx-64::mkl_random-1.1.0-py37ha68da19_0
  numpy              intel/osx-64::numpy-1.17.4-py37ha68da19_4
  numpy-base         intel/osx-64::numpy-base-1.17.4-py37_4
  openssl            intel/osx-64::openssl-1.1.1d-0
  pip                intel/osx-64::pip-19.1.1-py37_0
  python             intel/osx-64::python-3.7.4-3
  pyyaml             intel/osx-64::pyyaml-5.1.1-py37_0
  scipy              intel/osx-64::scipy-1.3.2-py37ha68da19_0
  setuptools         intel/osx-64::setuptools-41.0.1-py37_0
  six                intel/osx-64::six-1.12.0-py37_0
  sqlite             intel/osx-64::sqlite-3.29.0-0
  tbb                intel/osx-64::tbb-2020.0-intel_166
  tbb4py             intel/osx-64::tbb4py-2020.0-py37_intel_0
  tcl                intel/osx-64::tcl-8.6.4-24
  tk                 intel/osx-64::tk-8.6.4-29
  wheel              intel/osx-64::wheel-0.31.0-py37_3
  xz                 intel/osx-64::xz-5.2.4-h1de35cc_7
  yaml               intel/osx-64::yaml-0.1.7-2
  zlib               intel/osx-64::zlib-1.2.11-h1de35cc_7


Proceed ([y]/n)? y

Preparing transaction: done
Verifying transaction: done
Executing transaction: done
#
# To activate this environment, use
#
#     $ conda activate /Users/etijskens/software/dev/workspace/ET-dot/.idp
#
# To deactivate an active environment, use
#
#     $ conda deactivate

Note

If you haven’t installed a conda Python distribution before, the fastest way to obtain conda is to install Miniconda.

As before, you can now activate the environment:

> conda activate .idp/
(/Users/etijskens/software/dev/workspace/ET-dot/.idp) >

We do not recommend using poetry install to install the project’s dependencies here. (Apparently, the Intel distribution for Python uses distutils instead of pip for its distributions, which causes problems.) Rather, install them manually:

(/Users/etijskens/software/dev/workspace/ET-dot/.idp) > pip install et-micc-build
...
(/Users/etijskens/software/dev/workspace/ET-dot/.idp) > pip install pytest
...

Finally, run the tests:

> python -m pytest
============================= test session starts ==============================
platform darwin -- Python 3.7.4, pytest-5.3.2, py-1.8.0, pluggy-0.13.1
rootdir: /Users/etijskens/software/dev/workspace/ET-dot
collected 9 items

tests/test_cpp_dotc.py .                                                 [ 11%]
tests/test_et_dot.py .......                                             [ 88%]
tests/test_f2py_dotf.py .                                                [100%]

============================== 9 passed in 4.50s ===============================

Tutorial 7 - Using micc projects on the VSC-clusters

We distinguish two cases:

  • installing a micc-project for further development, and
  • installing a micc-project (in a virtual environment) for use in production runs.

Note

This tutorial uses the Leibniz cluster of the University of Antwerp as an example. The principles pertain, however, to all VSC clusters, and most probably also to other clusters using a module system for exposing its software stack.

7.1 Micc use on the cluster for developing code

Most differences between using your local machine and using the cluster stem from the fact that the cluster uses a module system for making software available to the user, and less importantly, that the cluster uses a scheduler to run your compute jobs in batch mode when the hardware you requested is available.

Most tools that are commonly used on the cluster are pre-installed and built for optimal performance. You need to make them available for execution with module load commands (for all the details, see Using the module system). The operating system also exposes some tools, such as compilers, but these are often many versions behind and, consequently, not fit for high performance computing. As an example, consider the git command. This is the git version exposed by the operating system:

> which git
/usr/bin/git
> git --version
git version 1.8.3.1

When you load the git module you get version 2.13.3:

> module load git
> which git
/apps/antwerpen/broadwell/centos7/git/2.13.3/bin/git

This is not the very latest git version, but it is well ahead of 1.8.3.1. Moreover, the two versions differ in their major component, which, according to semantic versioning, indicates that they are not backward compatible.

As git is now available, we can clone the git repository of our ET-dot project in some workspace directory (preferably somewhere on $VSC_DATA) and cd into the project directory:

> cd $VSC_DATA/path/to/my/workspace
> git clone https://github.com/etijskens/ET-dot
Cloning into 'ET-dot'...
remote: Enumerating objects: 116, done.
remote: Counting objects: 100% (116/116), done.
remote: Compressing objects: 100% (74/74), done.
remote: Total 116 (delta 45), reused 100 (delta 29), pack-reused 0
Receiving objects: 100% (116/116), 29.90 KiB | 0 bytes/s, done.
Resolving deltas: 100% (45/45), done.
> cd ET-dot

Note

It is good practice to clone git repositories in $VSC_DATA. Doing this in $VSC_HOME can easily consume all your file quota, and $VSC_SCRATCH is not backed up.

You will also need to load CMake if you want to build binary extension modules from C++ source code, such as the dotc module:

> module load CMake

On our local machine we would now select a python version with pyenv, and run poetry install to create a virtual environment and install ET-dot’s dependencies. The pyenv part is again replaced by a module load command, e.g.:

> module load leibniz/2019b
> module load Python/3.7.4-intel-2019b

The first command selects all modules built with the Intel 2019b toolchain, and the second makes Python 3.7.4 available together with a whole bunch of pre-installed Python packages which are useful for high performance computing, such as numpy, as well as all their dependencies. To see them execute:

> pip list
Package            Version
------------------ ------------
absl-py            0.7.1
alabaster          0.7.12
appdirs            1.4.3
...
numpy              1.17.0
...

or:

> conda list
???

The poetry part requires, at least at the time of writing, some special attention.

7.1.1 Note about using Poetry on the cluster

On our local machine we used poetry for

  • virtual environment creation and management,
  • installation of dependencies in a project’s virtual environment, using the commands
    • poetry install,
    • poetry update,
    • poetry add and
    • poetry remove,
  • publishing to PyPI, with the command poetry publish.

We do not recommend using Poetry for installing dependencies on the cluster. The main reason is that poetry masks the pre-installed Python packages made available by the cluster software stack. Every Python distribution on the cluster comes with such a set of pre-installed packages that are important for high performance computing, and that are built (compiled) to squeeze the last bit of performance out of the hardware on which they will run. Typical examples are Numpy, Scipy, pandas, … Poetry install would install equally functional packages that are built for running on many different hardware platforms, rather than for optimal performance. Hence, by using poetry install you sacrifice performance. In addition, re-installing these packages consumes a lot of your file quota.

To avoid trouble, we thus recommend not installing poetry on the cluster. If you want to publish your package, commit the changes to the git repository, push them to GitHub, pull the latest version on your local machine and use poetry publish --build to publish.

7.1.2 Virtual environments and dependencies on the cluster

If we can’t use Poetry for creating virtual environments and installing dependencies, we need some alternative way to achieve this. Fortunately, just doing this by hand is not too difficult.

Creating a virtual environment in the project root directory is simple:

> python -m venv .venv --system-site-packages

This command uses the venv package to create a virtual environment named .venv. The --system-site-packages flag ensures that the virtual environment also sees all the pre-installed Python packages. The environment name is in fact arbitrary, but we choose to use the same name as Poetry would use. The environment name is also the name of the directory containing the virtual environment:

> tree .venv
.venv
├── bin
│   ├── activate
│   ├── activate.csh
│   ├── activate.fish
│   ├── easy_install
│   ├── easy_install-3.7
│   ├── pip
│   ├── pip3
│   ├── pip3.7
│   ├── python -> /apps/antwerpen/broadwell/centos7/Python/3.7.4-intel-2019b/bin/python
│   └── python3 -> python
├── include
├── lib
│   └── python3.7
│       └── site-packages
│           ├── easy_install.py
│           ├── pip
│           │   ├── __init__.py
│           │   ├──
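The same environment can also be created from Python with the standard-library venv module. The sketch below uses only the standard library; the function name create_venv is chosen here for illustration and is not part of micc. It passes system_site_packages=True, the programmatic counterpart of --system-site-packages, and returns the generated pyvenv.cfg, so you can check that the system site packages are indeed visible.

```python
import os
import venv
from pathlib import Path


def create_venv(env_dir=".venv", with_pip=True):
    """Create env_dir like `python -m venv env_dir --system-site-packages`."""
    builder = venv.EnvBuilder(
        system_site_packages=True,   # also see the module's pre-installed packages
        symlinks=(os.name != "nt"),  # the command-line tool symlinks on POSIX too
        with_pip=with_pip,           # bootstrap pip into the new environment
    )
    builder.create(env_dir)
    # pyvenv.cfg records whether the system site packages are visible
    return (Path(env_dir) / "pyvenv.cfg").read_text()
```

Calling create_venv() in the project root creates .venv; activating it is still done from the shell, as described next.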

This virtual environment can be activated by executing:

> source .venv/bin/activate
(.venv) >

As on our local machine, the command prompt shows the name of the activated virtual environment. If in doubt, you can always inspect the full path of the python executable:

(.venv) > which python
/data/antwerpen/201/vsc20170/workspace/ET-dot/.venv/bin/python
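From within Python, the same check can be made with sys.prefix and sys.base_prefix: in an environment created by venv, the former points at the environment and the latter at the Python installation it was created from. A minimal sketch (active_environment is an illustrative name):

```python
import sys


def active_environment():
    """Return the directory of the active venv, or None outside a virtual environment."""
    # In a venv, sys.prefix points at the environment, while sys.base_prefix
    # still points at the Python installation it was created from.
    if sys.prefix != sys.base_prefix:
        return sys.prefix
    return None


print("python executable:  ", sys.executable)
print("virtual environment:", active_environment() or "none")
```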

To install the dependencies needed by the ET-dot project, we have two options: a systematic approach and a quick-and-dirty approach. Let's be systematic first and check the [tool.poetry.dependencies] section of the project's pyproject.toml file:

(.venv) > cat pyproject.toml
...
[tool.poetry.dependencies]
python = "^3.7"
et-micc-build = "^0.10.10"

[tool.poetry.dev-dependencies]
pytest = "^4.4.2"

...

The [tool.poetry.dependencies] section tells us that our project depends on micc-build, so we install it with pip, the standard Python install tool:

(.venv) > pip install et-micc-build
Collecting et-micc-build
  Downloading https://files.pythonhosted.org/packages/aa/00/d95e6cf3b584c1921655258ed4d5a51120ba0ad158e6ee9c0122b2ccd0b2/et_micc_build-0.10.11-py3-none-any.whl
...
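pip does not read the [tool.poetry.dependencies] section itself. If you want pip to honour the version constraints found there, a small helper can translate them. The sketch below is hypothetical (caret_to_pep440 and pip_requirements are not part of micc or Poetry) and only handles the constraint forms relevant here: caret constraints, '*', and constraints already written in pip's own syntax. It applies Poetry's caret rule, which allows updates that do not change the left-most non-zero version component, so ^0.10.10 becomes >=0.10.10,<0.11.0. It uses tomllib, which is in the standard library from Python 3.11 on; on older Pythons, such as the 3.7 module used here, it falls back to the third-party tomli package, which you would have to pip install first.

```python
try:
    import tomllib  # standard library from Python 3.11 on
except ModuleNotFoundError:  # older Pythons: pip install tomli
    import tomli as tomllib


def caret_to_pep440(spec):
    """Translate a Poetry constraint into a pip (PEP 440) version specifier.

    Only '^x.y.z', '*' and constraints already written in pip syntax are handled.
    """
    spec = spec.strip()
    if spec == "*":
        return ""  # any version
    if spec.startswith("^"):
        version = spec[1:]
        parts = [int(p) for p in version.split(".")]
        # Poetry's caret rule: the left-most non-zero component may not change
        # (if all components are zero, the last one is bumped).
        i = next((k for k, p in enumerate(parts) if p != 0), len(parts) - 1)
        upper = parts[:i] + [parts[i] + 1]
        upper += [0] * (3 - len(upper))  # pad to major.minor.patch
        return ">={},<{}".format(version, ".".join(str(p) for p in upper))
    if spec[:1] in ("<", ">", "=", "!"):
        return spec  # already valid for pip, e.g. '>=1.2,<1.5'
    raise ValueError("unsupported constraint: " + spec)


def pip_requirements(pyproject="pyproject.toml"):
    """Return pip requirement strings for [tool.poetry.dependencies]."""
    with open(pyproject, "rb") as f:  # tomllib reads binary files
        data = tomllib.load(f)
    requirements = []
    for name, spec in data["tool"]["poetry"]["dependencies"].items():
        if name == "python":  # the interpreter itself, not a pip package
            continue
        if not isinstance(spec, str):  # e.g. {version = ..., optional = true}
            raise ValueError(name + ": only plain string constraints are handled")
        requirements.append(name + caret_to_pep440(spec))
    return requirements
```

For the pyproject.toml shown above, pip_requirements() returns ['et-micc-build>=0.10.10,<0.11.0']. Quote such a string when passing it to pip install on the command line, because < and > are shell redirections.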

As we did not specify a version, pip installs the latest version of micc-build and its dependencies. Contrary to poetry install, however, it only installs dependencies whose version specification is not already met. E.g. the system site packages of the Python/3.7.4-intel-2019b module contain Numpy 1.17.0, which satisfies micc-build's version specification, so Numpy is not installed, as is clear from the output:

...
Requirement already satisfied: numpy<2.0.0,>=1.17.0 in /apps/antwerpen/broadwell/centos7/Python/3.7.4-intel-2019b/lib/python3.7/site-packages/numpy-1.17.0-py3.7-linux-x86_64.egg (from et-micc-build) (1.17.0)
...

This is exactly the behavior we were looking for to avoid masking the system site packages.
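To check for yourself which copy of a package the virtual environment actually uses, you can ask Python's import system where a module would be loaded from. A minimal sketch based on importlib.util.find_spec; import_origin is an illustrative name:

```python
import importlib.util
import sys
from pathlib import Path


def import_origin(name):
    """Describe where module `name` would be imported from (None if not importable)."""
    spec = importlib.util.find_spec(name)
    if spec is None:
        return None
    if not spec.has_location:  # built-in, frozen or namespace package
        return "no file location ({})".format(spec.origin)
    origin = Path(spec.origin).resolve()
    env = Path(sys.prefix).resolve()
    inside = sys.prefix != sys.base_prefix and env in origin.parents
    where = "inside" if inside else "outside"
    return "{} the virtual environment: {}".format(where, origin)
```

In the .venv created above, you would expect import_origin("numpy") to point into the Python/3.7.4-intel-2019b module, and import_origin("et_micc_build") to point into .venv.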

An interesting side effect is that, since micc is a dependency of micc-build, micc is now installed in our virtual environment, and thus can be used to assist the further development of the project:

(.venv) > which micc
/data/antwerpen/201/vsc20170/workspace/ET-dot/.venv/bin/micc
(.venv) > micc --version
micc, version 0.10.11

As micc-build is the only dependency, we can verify that everything works fine by running pytest:

(.venv) > python -m pytest

Note

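Running python -m pytest uses the python of the activated virtual environment and also adds the current directory to sys.path, so et_dot can be imported from the project root. Just running pytest will fail, because then pytest cannot see our virtual environment and fails to import et_dot.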

Here is the result:

========================================== test session starts ==========================================
platform linux -- Python 3.7.4, pytest-5.0.1, py-1.8.0, pluggy-0.12.0
rootdir: /data/antwerpen/201/vsc20170/workspace/ET-dot
plugins: xonsh-0.9.9
collected 9 items

tests/test_cpp_dotc.py .                                                                          [ 11%]
tests/test_et_dot.py .......                                                                      [ 88%]
tests/test_f2py_dotf.py .                                                                         [100%]

=========================================== warnings summary ============================================
/apps/antwerpen/broadwell/centos7/Python/3.7.4-intel-2019b/lib/python3.7/site-packages/future-0.17.1-py3.7.egg/past/translation/__init__.py:35
  /apps/antwerpen/broadwell/centos7/Python/3.7.4-intel-2019b/lib/python3.7/site-packages/future-0.17.1-py3.7.egg/past/translation/__init__.py:35: DeprecationWarning: the imp module is deprecated in favour of importlib; see the module's documentation for alternative uses
    import imp

/apps/antwerpen/broadwell/centos7/Python/3.7.4-intel-2019b/lib/python3.7/site-packages/future-0.17.1-py3.7.egg/past/types/oldstr.py:5
  /apps/antwerpen/broadwell/centos7/Python/3.7.4-intel-2019b/lib/python3.7/site-packages/future-0.17.1-py3.7.egg/past/types/oldstr.py:5: DeprecationWarning: Using or importing the ABCs from 'collections' instead of from 'collections.abc' is deprecated, and in 3.8 it will stop working
    from collections import Iterable

/apps/antwerpen/broadwell/centos7/Python/3.7.4-intel-2019b/lib/python3.7/site-packages/future-0.17.1-py3.7.egg/past/builtins/misc.py:4
  /apps/antwerpen/broadwell/centos7/Python/3.7.4-intel-2019b/lib/python3.7/site-packages/future-0.17.1-py3.7.egg/past/builtins/misc.py:4: DeprecationWarning: Using or importing the ABCs from 'collections' instead of from 'collections.abc' is deprecated, and in 3.8 it will stop working
    from collections import Mapping

.venv/lib/python3.7/site-packages/cookiecutter/repository.py:19
  /data/antwerpen/201/vsc20170/workspace/ET-dot/.venv/lib/python3.7/site-packages/cookiecutter/repository.py:19: DeprecationWarning: Flags not at the start of the expression '\n(?x)\n((((git|hg)\\+)' (truncated)
    """)

-- Docs: https://docs.pytest.org/en/latest/warnings.html
================================= 9 passed, 4 warnings in 11.04 seconds =================================

Except for some DeprecationWarnings, which are out of our reach, all tests succeed. Note, however, that if we hadn't loaded the CMake module, building the dotc binary extension would have failed with an error saying that CMake cannot be found.

The second, quick-and-dirty approach skips the pyproject.toml file and runs python -m pytest right away. Had we not already installed micc-build, all three test files would fail during collection:

> python -m pytest
========================================== test session starts ==========================================
platform linux -- Python 3.7.4, pytest-5.0.1, py-1.8.0, pluggy-0.12.0
rootdir: /data/antwerpen/201/vsc20170/workspace/ET-dot
plugins: xonsh-0.9.9
collected 0 items / 3 errors

================================================ ERRORS =================================================
________________________________ ERROR collecting tests/test_cpp_dotc.py ________________________________
ImportError while importing test module '/data/antwerpen/201/vsc20170/workspace/ET-dot/tests/test_cpp_dotc.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
et_dot/__init__.py:10: in <module>
    import et_dot.dotc
E   ModuleNotFoundError: No module named 'et_dot.dotc'

During handling of the above exception, another exception occurred:
tests/test_cpp_dotc.py:9: in <module>
    import et_dot.dotc as cpp
et_dot/__init__.py:15: in <module>
    from et_micc_build.cli_micc_build import auto_build_binary_extension
E   ModuleNotFoundError: No module named 'et_micc_build'
_________________________________ ERROR collecting tests/test_et_dot.py _________________________________
ImportError while importing test module '/data/antwerpen/201/vsc20170/workspace/ET-dot/tests/test_et_dot.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
et_dot/__init__.py:10: in <module>
    import et_dot.dotc
E   ModuleNotFoundError: No module named 'et_dot.dotc'

During handling of the above exception, another exception occurred:
tests/test_et_dot.py:10: in <module>
    import et_dot
et_dot/__init__.py:15: in <module>
    from et_micc_build.cli_micc_build import auto_build_binary_extension
E   ModuleNotFoundError: No module named 'et_micc_build'
_______________________________ ERROR collecting tests/test_f2py_dotf.py ________________________________
ImportError while importing test module '/data/antwerpen/201/vsc20170/workspace/ET-dot/tests/test_f2py_dotf.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
et_dot/__init__.py:10: in <module>
    import et_dot.dotc
E   ModuleNotFoundError: No module named 'et_dot.dotc'

During handling of the above exception, another exception occurred:
tests/test_f2py_dotf.py:8: in <module>
    import et_dot.dotf as f90
et_dot/__init__.py:15: in <module>
    from et_micc_build.cli_micc_build import auto_build_binary_extension
E   ModuleNotFoundError: No module named 'et_micc_build'
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Interrupted: 3 errors during collection !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
======================================== 3 error in 0.34 seconds ========================================

All three test files fail in more or less the same way. E.g. for the last one, there is first a ModuleNotFoundError:

E   ModuleNotFoundError: No module named 'et_dot.dotc'

which tells us that the binary extension dotc is not found. That makes sense, because it hasn't been built yet. (You can verify that there are no .so files by running ls -l et_dot.) Normally, the auto-build feature takes care of that. While handling this error, et_dot/__init__.py tries to import micc-build, which raises a second ModuleNotFoundError:

E   ModuleNotFoundError: No module named 'et_micc_build'

which tells us that micc-build, which is needed to engage the auto-build feature, is not installed in our virtual environment. So we pip install it:

(.venv) > pip install et-micc-build
Collecting et-micc-build
...

and run the tests again to see that they succeed, meaning that the binary modules were built, and that the auto-build feature was successfully engaged.

If the project needs other packages, you will keep getting ModuleNotFoundError exceptions. Each time, pip install the missing package and run the tests again, until no more ModuleNotFoundError exceptions arise and you are good to go.
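The quick-and-dirty loop can be partly automated by collecting the missing module names from the pytest output, e.g. after running python -m pytest > pytest.log. The sketch below is hypothetical and only extracts names: import names do not always match PyPI names (et_micc_build is installed as et-micc-build), so deciding what to pip install remains your job. The project's own package is skipped, because a missing et_dot.dotc means an unbuilt binary extension, not a missing dependency.

```python
import re

# Matches lines like: E   ModuleNotFoundError: No module named 'et_micc_build'
_MISSING = re.compile(r"ModuleNotFoundError: No module named '([\w.]+)'")


def missing_modules(pytest_output, own_package):
    """Sorted top-level module names that pytest could not import,
    ignoring the project's own package (e.g. unbuilt binary extensions)."""
    names = {m.group(1).split(".")[0] for m in _MISSING.finditer(pytest_output)}
    names.discard(own_package)
    return sorted(names)
```

For the failing run above, missing_modules(open("pytest.log").read(), "et_dot") returns ['et_micc_build'].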

A bash script for creating and activating the virtual environment, e.g. micc-setup, stored in a directory on your PATH, may be practical:

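#!/bin/bash
# This is file micc-setup. Source it (don't execute it) in the project root directory.

# load the modules needed
module load leibniz/2019b
module load Python/3.7.4-intel-2019b
module load CMake
module list

if [ -d ".venv" ]
then
    echo "Virtual environment present: '.venv'"
    echo "Activating '.venv' ..."
    source .venv/bin/activate
else
    # create a new virtual environment that also sees the module's pre-installed packages
    echo "Creating new virtual environment '.venv'"
    python -m venv .venv --system-site-packages
    echo "Activating '.venv' ..."
    source .venv/bin/activate
    echo "Installing micc ..."
    pip install et-micc
fi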

If most of your projects have binary extensions, you might choose to pip install et-micc-build on the second-to-last line instead. When run in the project root directory, this script loads the needed modules and activates the project's virtual environment .venv if it exists; otherwise, it creates .venv and installs micc in it. You must install the project's dependencies yourself.

You must source this script in the project root directory. If you execute it instead, the virtual environment is still set up correctly, but neither the activation of the virtual environment nor the loaded modules survive after the script terminates:

> cd path/to/ET-dot
> source micc-setup

Currently Loaded Modules:
  1) leibniz/2019b                  9) SQLite/3.29.0-intel-2019b
  2) GCCcore/8.3.0                 10) HDF5/1.8.21-intel-2019b-MPI
  3) binutils/2.32-GCCcore-8.3.0   11) METIS/5.1.0-intel-2019b-i32-fp64
  4) intel/2019b                   12) SuiteSparse/4.5.6-intel-2019b-METIS-5.1.0
  5) baselibs/2019b-GCCcore-8.3.0  13) Python/3.7.4-intel-2019b
  6) Tcl/8.6.9-intel-2019b         14) git/2.13.3
  7) X11/2019b-GCCcore-8.3.0       15) CMake/3.11.1
  8) Tk/8.6.9-intel-2019b
Virtual environment present: '.venv'
Activating '.venv' ...
(.venv) >
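Why sourcing matters: an executed script runs in a child process, which gets a copy of the parent shell's environment, and every change it makes, such as activating a virtual environment, disappears when it exits. The same mechanism can be demonstrated from Python with a child interpreter; MICC_SETUP_DEMO is an arbitrary variable name chosen for this illustration.

```python
import os
import subprocess
import sys


def run_in_child(var, value):
    """Set an environment variable in a child interpreter and return what the child saw."""
    code = "import os; os.environ[{0!r}] = {1!r}; print(os.environ[{0!r}])".format(var, value)
    result = subprocess.run([sys.executable, "-c", code],
                            capture_output=True, text=True, check=True)
    return result.stdout.strip()


print("child saw:  ", run_in_child("MICC_SETUP_DEMO", "activated"))
# The change died with the child process, just like with an executed micc-setup:
print("parent sees:", os.environ.get("MICC_SETUP_DEMO"))
```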

This micc-setup script works for every project, but the modules it loads are hardcoded. You can of course elaborate on this very simple script.

7.2 Using a micc project as a dependency

Using a micc project such as ET-dot in another project, say foo, is simple: create a virtual environment in foo and use pip install. Using the micc-setup script we wrote before:

> cd path/to/foo
> source micc-setup

The following have been reloaded with a version change:
  1) leibniz/supported => leibniz/2019b


Currently Loaded Modules:
  1) leibniz/2019b
  2) GCCcore/8.3.0
  3) binutils/2.32-GCCcore-8.3.0
  4) intel/2019b
  5) baselibs/2019b-GCCcore-8.3.0
  6) Tcl/8.6.9-intel-2019b
  7) X11/2019b-GCCcore-8.3.0
  8) Tk/8.6.9-intel-2019b
  9) SQLite/3.29.0-intel-2019b
 10) HDF5/1.8.21-intel-2019b-MPI
 11) METIS/5.1.0-intel-2019b-i32-fp64
 12) SuiteSparse/4.5.6-intel-2019b-METIS-5.1.0
 13) Python/3.7.4-intel-2019b
 14) git/2.13.3
 15) CMake/3.11.1
Creating  new virtual environment '.venv'
Activating '.venv' ...
Installing micc ...
Collecting et-micc
  ...
(.venv) > pip install git+https://github.com/etijskens/ET-dot
Collecting git+https://github.com/etijskens/ET-dot
  Cloning https://github.com/etijskens/ET-dot to /tmp/pip-req-build-i1ta63e3
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Installing backend dependencies ... done
    Preparing wheel metadata ... done
Collecting et-micc-build<0.11.0,>=0.10.10 (from et-dot==1.0.0)
  ...

Note that we installed ET-dot directly from GitHub. If we had published it to PyPI, pip install ET-dot would have been sufficient.

7.2.1 Using virtual environments in batch jobs

Using project foo in a batch job is exactly the same as on the command line: you must load the cluster modules you need and activate the virtual environment. Here is an example (PBS) job script, assuming that foo.py is a Python script that imports et_dot:

#!/usr/bin/env bash
#PBS -l nodes=1:ppn=1
#PBS -l walltime=00:05:00
#PBS -l pmem=1gb

cd $VSC_DATA/path/to/foo
# load necessary cluster modules and activate virtual environment
source micc-setup
# run python script
python foo.py

7.3 Using conda Python distributions

You can set up your own Conda virtual environments on the cluster, just as we described in Tutorial 6 - Using conda python and conda virtual environments. The problem with that approach is that it consumes a lot of your file quota, because it relies much more on copying files than the Python venv module does. For that reason we do not recommend it. If you nevertheless use this approach, make sure you set it up in the $VSC_DATA file space: if you do it in the $VSC_HOME file space, you will probably run out of file quota before the virtual environment is ready.

There is, however, an alternative method which uses the PYTHONPATH environment variable to extend the IntelPython3 cluster modules. It is a bit of a low-level hack, but it is not overly complicated, and works well.

First, we select the toolchain:

> module load leibniz/2019b
The following have been reloaded with a version change:
  1) leibniz/supported => leibniz/2019b

Then we load an IntelPython version (which is a conda distribution optimized by Intel):

> module load IntelPython3/2019b.05
> python --version
Python 3.6.9 :: Intel Corporation

As usual, it comes with a whole bunch of pre-installed Python packages:

> conda list
# packages in environment at /apps/antwerpen/x86_64/centos7/intel-psxe/2019_update5/intelpython3:
#
asn1crypto                0.24.0                   py36_3    intel
bzip2                     1.0.6                        18    intel
certifi                   2018.1.18                py36_2    intel
cffi                      1.11.5                   py36_3    intel
chardet                   3.0.4                    py36_3    intel
conda                     4.3.31                   py36_3    intel
...

Cd into our project’s root directory:

> cd $VSC_DATA/workspace/ET-dot

Here we create a directory that will serve as a surrogate for a virtual environment:

> mkdir .cenv

The chosen name is arbitrary, of course, but it resembles the .venv directory we used above with the venv Python package. The location is arbitrary too, but the project root directory is a familiar place for it.

Next, we use pip to install et-micc-build into .cenv:

> pip install -t .cenv et-micc-build
Collecting et-micc-build==0.10.13
  Using cached https://files.pythonhosted.org/packages/1f/41/a3c2ca300f735742f7183127afaf302e3c9875ff14dedf1cf14b1850774e/et_micc_build-0.10.13-py3-none-any.whl
...
Successfully installed MarkupSafe-1.1.1 Pygments-2.5.2 alabaster-0.7.12 arrow-0.15.4
babel-2.7.0 binaryornot-0.4.4 certifi-2019.11.28 chardet-3.0.4 click-7.0 cookiecutter-1.6.0
docutils-0.15.2 et-micc-0.10.13 et-micc-build-0.10.13 future-0.18.2 idna-2.8 imagesize-1.1.0
jinja2-2.10.3 jinja2-time-0.2.0 numpy-1.17.4 packaging-19.2 pbr-5.4.4 poyo-0.5.0 pybind11-2.4.3
pyparsing-2.4.5 python-dateutil-2.8.1 pytz-2019.3 requests-2.22.0 semantic-version-2.8.3
setuptools-42.0.2 six-1.13.0 snowballstemmer-2.0.0 sphinx-2.3.0 sphinx-click-2.3.1
sphinx-rtd-theme-0.4.3 sphinxcontrib-applehelp-1.0.1 sphinxcontrib-devhelp-1.0.1
sphinxcontrib-htmlhelp-1.0.2 sphinxcontrib-jsmath-1.0.1 sphinxcontrib-qthelp-1.0.2
sphinxcontrib-serializinghtml-1.1.3 tomlkit-0.5.8 urllib3-1.25.7 walkdir-0.4.1
whichcraft-0.6.1

Note that Numpy 1.17.4 is installed too, which we wanted to avoid because it is not optimised for the cluster. Because we are not installing into the environment's site-packages directory, pip does not check whether the packages are already available there, and there is no flag to make it do so. Hence, we must remove numpy manually:

> rm -rf .cenv/numpy*
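Rather than guessing which other packages in .cenv duplicate packages that the IntelPython3 module already provides, you can let the base interpreter tell you. The sketch below is hypothetical, not a pip or micc feature: it lists the importable top-level names in a pip install -t directory and, for each one, asks a fresh interpreter running in isolated mode (-I, which ignores PYTHONPATH and the user site-packages) whether it can already import it. Review the result before deleting anything, because the pre-installed version does not necessarily satisfy micc-build's requirements.

```python
import subprocess
import sys
from pathlib import Path


def top_level_names(target):
    """Importable top-level names in a `pip install -t target` directory."""
    names = set()
    for entry in Path(target).iterdir():
        if entry.is_dir():
            name = entry.name  # package directory, or metadata such as *.dist-info
        elif entry.suffix in (".py", ".so"):
            name = entry.name.split(".")[0]  # module file or compiled extension
        else:
            continue
        # metadata directories (names with '-' or '.') are not identifiers
        if name.isidentifier() and name not in ("bin", "__pycache__"):
            names.add(name)
    return sorted(names)


def already_in_base(name):
    """True if a fresh interpreter in isolated mode (-I, ignores PYTHONPATH) finds `name`."""
    code = ("import importlib.util, sys; "
            "sys.exit(0 if importlib.util.find_spec({!r}) else 1)".format(name))
    return subprocess.run([sys.executable, "-I", "-c", code]).returncode == 0


def shadowing(target):
    """Names in `target` that would shadow a package the base Python already provides."""
    return [name for name in top_level_names(target) if already_in_base(name)]
```

With the IntelPython3 module loaded, print(shadowing(".cenv")) lists the candidates; had numpy not been removed yet, you would expect it among them.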

We must also install pytest as it is not in the Intel Python distribution, nor is it a dependency of micc-build.

> pip install -t .cenv pytest

Now set the PYTHONPATH environment variable to the .cenv directory and export it:

> export PYTHONPATH=$PWD/.cenv

Note

The PYTHONPATH environment variable is retained for the duration of the terminal session only. In a new session, or in a batch job script, you must set it again.
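Putting .cenv on PYTHONPATH works because the directories listed there are inserted into sys.path ahead of the interpreter's standard library and site-packages directories. The flip side is that anything in .cenv shadows the pre-installed packages, which is why numpy had to be removed from it. The sketch below uses only the standard library (the function names are illustrative): it starts a fresh interpreter with a given PYTHONPATH and checks that ordering.

```python
import os
import subprocess
import sys


def sys_path_with(pythonpath):
    """Return sys.path of a fresh interpreter started with PYTHONPATH=pythonpath."""
    env = dict(os.environ, PYTHONPATH=pythonpath)
    code = "import sys; print('\\n'.join(sys.path))"
    result = subprocess.run([sys.executable, "-c", code], env=env,
                            capture_output=True, text=True, check=True)
    return result.stdout.splitlines()


def pythonpath_precedes_site_packages(entry):
    """True if `entry` (from PYTHONPATH) is searched before any site-packages directory."""
    path = sys_path_with(entry)
    site = min(i for i, p in enumerate(path)
               if p.rstrip("/").endswith(("site-packages", "dist-packages")))
    return path.index(entry) < site
```

With the IntelPython3 module loaded, pythonpath_precedes_site_packages(os.path.abspath(".cenv")) is expected to return True.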

Run pytest to see if everything is working:

> python -m pytest
========================================================== test session starts ==========================================================
platform linux -- Python 3.6.9, pytest-5.3.2, py-1.8.0, pluggy-0.13.1
rootdir: /data/antwerpen/201/vsc20170/workspace/ET-dot
collected 8 items / 1 error / 7 selected

================================================================ ERRORS =================================================================
________________________________________________ ERROR collecting tests/test_cpp_dotc.py ________________________________________________
tests/test_cpp_dotc.py:10: in <module>
    cpp = et_dot.dotc
E   AttributeError: module 'et_dot' has no attribute 'dotc'
------------------------------------------------------------ Captured stdout ------------------------------------------------------------
[ERROR]
    Binary extension module 'bar{get_extension_suffix}' could not be build.
    Any attempt to use it will raise exceptions.

...
------------------------------------------------------------ Captured stderr ------------------------------------------------------------
[INFO] [ Building cpp module 'dotc':
[INFO]           Building using default build options.
[DEBUG]          [ > cmake -D PYTHON_EXECUTABLE=/apps/antwerpen/x86_64/centos7/intel-psxe/2019_update5/intelpython3/bin/python -D pybind11_DIR=/data/antwerpen/201/vsc20170/workspace/ET-dot/.cenv/et_micc_build/cmake_tools ..
[DEBUG]              (stdout)
                       -- The CXX compiler identification is GNU 4.8.5
                       -- Check for working CXX compiler: /usr/bin/c++
                       -- Check for working CXX compiler: /usr/bin/c++ -- works
                       -- Detecting CXX compiler ABI info
                       -- Detecting CXX compiler ABI info - done
                       -- Detecting CXX compile features
                       -- Detecting CXX compile features - done
                       -- Found PythonInterp: /apps/antwerpen/x86_64/centos7/intel-psxe/2019_update5/intelpython3/bin/python (found version "3.6.9")
                       -- Found PythonLibs: /apps/antwerpen/x86_64/centos7/intel-psxe/2019_update5/intelpython3/lib/libpython3.6m.so
                       -- Performing Test HAS_CPP14_FLAG
                       -- Performing Test HAS_CPP14_FLAG - Failed
                       -- Performing Test HAS_CPP11_FLAG
                       -- Performing Test HAS_CPP11_FLAG - Success
                       -- Performing Test HAS_FLTO
                       -- Performing Test HAS_FLTO - Success
                       -- LTO enabled
                       -- Configuring done
                       -- Generating done
                       -- Build files have been written to: /data/antwerpen/201/vsc20170/workspace/ET-dot/et_dot/cpp_dotc/_cmake_build
[DEBUG]          ] done.
[DEBUG]          [ > make
[WARNING]            > make
[WARNING]            (stdout)
                     Scanning dependencies of target dotc
                     [ 50%] Building CXX object CMakeFiles/dotc.dir/dotc.cpp.o
[WARNING]            (stderr)
                     /data/antwerpen/201/vsc20170/workspace/ET-dot/et_dot/cpp_dotc/dotc.cpp:8:31: fatal error: pybind11/pybind11.h: No such file or directory
                      #include <pybind11/pybind11.h>
                                                    ^
                     compilation terminated.
                     make[2]: *** [CMakeFiles/dotc.dir/dotc.cpp.o] Error 1
                     make[1]: *** [CMakeFiles/dotc.dir/all] Error 2
                     make: *** [all] Error 2
[DEBUG]          ] done.
[INFO] ] done.
[INFO] [ Building f2py module 'dotf':
[INFO]           Building using default build options.
_f2py_build/src.linux-x86_64-3.6/dotfmodule.c:144:12: warning: ‘f2py_size’ defined but not used [-Wunused-function]
 static int f2py_size(PyArrayObject* var, ...)
            ^
[DEBUG]          [ > ln -sf /data/antwerpen/201/vsc20170/workspace/ET-dot/et_dot/f2py_dotf/dotf.cpython-36m-x86_64-linux-gnu.so /data/antwerpen/201/vsc20170/workspace/ET-dot/et_dot/dotf.cpython-36m-x86_64-linux-gnu.so
[DEBUG]          ] done.
[INFO] ] done.
=========================================================== warnings summary ============================================================
/user/antwerpen/201/vsc20170/data/workspace/ET-dot/.cenv/past/builtins/misc.py:45
  /user/antwerpen/201/vsc20170/data/workspace/ET-dot/.cenv/past/builtins/misc.py:45: DeprecationWarning: the imp module is deprecated in favour of importlib; see the module's documentation for alternative uses
    from imp import reload

/user/antwerpen/201/vsc20170/data/workspace/ET-dot/.cenv/cookiecutter/repository.py:19
  /user/antwerpen/201/vsc20170/data/workspace/ET-dot/.cenv/cookiecutter/repository.py:19: DeprecationWarning: Flags not at the start of the expression '\n(?x)\n((((git|hg)\\+)' (truncated)
    """)

-- Docs: https://docs.pytest.org/en/latest/warnings.html
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Interrupted: 1 error during collection !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
===================================================== 2 warnings, 1 error in 6.40s ======================================================

Inspecting the output shows that we are halfway there: the f2py module dotf was built, but the cpp module dotc failed to build because the pybind11 include files could not be found. Although pybind11-2.4.3 appears in the output of pip install -t .cenv et-micc-build above, that only installs the Python components (which we don't need) and not the include files (which we do need). This is not too difficult to solve. First, clone the pybind11 git repo somewhere in $VSC_DATA. We choose to do that in the parent directory of ET-dot:

> git clone https://github.com/pybind/pybind11.git
Cloning into 'pybind11'...
remote: Enumerating objects: 38, done.
remote: Counting objects: 100% (38/38), done.
remote: Compressing objects: 100% (30/30), done.
remote: Total 11291 (delta 14), reused 12 (delta 3), pack-reused 11253
Receiving objects: 100% (11291/11291), 4.22 MiB | 2.32 MiB/s, done.
Resolving deltas: 100% (7612/7612), done.

Next, we must tell our ET-dot project where it can find the pybind11 include files. Cd into the _cmake_build directory and edit the CMakeCache.txt file:

> cd ET-dot/et_dot/cpp_dotc/_cmake_build
> vim CMakeCache.txt                        # or whatever editor you like...
...

There should be a CMAKE_CXX_FLAGS:STRING entry which must be set to -I, followed by the exact path of the pybind11/include/ directory:

//Flags used by the CXX compiler during all build types.
CMAKE_CXX_FLAGS:STRING=-I/data/antwerpen/201/vsc20170/workspace/pybind11/include/
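If you prefer not to edit CMakeCache.txt by hand, the entry can also be set with a small script. The sketch below is hypothetical (set_cache_entry is not part of CMake or micc-build): it rewrites the value of an existing KEY:TYPE=value line and keeps its type. Keep in mind that the cache lives in the _cmake_build directory: if that directory is removed, CMake generates a fresh cache and the edit must be repeated.

```python
from pathlib import Path


def set_cache_entry(cache_file, key, value):
    """Set `key` in a CMakeCache.txt to `value`, keeping the entry's existing type."""
    path = Path(cache_file)
    lines = path.read_text().splitlines(keepends=True)
    for i, line in enumerate(lines):
        if line.startswith(("//", "#")):
            continue  # comment lines
        name, sep, _ = line.partition("=")  # name looks like KEY:TYPE
        if sep and name.split(":")[0] == key:
            lines[i] = "{}={}\n".format(name, value)
            path.write_text("".join(lines))
            return
    raise KeyError("{} not found in {}".format(key, cache_file))
```

Run from the ET-dot root directory, set_cache_entry("et_dot/cpp_dotc/_cmake_build/CMakeCache.txt", "CMAKE_CXX_FLAGS", "-I/data/antwerpen/201/vsc20170/workspace/pybind11/include/") produces the entry shown above.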

Finally, running pytest again, we see that all our problems are solved:

> python -m pytest
================================================ test session starts =================================================
platform linux -- Python 3.6.9, pytest-5.3.2, py-1.8.0, pluggy-0.13.1
rootdir: /data/antwerpen/201/vsc20170/workspace/ET-dot
collected 9 items

tests/test_cpp_dotc.py .                                                                                       [ 11%]
tests/test_et_dot.py .......                                                                                   [ 88%]
tests/test_f2py_dotf.py .                                                                                      [100%]

================================================= 9 passed in 0.25s ==================================================