Tutorials

Tutorial 1: Getting started with micc

Note

All tutorial sections start with the bare essentials, which should get you up and running. They are often followed by more detailed subsections that provide useful background information for intermediate or advanced usage. These sections have an explicit [intermediate] or [advanced] tag in the title, e.g. 1.1.1. Modules and packages [intermediate], and they are indented. Background sections can be skipped on first reading, but the user is encouraged to read them at some point. The tutorials are rather extensive, as they are interlaced with much good-practice advice.

Micc wants to provide a practical interface to the many aspects of managing a Python project: setting up a new project in a standardized way, adding documentation, version control, publishing the code to PyPI, building binary extension modules in C++ or Fortran, dependency management, … For all these aspects there are tools available, yet I found myself struggling to get everything right and looking up the details each time. Micc is an attempt to wrap all the details by providing the user with a standardized yet flexible workflow for managing a Python project. Standardizing is a great way to increase productivity. For many aspects the tools used by Micc are completely hidden from the user, e.g. project setup, adding components, building binary extensions, … For other aspects Micc provides just the necessary setup for you to use other tools as you need them. Learning to use the following tools is certainly beneficial:

  • Poetry: for dependency management, virtual environment creation, and publishing the project to PyPI (and a lot more, if you like). Although extremely handy on a desktop machine or a laptop, it does not play well with the module system that is used on the VSC clusters for accessing applications. A workaround is provided in Tutorial 6.
  • Git: for version control. Its use is optional but highly recommended. See Tutorial 4 for some git coverage.
  • Pytest: for (unit) testing. Also optional and also highly recommended.
  • Sphinx: for building documentation. Optional but recommended.

The basic commands for these tools are covered in these tutorials.

1.1 Creating a project

Creating a new project is simple:

> micc create path/to/my_first_project

This creates a new project my_first_project in folder path/to. Note that the directory path/to/my_first_project must either not exist, or be empty.

Typically, the new project is created in the current working directory:

> cd path/to
> micc create my_first_project
[INFO]           [ Creating project (my_first_project):
[INFO]               Python module (my_first_project): structure = (my_first_project/my_first_project.py)
...
[INFO]           ] done.

After creating the project, we cd into the project directory because any further micc commands will then automatically act on the project in the current working directory:

> cd my_first_project

To apply a micc command to a project that is not in the current working directory see 1.2.1. The project path in micc.

The above command creates a project for a simple Python module, that is, the project directory will contain - among others - a file my_first_project.py which represents the Python module:

my_first_project          # the project directory
└── my_first_project.py   # the Python module, this is where your code goes

When some client code imports this module:

import my_first_project

the code in my_first_project.py is executed.

Note that the name of the Python module is (automatically) taken from the project name that we gave in the micc create command. If you want project and module names to differ from each other, check out the 1.1.2 What’s in a name [intermediate] section.

The module project type above is suited for problems that can be solved with a single Python file (my_first_project.py in the above case). For more complex problems a package structure is more appropriate. To learn more about the use of Python modules vs packages, check out the 1.1.1. Modules and packages [intermediate] section below.

1.1.1. Modules and packages [intermediate]

A Python module is the simplest Python project we can create. It is meant for rather small projects that conveniently fit in a single (Python) file. More complex projects require a package structure. They are created by adding the --package flag on the command line:

> micc create my_first_project --package
[INFO]           [ Creating project (my_first_project):
[INFO]               Python package (my_first_project): structure = (my_first_project/my_first_project/__init__.py)
[INFO]               [ Creating git repository
                       ...
[INFO]               ] done.
[WARNING]            Run 'poetry install' in the project directory to create a virtual environment and install its dependencies.
[INFO]           ] done.

The output shows a different file structure for the project than for a module. Instead of the file my_first_project.py there is a directory my_first_project, containing an __init__.py file. So, the structure of a package project looks like this:

my_first_project          # the project directory
└── my_first_project      # the package directory
    └── __init__.py       # the file where your code goes

Typically, the package directory will contain several other Python files that together make up your Python package. When some client code imports a module with a package structure,

import my_first_project

it is the code in my_first_project/__init__.py that is executed. The my_first_project/__init__.py file is the equivalent of my_first_project.py in a module structure.

The distinction between a module structure and a package structure is also important when you publish the module. When installing a Python package with a module structure, only the file my_first_project.py will be installed, while with the package structure the entire my_first_project directory will be installed.

If you created a project with a module structure and discover over time that its complexity has grown beyond the limits of a simple module, you can easily convert it to a package structure project at any time. First cd into the project directory and run:

> cd my_first_project
> micc convert-to-package
[INFO]           Converting Python module project my_first_project to Python package project.
[WARNING]        Pre-existing files that would be overwritten:
[WARNING]          /Users/etijskens/software/dev/workspace/p1/docs/index.rst
Aborting because 'overwrite==False'.
  Rerun the command with the '--backup' flag to first backup these files (*.bak).
  Rerun the command with the '--overwrite' flag to overwrite these files without backup.

Because we do not want to replace existing files inadvertently, this command will always fail, unless you add either the --backup flag, in which case micc makes a backup of all files it wants to replace, or the --overwrite flag, in which case those files will be overwritten. Micc will always produce a list of files it wants to replace. Unless you deliberately modified one of the files in the list, you can safely use --overwrite. If you did, use the --backup flag and manually copy the changes from the .bak file to the new file.

> micc convert-to-package --overwrite
Converting simple Python project my_first_project to general Python project.
[WARNING]        '--overwrite' specified: pre-existing files will be overwritten WITHOUT backup:
[WARNING]        overwriting /Users/etijskens/software/dev/workspace/ET-dot/docs/index.rst

If you want micc to create a project with a package structure, rather than the default module structure, you must append the --package flag (or -p) to the micc create command:

> micc create my_first_project --package

[INFO]           [ Creating project (my_first_project):
[INFO]               Python package (my_first_project): structure = (my_first_project/my_first_project/__init__.py)
...
[INFO]           ] done.

The output of the command clearly shows the package structure.

1.1.2 What’s in a name [intermediate]

The name you choose for your project has many consequences. Ideally, a project name is

  • descriptive
  • unique
  • short

Although one might think of even more requirements, such as being easy to type, satisfying these three is already hard enough. E.g. my_nifty_module may possibly be unique, but it is neither descriptive nor short. On the other hand, dot_product is descriptive, reasonably short, but probably not unique. Even my_dot_product is probably not unique, and, in addition, confusing to any user that might want to adopt your my_dot_product. A unique name - or at least a name that has not been taken before - becomes really important when you want to publish your code for others to use. The standard place to publish Python code is the Python Package Index, where you find hundreds of thousands of projects, many of which are really interesting and of high quality. Even if there are only a few colleagues that you want to share your code with, you make their life (as well as yours) easier when you publish your my_nifty_module at PyPI. To install your my_nifty_module they will only need to type:

> pip install my_nifty_module

(The name my_nifty_module is not in use so far, but nevertheless we recommend choosing a better name). Micc will help you publish your work at PyPI with as little effort as possible, provided your name has not been used so far. Note that the micc create command has a --publish flag that checks if the name you want to use for your project is still available on PyPI, and, if not, refuses to create the project and asks you to use another name for your project.

As there are indeed hundreds of thousands of Python packages published on PyPI, finding a good name has become quite hard. Personally, I often use a simple and short descriptive name, prefixed by my initials, et-, which generally makes the name unique. It has the advantage that all my published modules are grouped in the PyPI listing.

Another point of attention is that, although in principle project names can be anything supported by your OS file system, as they are just the name of a directory, micc insists that module and package names comply with the PEP8 module naming rules. Micc derives the package (or module) name from the project name as follows (see the sketch after this list):

  • capitals are replaced by lower-case
  • hyphens '-' are replaced by underscores '_'
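
The transformation can be sketched in a few lines of Python (a minimal illustration, not micc's actual implementation; the function name is made up):

def module_name_from_project_name(project_name):
    """Sketch: lower-case the project name and turn hyphens into underscores."""
    return project_name.lower().replace('-', '_')

# e.g. the project name 'ET-dot' yields the module name 'et_dot'
assert module_name_from_project_name('ET-dot') == 'et_dot'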

If the resulting module name is not PEP8 compliant, you get an informative error message:

> micc create 1proj
[ERROR]
The project name (1proj) does not yield a PEP8 compliant module name:
  The project name must start with char, and contain only chars, digits, hyphens and underscores.
  Alternatively, provide an explicit module name with the --module-name=<name>

The last line indicates that you can specify an explicit module name, unrelated to the project name. In that case PEP8 compliance is not checked. The responsibility then is all yours.

1.2 First steps in micc

1.2.1. The project path in micc

All micc commands accept the global --project-path=<path> parameter. Global parameters appear before the subcommand name. E.g. the command:

> micc --project-path path/to/my_first_project info
Project my_first_project located at path/to/my_first_project.
  package: my_first_project
  version: 0.0.0
  structure: my_first_project.py (Python module)

prints some info on the project at path/to/my_first_project. This can conveniently be abbreviated as:

> micc -p path/to/my_first_project info

Even the create command accepts the global --project-path=<path> parameter:

> micc -p path/to/my_second_project create

will create project my_second_project in the specified location. The command is identical to:

> micc create path/to/my_second_project

The default value for the project path is the current working directory, so:

> micc info

will print info about the project in the current working directory.

Hence, while working on a project, it is convenient to cd into the project directory and execute your micc commands from there, without the global --project-path=<path> parameter.

This approach works even with the micc create command. If you create an empty directory and cd into it, you can just run micc create and it will create the project in the current working directory, taking the project name from the name of the current working directory.

1.2.2 Virtual environments

Virtual environments enable you to quickly set up a Python environment that is isolated from the Python installed on your system. In this way you can easily cope with differing dependencies between your Python projects.

For a detailed introduction to virtual environments see Python Virtual Environments: A Primer.

When you are developing or using several Python projects it can become difficult for a single Python environment to satisfy all the dependency requirements of these projects simultaneously. Dependency conflicts can easily arise. Python promotes and facilitates code reuse and as a consequence Python tools typically depend on tens to hundreds of other modules. If toolA and toolB both need moduleC, but each requires a different version of it, there is a conflict because it is impossible to install two versions of the same module in a Python environment. The solution that the Python community has come up with for this problem is the construction of virtual environments, which isolate the dependencies of a single project in a single environment.

Creating virtual environments

Since Python 3.3 Python comes with a venv module for the creation of virtual environments:

> python -m venv my_virtual_environment

This creates a directory my_virtual_environment in the current working directory which is a complete isolated Python environment. The Python version in this virtual environment is the same as that of the python command with which the virtual environment was created. To use this virtual environment you must activate it:

> source my_virtual_environment/bin/activate
(my_virtual_environment) >

Activating a virtual environment modifies the command prompt to remind you constantly that you are working in a virtual environment. The virtual environment is based on the current Python - by preference set by pyenv. If you install new packages, they will be installed in the virtual environment only. The virtual environment can be deactivated by running

(my_virtual_environment) > deactivate
>
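
If you are ever unsure which environment is active, you can ask Python itself; a minimal check (the path shown is hypothetical):

>>> import sys
>>> sys.prefix
'/path/to/my_virtual_environment'
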
Creating virtual environments with Poetry

Poetry uses the above mechanism to manage virtual environments on a per-project basis, and can install all the dependencies of that project, as specified in the pyproject.toml file, using the install command. Since our project does not have a virtual environment yet, Poetry creates one, named .venv, and installs all dependencies in it. We first choose the Python version to use for the project:

> pyenv local 3.7.5
> python --version
Python 3.7.5
> which python
/Users/etijskens/.pyenv/shims/python

Next, use poetry to create the virtual environment and install all the dependencies specified in pyproject.toml:

> poetry install
Creating virtualenv et-dot in /Users/etijskens/software/dev/my_first_project/.venv
Updating dependencies
Resolving dependencies... (0.8s)

Writing lock file


Package operations: 10 installs, 0 updates, 0 removals

  - Installing pyparsing (2.4.5)
  - Installing six (1.13.0)
  - Installing atomicwrites (1.3.0)
  - Installing attrs (19.3.0)
  - Installing more-itertools (7.2.0)
  - Installing packaging (19.2)
  - Installing pluggy (0.13.1)
  - Installing py (1.8.0)
  - Installing wcwidth (0.1.7)
  - Installing pytest (4.6.6)
  - Installing ET-dot (0.0.0)

The installed packages are all dependencies of pytest, which we require for testing our code. The last package is ET-dot itself, which is installed in so-called development mode. This means that any changes in the source code are immediately visible in the virtual environment. Adding/removing dependencies is easily achieved by running poetry add some_module and poetry remove some_other_module. Consult the poetry documentation for details.

If the virtual environment already exists, or if some virtual environment is activated (not necessarily that of the project itself - be warned), that virtual environment is reused and all installations pertain to that virtual environment.

To use the just created virtual environment of our project, we must activate it:

> source .venv/bin/activate
(.venv)> python --version
Python 3.7.5
(.venv) > which python
/Users/etijskens/software/dev/ET-dot/.venv/bin/python

The location of the virtual environment’s Python and its version are as expected.

Note

Whenever you see a command prompt like (.venv) >, the local virtual environment of the project has been activated. If you want to try this yourself, you must activate it too.

To deactivate the virtual environment, just run deactivate:

(.venv) > deactivate
> which python
/Users/etijskens/.pyenv/shims/python

The (.venv) notice disappears, and the active python is no longer the one in the virtual environment, but the Python specified by pyenv.

If something is wrong with a virtual environment, you can simply delete it:

> rm -rf .venv

and create it again. Sometimes it is necessary to delete the poetry.lock as well:

> rm poetry.lock

1.2.3 Modules and scripts

Micc always creates fully functional examples, complete with test code and documentation, so that you can inspect the files and see as much as possible how things are supposed to work. The my_first_project/my_first_project.py module contains a simple hello world method, called hello:

# -*- coding: utf-8 -*-
"""
Package my_first_project
========================

A 'hello world' example.
"""
__version__ = "0.0.0"


def hello(who='world'):
    """'Hello world' method."""
    result = "Hello " + who
    return result

The module can be used right away. Open an interactive Python session and enter the following commands:

> cd path/to/my_first_project
> source .venv/bin/activate
(.venv) > python
Python 3.8.0 (default, Nov 25 2019, 20:09:24)
[Clang 11.0.0 (clang-1100.0.33.12)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import my_first_project
>>> my_first_project.hello()
'Hello world'
>>> my_first_project.hello("student")
'Hello student'
>>>

Productivity tip

Using an interactive Python session to verify that a module indeed does what you expect is a bit cumbersome. A quicker way is to modify the module so that it can also behave as a script. Add the following lines to my_first_project/my_first_project.py at the end of the file:

if __name__=="__main__":
   print(hello())
   print(hello("student"))

and execute it on the command line:

(.venv) > python my_first_project.py
Hello world
Hello student

The body of the if statement is only executed if the file is executed as a script. When the file is imported, it is ignored.

While working on a single-file project it is sometimes handy to put your tests in the body of if __name__=="__main__":, as below:

if __name__=="__main__":
   assert hello() == "Hello world"
   assert hello("student") == "Hello student"
   print("-*# success #*-")

The last line makes sure that you get a message that all tests went well if they did, otherwise an AssertionError will be raised. When you now execute the script, you should see:

(.venv) > python my_first_project.py
-*# success #*-

When you develop your code in an IDE like eclipse+pydev or PyCharm, you can even execute the file without having to leave your editor and switch to a terminal. You can quickly code, test and debug in a single window.

While this is a very productive way of developing, it is a bit on the quick and dirty side. If the module code and the tests become more involved, however, the file will soon become cluttered with test code, and a more scalable way to organise your tests is needed. Micc has already taken care of this.

1.2.4 Testing your code

Test driven development is a software development process that relies on the repetition of a very short development cycle: requirements are turned into very specific test cases, then the code is improved so that the tests pass. This is opposed to software development that allows code to be added that is not proven to meet requirements. The advantage of this is clear: the shorter the cycle, the smaller the code that is to be searched for bugs. This allows you to produce correct code faster, and in case you are a beginner, also speeds your learning of Python. Please check Ned Batchelder’s very good introduction to testing with pytest.

When micc creates a new project, or when you add components to an existing project, it immediately adds a test script for each component in the tests directory. The test script for the my_first_project module is in file my_first_project/tests/test_my_first_project.py. Let’s take a look at the relevant section:

# -*- coding: utf-8 -*-
"""Tests for my_first_project package."""

import my_first_project

def test_hello_noargs():
    """Test for my_first_project.hello()."""
    s = my_first_project.hello()
    assert s=="Hello world"

def test_hello_me():
    """Test for my_first_project.hello('me')."""
    s = my_first_project.hello('me')
    assert s=="Hello me"

Tests like this are very useful to ensure that during development the changes to your code do not break things. There are many Python tools for unit testing and test driven development. Here, we use Pytest:

> pytest
=============================== test session starts ===============================
platform darwin -- Python 3.7.4, pytest-4.6.5, py-1.8.0, pluggy-0.13.0
rootdir: /Users/etijskens/software/dev/workspace/foo
collected 2 items

tests/test_foo.py ..                                                        [100%]

============================ 2 passed in 0.05 seconds =============================

The output shows some info about the environment in which we are running the tests, the current working directory (c.q. the project directory), and the number of tests it collected (2). Pytest looks for test methods in all test_*.py or *_test.py files in the current directory and accepts test-prefixed methods outside classes and test-prefixed methods inside Test-prefixed classes as test methods to be executed, as illustrated below.
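
As an illustration of these collection rules, consider the following hypothetical file tests/test_collection_example.py; pytest would collect exactly the two items marked below:

# tests/test_collection_example.py (hypothetical, for illustration only)

def test_function():
    # collected: a 'test'-prefixed function outside a class
    assert 1 + 1 == 2

class TestExample:
    # a 'Test'-prefixed class: pytest looks for test methods inside it
    def test_method(self):
        # collected: a 'test'-prefixed method
        assert 'hello'.upper() == 'HELLO'

def helper():
    # not collected: the name does not start with 'test'
    return 42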

Note

Sometimes pytest discovers unintended test files or functions in other directories than the tests directory, leading to puzzling errors. It is therefore safe to instruct pytest to look only in the tests directory:

> pytest tests
...

If a test fails, you get a detailed report to help you find the cause of the error and fix it.

Debugging test code

When the report provided by pytest does not yield a clue on the cause of the failing test, you must use debugging and execute the failing test step by step to find out what is going wrong where. From the viewpoint of pytest, the files in the tests directory are modules. Pytest imports them and collects the test methods, and executes them. Micc also makes every test module executable using the technique described in 1.2.3 Modules and scripts. At the end of every test file you will find some extra code:

if __name__ == "__main__":
    the_test_you_want_to_debug = test_hello_noargs

    print("__main__ running", the_test_you_want_to_debug)
    the_test_you_want_to_debug()
    print('-*# finished #*-')

On the first line of the if __name__ == "__main__": body, the variable the_test_you_want_to_debug is set to the name of some test method in our test file test_et_dot.py, here test_hello_noargs, which refers to the hello world that was in the et_dot.py file originally. The variable the_test_you_want_to_debug is now just another variable pointing to the very same function object as test_hello_noargs and behaves exactly the same (see Functions are first class objects). The next statement prints a start message that tells you that __main__ is running that test method, after which the test method is called through the the_test_you_want_to_debug variable, and finally another message is printed to let you know that the script finished. Here is the output you get when running this test file as a script:

(.venv) > python tests/test_et_dot.py
__main__ running <function test_hello_noargs at 0x1037337a0>
-*# finished #*-

The execution of the test does not produce any output. Now you can use your favourite Python debugger to execute this script and step into the test_hello_noargs test method and from there into et_dot.hello to examine if everything goes as expected. Thus, to debug a failing test, you assign its name to the the_test_you_want_to_debug variable and debug the script.

1.2.5 Generating documentation [intermediate]

Documentation is extracted from the source code using Sphinx. It is almost completely generated automatically from the doc-strings in your code. Doc-strings are the text between triple double quote pairs in the examples above, e.g. """This is a doc-string.""". Important doc-strings are:

  • module doc-strings: at the beginning of the module. Provides an overview of what the module is for.
  • class doc-strings: right after the class statement: explains what the class is for. (Usually, the doc-string of the __init__ method is put here as well, as dunder methods (starting and ending with a double underscore) are not automatically considered by sphinx.)
  • method doc-strings: right after a def statement.

According to pep-0287 the recommended format for Python doc-strings is reStructuredText. E.g. a typical method doc-string looks like this:

def hello_world(who='world'):
    """Short (one line) description of the hello_world method.

    A detailed and longer description of the hello_world method.
    blablabla...

    :param str who: an explanation of the who parameter. You should
        mention its default value.
    :returns: a description of what hello_world returns (if relevant).
    :raises: which exceptions are raised under what conditions.
    """

Here, you can find some more examples.

Thus, if you take good care writing doc-strings, helpful documentation follows automatically.

Micc sets up all the necessary components for documentation generation in sub-directory et-dot/docs/. There, you find a Makefile that provides a simple interface to Sphinx. Here is the workflow that is necessary to build the documentation:

> cd path/to/et-dot
> source .venv/bin/activate
(.venv) > cd docs
(.venv) > make html

The last line produces documentation in html format.

Let’s explain the steps:

  1. cd into the project directory:

    > cd path/to/et-dot
    >
    
  2. Activate the project’s virtual environment:

    > source .venv/bin/activate
    (.venv) >
    
  3. cd into the docs subdirectory:

    (.venv) > cd docs
    (.venv) >
    

    Here, you will find the Makefile that does the work:

    (.venv) > ls -l
    total 80
    -rw-r--r--  1 etijskens  staff  1871 Dec 10 11:24 Makefile
    ...
    

To see a list of possible documentation formats, just run make without arguments:

(.venv) > make
Sphinx v2.2.2
Please use `make target' where target is one of
  html        to make standalone HTML files
  dirhtml     to make HTML files named index.html in directories
  singlehtml  to make a single large HTML file
  pickle      to make pickle files
  json        to make JSON files
  htmlhelp    to make HTML files and an HTML help project
  qthelp      to make HTML files and a qthelp project
  devhelp     to make HTML files and a Devhelp project
  epub        to make an epub
  latex       to make LaTeX files, you can set PAPER=a4 or PAPER=letter
  latexpdf    to make LaTeX and PDF files (default pdflatex)
  latexpdfja  to make LaTeX files and run them through platex/dvipdfmx
  text        to make text files
  man         to make manual pages
  texinfo     to make Texinfo files
  info        to make Texinfo files and run them through makeinfo
  gettext     to make PO message catalogs
  changes     to make an overview of all changed/added/deprecated items
  xml         to make Docutils-native XML files
  pseudoxml   to make pseudoxml-XML files for display purposes
  linkcheck   to check all external links for integrity
  doctest     to run all doctests embedded in the documentation (if enabled)
  coverage    to run coverage check of the documentation (if enabled)
(.venv) >
  4. To build documentation in html format, enter:

    (.venv) > make html
    ...
    (.venv) >
    

    This will generate documentation in et-dot/docs/_build/html. Note that it is essential that this command executes in the project’s virtual environment. You can view the documentation in your favorite browser:

    (.venv) > open _build/html/index.html       # on macosx
    

    or:

    (.venv) > xdg-open _build/html/index.html   # on ubuntu
    

    (On the cluster the command will fail because it does not have a graphical environment and cannot run an HTML browser.)

    Here is a screenshot:

    _images/im1-1.png

    If you expand the API tab on the left, you get to see the my_first_project module documentation, as it is generated from the doc-strings:

    _images/im1-2.png
  5. To build documentation in .pdf format, enter:

    (.venv) > make latexpdf
    

    This will generate documentation in et-dot/docs/_build/latex/et-dot.pdf. Note that it is essential that this command executes in the project’s virtual environment. You can view it in your favorite pdf viewer:

    (.venv) > open _build/latex/et-dot.pdf      # on macosx
    

or:

(.venv) > xdg-open _build/latex/et-dot.pdf      # on ubuntu

Note

When building documentation by running the docs/Makefile, it is verified that the correct virtual environment is activated, and that the needed Python modules are installed in that environment. If not, they are first installed using pip install. These components do not become dependencies of the project. If needed you can add dependencies using the poetry add command.

The boilerplate code for documentation generation is in the docs directory, just as if it were generated by hand using sphinx-quickstart. (In fact, it was generated using sphinx-quickstart, but then turned into a Cookiecutter template.) Modifying those files is not recommended, and only rarely needed. Then there are a number of .rst files with capitalized names in the project directory:

  • README.rst is assumed to contain an overview of the project,
  • API.rst describes the classes and methods of the project in detail,
  • APPS.rst describes command line interfaces or apps added to your project.
  • AUTHORS.rst lists the contributors to the project,
  • HISTORY.rst describes the changes that were made to the code.

The .rst extension stands for reStructuredText. It is a simple and concise approach to text formatting.

If you add components to your project through micc, care is taken that the .rst files in the project directory and the docs directory are modified as necessary, so that sphinx is able to find the doc-strings. Even for command line interfaces (CLI, or console scripts) based on click, the documentation is generated neatly from the help strings of options and the doc-strings of the commands.

1.2.6 Version control [advanced]

Although version control is extremely important for any software project with a lifetime of more than a day, we mark it as an advanced topic as it does not affect the development itself. Micc facilitates version control by automatically creating a local git repository in your project directory. If you do not want to use it, you may ignore it or even delete it.

Git is a version control system that solves many practical problems related to the software development process, independent of whether you are the only developer, or an entire team is working on it from different places in the world. You find more information about how micc uses git in Tutorial 4.

Let’s take a close look at the output of the micc create my_first_project command. The first line tells us that a project directory is being created:

[INFO]           [ Creating project (my_first_project):

The next line explains the structure of the project, module or package:

[INFO]               Python module (my_first_project): structure = (my_first_project/my_first_project.py)

Next we are informed that a local git repository is being created:

[INFO]               [ Creating git repository

Micc tries to push this local repository to a remote repository at https://github.com/yourgitaccount. If you did not create a remote git repository beforehand, this gives rise to some warnings:

[WARNING]                    > git push -u origin master
[WARNING]                    (stderr)
                             remote: Repository not found.
                             fatal: repository 'https://github.com/yourgitaccount/my_first_project/' not found

Micc is unable to push the local repo to GitHub if the remote repo does not exist. The local repo is sufficient for many purposes, but the remote repo enables sharing your work with others and provides a backup of your work.

Finally, micc informs us that the tasks are finished:

[INFO]               ] done.
[INFO]           ] done.
>

Note that the name of the remote git repo is the project name, not the module name.

1.3 Miscellaneous

1.3.1 The license file [intermediate]

The project directory contains a LICENCE file, a text file describing the licence applicable to your project. You can choose between

  • MIT license (default),
  • BSD license,
  • ISC license,
  • Apache Software License 2.0,
  • GNU General Public License v3 and
  • Not open source.

The MIT license is a very liberal license and the default option. If you’re unsure which license to choose, you can use resources such as GitHub’s Choose a License.

You can select the license file when you create the project:

> cd some_empty_dir
> micc create --license BSD

Of course, the project depends in no way on the license file, so it can be replaced manually at any time by the license you desire.

1.3.2 The pyproject.toml file [intermediate]

The file pyproject.toml (located in the project directory) is the modern way to describe the build system requirements of the project: PEP 518. Although most of this file’s content is generated automatically by micc and poetry, some understanding of it is useful; consult https://poetry.eustace.io/docs/pyproject/.

The pyproject.toml file is rather human-readable:

> cat pyproject.toml
[tool.poetry]
name = "ET-dot"
version = "1.0.0"
description = "<Enter a one-sentence description of this project here.>"
authors = ["Engelbert Tijskens <engelbert.tijskens@uantwerpen.be>"]
license = "MIT"

readme = 'README.rst'

repository = "https://github.com/etijskens/ET-dot"
homepage = "https://github.com/etijskens/ET-dot"

keywords = ['packaging', 'poetry']

[tool.poetry.dependencies]
python = "^3.7"
et-micc-build = "^0.10.10"

[tool.poetry.dev-dependencies]
pytest = "^4.4.2"

[tool.poetry.scripts]

[build-system]
requires = ["poetry>=0.12"]
build-backend = "poetry.masonry.api"

1.3.3 The log file micc.log [intermediate]

The project directory also contains a log file micc.log. All micc commands that modify the state of the project leave a trace in this file, so you can look up what happened to your project and when. Should you think that the log file has become too big, or just useless, you can delete it manually, or add the --clear-log flag before any micc subcommand to remove it. If the subcommand alters the state of the project, the log file will only contain the log messages from the last subcommand.

> ll micc.log
-rw-r--r--  1 etijskens  staff  34 Oct 10 20:37 micc.log

> micc --clear-log info
Project bar located at /Users/etijskens/software/dev/workspace/bar
  package: bar
  version: 0.0.0
  structure: bar.py (Python module)

> ll micc.log
ls: micc.log: No such file or directory

1.3.4 Adjusting micc to your needs [advanced]

Micc is based on a series of additive Cookiecutter templates which generate the boilerplate code. If you like, you can tweak these templates in the site-packages/et_micc/templates directory of your micc installation. When you pipx installed micc, that is typically something like:

~/.local/pipx/venvs/et-micc/lib/pythonX.Y/site-packages/et_micc,

where pythonX.Y is the Python version you installed micc with.

1.4 A first real project

Let’s start with a simple problem: a Python module that computes the scalar product of two arrays, generally referred to as the dot product. Admittedly, this is not a very rewarding goal, as there are already many Python packages, e.g. Numpy, that solve this problem in an elegant and efficient way. However, because the dot product is such a simple concept in linear algebra, it allows us to illustrate the usefulness of Python as a language for High Performance Computing, as well as the capabilities of Micc.

First, we set up a new project for this dot product, which I named ET-dot, ET being my initials. Not knowing beforehand how involved this project will become, we create a simple module project:

> micc -p ET-dot create
[INFO]           [ Creating project (ET-dot):
[INFO]               Python module (et_dot): structure = (ET-dot/et_dot.py)
[INFO]               [ Creating git repository
[WARNING]                    > git push -u origin master
[WARNING]                    (stderr)
                             remote: Repository not found.
                             fatal: repository 'https://github.com/etijskens/ET-dot/' not found
[INFO]               ] done.
[WARNING]            Run 'poetry install' in the project directory to create a virtual environment and install its dependencies.
[INFO]           ] done.
> cd ET-dot

As the output shows, the module name is derived from the project name and made compliant with the PEP8 module naming rules: et_dot. Next, we create a virtual environment for the project with all the standard micc dependencies:

> poetry install
Creating virtualenv et-dot in /Users/etijskens/software/dev/workspace/tmp/ET-dot/.venv
Updating dependencies
Resolving dependencies... (0.8s)

Writing lock file


Package operations: 10 installs, 0 updates, 0 removals

  - Installing pyparsing (2.4.5)
  - Installing six (1.13.0)
  - Installing atomicwrites (1.3.0)
  - Installing attrs (19.3.0)
  - Installing more-itertools (8.0.2)
  - Installing packaging (19.2)
  - Installing pluggy (0.13.1)
  - Installing py (1.8.0)
  - Installing wcwidth (0.1.7)
  - Installing pytest (4.6.7)
  - Installing ET-dot (0.0.0)
>

Next, activate the virtual environment:

> source .venv/bin/activate
(.venv) >

Open the module file et_dot.py in your favourite editor and code a dot product method (naively) as follows:

# -*- coding: utf-8 -*-
"""
Package et_dot
==============
Python module for computing the dot product of two arrays.
"""
__version__ = "0.0.0"

def dot(a,b):
    """Compute the dot product of *a* and *b*.

    :param a: a 1D array.
    :param b: a 1D array of the same length as *a*.
    :returns: the dot product of *a* and *b*.
    :raises: ArithmeticError if ``len(a)!=len(b)``.
    """
    n = len(a)
    if len(b)!=n:
        raise ArithmeticError("dot(a,b) requires len(a)==len(b).")
    d = 0
    for i in range(n):
        d += a[i]*b[i]
    return d

We defined a dot() method with an informative doc-string that describes the parameters, the return value and the kind of exceptions it may raise.

We could use the dot method in a script as follows:

from et_dot import dot

a = [1,2,3]
b = [4.1,4.2,4.3]
a_dot_b = dot(a,b)

Note

This dot product implementation is naive for many reasons:

  • Python is very slow at executing loops, as compared to Fortran or C++.
  • The objects we are passing in are plain Python lists. A list is a very powerful data structure, with array-like properties, but it is not exactly an array. A list is in fact an array of pointers to Python objects, and therefore list elements can reference anything, not just a numeric value as we would expect from an array. With elements being pointers, looping over the array elements implies non-contiguous memory access, another source of inefficiency.
  • The dot product is a subject of Linear Algebra. Many excellent libraries have been designed for this purpose. Numpy should be your starting point because it is well integrated with many other Python packages. There is also Eigen, a C++ library for linear algebra that is neatly exposed to Python by pybind11.

In order to verify that our implementation of the dot product is correct, we write a test. For this we open the file tests/test_et_dot.py. Remove the original tests, and add a new one:

import et_dot

def test_dot_aa():
    a = [1,2,3]
    expected = 14
    result = et_dot.dot(a,a)
    assert result==expected

Save the file, and run the test. Pytest will show a line for every test source file. On each such line a . will appear for every successful test, and an F for a failing test.

(.venv) > pytest
=============================== test session starts ===============================
platform darwin -- Python 3.7.4, pytest-4.6.5, py-1.8.0, pluggy-0.13.0
rootdir: /Users/etijskens/software/dev/workspace/ET-dot
collected 1 item

tests/test_et_dot.py .                                                      [100%]

============================ 1 passed in 0.08 seconds =============================
(.venv) >

Note

If the project’s virtual environment is not activated, the command pytest will generally not be found.

Great! Our test succeeded. Let’s increment the project’s version (-p is short for --patch, and requests incrementing the patch component of the version string):

(.venv) > micc version -p
[INFO]           (ET-dot)> micc version (0.0.0) -> (0.0.1)

Obviously, our test tests only one particular case. A clever way of testing is to focus on properties. From mathematics we know that the dot product is commutative. Let’s add a test for that.

import random

def test_dot_commutative():
    # create two arrays of length 10 with random float numbers:
    a = []
    b = []
    for _ in range(10):
        a.append(random.random())
        b.append(random.random())
    # do the test
    ab = et_dot.dot(a,b)
    ba = et_dot.dot(b,a)
    assert ab==ba

You can easily verify that this test works too. We increment the version string again:

(.venv) > micc version -p
[INFO]           (ET-dot)> micc version (0.0.1) -> (0.0.2)

There is however a risk in using arrays of random numbers. Maybe we were just lucky and got random numbers that satisfy the test by accident. Also, the test is not reproducible anymore. The next time we run pytest we will get other random numbers, and maybe the test will fail. That would represent a serious problem: since we cannot reproduce the failing test, we have no way of finding out what went wrong. For random numbers we can fix the seed at the beginning of the test. Random number generators are deterministic, so fixing the seed makes the code reproducible. To increase coverage we put a loop around the test.

def test_dot_commutative_2():
    # Fix the seed for the random number generator of module random.
    random.seed(0)
    # choose array size
    n = 10
    # create two arrays of length n, filled with zeros:
    a = n * [0]
    b = n * [0]
    # repetition loop:
    for r in range(1000):
        # fill a and b with random float numbers:
        for i in range(n):
            a[i] = random.random()
            b[i] = random.random()
        # do the test
        ab = et_dot.dot(a,b)
        ba = et_dot.dot(b,a)
        assert ab==ba

Again the test works. Another property of the dot product is that the dot product with a zero vector is zero.

def test_dot_zero():
    # Fix the seed for the random number generator of module random.
    random.seed(0)
    # choose array size
    n = 10
    # create two arrays of length n, filled with zeros:
    a = n * [0]
    zero = n * [0]
    # repetition loop (the underscore is a placeholder for a variable that we do not use):
    for _ in range(1000):
        # fill a with random float numbers:
        for i in range(n):
            a[i] = random.random()
        # do the test
        azero = et_dot.dot(a,zero)
        assert azero==0

This test works too. Furthermore, the dot product with a vector of ones is the sum of the elements of the other vector:

def test_dot_one():
    # Fix the seed for the random number generator of module random.
    random.seed(0)
    # choose array size
    n = 10
    # create an array of length n filled with zeros, and an array of ones:
    a = n * [0]
    one = n * [1.0]
    # repetition loop (the underscore is a placeholder for a variable that we do not use):
    for _ in range(1000):
        # fill a with random float numbers:
        for i in range(n):
            a[i] = random.random()
        # do the test
        aone = et_dot.dot(a,one)
        expected = sum(a)
        assert aone==expected

Success again. We are getting quite confident in the correctness of our implementation. Here is another test:

def test_dot_one_2():
    a1 = 1.0e16
    a   = [a1 ,1.0,-a1]
    one = [1.0,1.0,1.0]
    expected = 1.0
    result = et_dot.dot(a,one)
    assert result==expected

Clearly, it is a special case of the test above: the expected result is the sum of the elements in a, that is, 1.0. Yet it fails, unexpectedly. Fortunately, pytest produces a readable report about the failure:

> pytest
================================= test session starts ==================================
platform darwin -- Python 3.7.4, pytest-4.6.5, py-1.8.0, pluggy-0.13.0
rootdir: /Users/etijskens/software/dev/workspace/ET-dot
collected 6 items

tests/test_et_dot.py .....F                                                      [100%]

======================================= FAILURES =======================================
____________________________________ test_dot_one_2 ____________________________________

    def test_dot_one_2():
        a1 = 1.0e16
        a   = [a1 , 1.0, -a1]
        one = [1.0, 1.0, 1.0]
        expected = 1.0
        result = et_dot.dot(a,one)
>       assert result==expected
E       assert 0.0 == 1.0

tests/test_et_dot.py:91: AssertionError
========================== 1 failed, 5 passed in 0.17 seconds ==========================
>

Mathematically, our expectations about the outcome of the test are certainly correct. Yet, pytest tells us it found that the result is 0.0 rather than 1.0. What could possibly be wrong? Well, our mathematical expectations are based on our - false - assumption that the elements of a are real numbers, most of which in decimal representation are characterised by an infinite number of digits. Computer memory being finite, however, Python (and for that matter all other programming languages) uses a finite number of bits to approximate real numbers. These numbers are called floating point numbers and their arithmetic is called floating point arithmetic. Floating point arithmetic has quite different properties than real number arithmetic. A floating point number in Python uses 64 bits, which yields approximately 15 representable digits. Observe the consequences of this in the Python statements below:

>>> 1.0 + 1e16
1e+16
>>> 1e16 + 1.0 == 1e16
True
>>> 1.0 + 1e16 == 1e16
True
>>> 1e16 + 1.0 - 1e16
0.0

There are several lessons to be learned from this:

  • The test does not fail because our code is wrong, but because our mind is used to reasoning about real number arithmetic, rather than floating point arithmetic rules. As the latter is subject to round-off errors, tests sometimes fail unexpectedly. Note that for comparing floating point numbers the standard library provides a math.isclose() method.
  • Another silent assumption by which we can be misled is in the random numbers. In fact, random.random() generates pseudo-random numbers in the interval [0,1[, which is quite a bit smaller than ]-inf,+inf[. No matter how often we run the test, the special case above that fails will never be encountered, which may lead to unwarranted confidence in the code.

So, how do we cope with the failing test? Here is a way using math.isclose():

import math

def test_dot_one_2():
    a1 = 1.0e16
    a   = [a1 , 1.0, -a1]
    one = [1.0, 1.0, 1.0]
    expected = 1.0
    result = et_dot.dot(a,one)
    # assert result==expected
    assert math.isclose(result, expected, abs_tol=10.0)

This is a reasonable solution if we accept that, when dealing with numbers as big as 1e16, an absolute difference of 10 is negligible.
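
To see why, consider the spacing between consecutive floating point numbers of that magnitude; a quick check (not part of the test suite):

import sys

# sys.float_info.epsilon (about 2.2e-16) is the relative spacing of 64-bit floats.
# Near 1e16 the distance between two consecutive representable numbers is therefore
# about 2, so an absolute tolerance of 10 amounts to only a few floating point 'steps'.
print(1.0e16 * sys.float_info.epsilon)   # prints approximately 2.22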

Another aspect that should be tested is the behavior of the code in exceptional circumstances. Does it indeed raise ArithmeticError if the arguments are not of the same length? Here is a test:

import pytest

def test_dot_unequal_length():
    a = [1,2]
    b = [1,2,3]
    with pytest.raises(ArithmeticError):
        et_dot.dot(a,b)

Here, pytest.raises() is a context manager that will verify that ArithmeticError is raised when its body is executed.
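
If you also want to inspect the raised exception, e.g. to check its error message, pytest.raises() can bind it to a variable. A small sketch, added to the same tests/test_et_dot.py file (the message fragment assumes the error string used in our dot() implementation above):

def test_dot_unequal_length_message():
    a = [1,2]
    b = [1,2,3]
    with pytest.raises(ArithmeticError) as exc_info:
        et_dot.dot(a,b)
    # exc_info.value is the exception instance that was raised
    assert "len(a)==len(b)" in str(exc_info.value)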

Note

For a detailed explanation of context managers, see https://jeffknupp.com/blog/2016/03/07/python-with-context-managers//

Note that you can easily make et_dot.dot() raise other exceptions, e.g. TypeError by passing in arrays of non-numeric types:

>>> et_dot.dot([1,2],[1,'two'])
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/etijskens/software/dev/workspace/ET-dot/et_dot.py", line 23, in dot
    d += a[i]*b[i]
TypeError: unsupported operand type(s) for +=: 'int' and 'str'
>>>

Note that it is not the product a[i]*b[i] for i=1 that is wreaking havoc, but the addition of its result to d.
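
You can see this at the Python prompt: multiplying an int and a str is legal (string repetition), but adding the resulting str to the int accumulator d is not:

>>> 2 * 'two'
'twotwo'
>>> 1 + 2 * 'two'
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unsupported operand type(s) for +: 'int' and 'str'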

At this point you might notice that even for a very simple and well defined function such as the dot product, the amount of test code easily exceeds the amount of tested code by a factor of 5 or more. This is not at all uncommon. As the tested code here is an isolated piece of code, you will probably leave it alone as soon as it passes the tests and you are confident in the solution. If at some point dot() were to fail, you should write a test that reproduces the error and improve the solution so that it passes the test.

When constructing software for more complex problems, there will very soon be many interacting components and running the tests after modifying one of the components will help you assure that all components still play well together, and spot problems as soon as possible.

At this point we want to produce a git tag of the project:

(.venv) > micc tag
[INFO] Creating git tag v0.0.7 for project ET-dot
[INFO] Done.

The tag is a label for the current code base of our project.

1.5 Improving efficiency

There are times when a correct solution - i.e. a code that solves the problem correctly - is sufficient. Very often, however, there are constraints on the time to solution, and the computing resources (number of cores and nodes, memory, …) must be used efficiently. Especially in scientific computing and high performance computing, where compute tasks may run for many days using hundreds of compute nodes and resources are to be shared with many researchers, using the resources efficiently is of utmost importance.

However important efficiency may be, it is nevertheless a good strategy, when developing a new piece of code, to start out with a simple, even naive implementation in Python, neglecting all efficiency considerations, but focussing on correctness. Python has a reputation of being an extremely productive programming language. Once you have proven the correctness of this first version, it can serve as a reference solution to verify the correctness of later efficiency improvements. In addition, the analysis of this version can highlight the sources of inefficiency and help you focus your attention on the parts that really need it.

Timing your code

The simplest way to probe the efficiency of your code is to time it: write a simple script and record how long it takes to execute.

Here’s a script (using the if __name__=='__main__': structure from section 1.2.3 Modules and scripts) that computes the dot product of two long arrays of random numbers.

"""file et_dot/prof/run1.py"""
import random
from et_dot import dot

def random_array(n=1000):
    """Initialize an array with n random numbers in [0,1[."""
    # Below we use a list comprehension (a Python idiom for creating a list from an iterable object).
    a = [random.random() for i in range(n)]
    return a

if __name__=='__main__':
    a = random_array()
    b = random_array()
    print(dot(a,b))
    print('-*# done #*-')

We store this file, which we rather simply called run1.py, in a directory prof in the project directory, where we intend to keep all our profiling work. You can execute the script from the command line (with the project directory as the current working directory):

(.venv) > python ./prof/run1.py
251.08238559724717
-*# done #*-

Note

As our script does not fix the random number seed, every run has a different outcome.

We are now ready to time our script. There are many ways to achieve this. Here is a particularly good introduction. The et-stopwatch project takes this a little further. We add it as a development dependency of our project:

(.venv) > poetry add et_stopwatch -D
Using version ^0.3.0 for et_stopwatch
Updating dependencies
Resolving dependencies... (0.2s)
Writing lock file
Package operations: 1 install, 0 updates, 0 removals
  - Installing et-stopwatch (0.3.0)
(.venv) >

Note

A development dependency is a package that is not needed for using the package at hand, but only for developing it.

Using the Stopwatch class to time pieces of code is simple:

"""file et_dot/prof/run1.py"""
from et_stopwatch import Stopwatch

...

if __name__=='__main__':
    with Stopwatch(message="init"):
        a = random_array()
        b = random_array()
    with Stopwatch(message="dot "):
        dot(a,b)
    print('-*# done #*-')

When the script is executed, the two Stopwatch context managers will print the duration of the initialisation of a and b, and of the computation of their dot product. Finally, upon exit, the Stopwatch will print the total time.

(.venv) > python ./prof/run1.py
init: 0.000281 s
dot : 0.000174 s
-*# done #*-
>

Note that the initialization phase took longer than the computation. Random number generation is rather expensive.

Comparing to Numpy

As said earlier, our implementation of the dot product is rather naive. If you want to become a good programmer, you should understand that you are probably not the first researcher in need of a dot product implementation. For most linear algebra problems, Numpy provides very efficient implementations. The run1.py script below is extended with timing results for the Numpy equivalent of our code.

"""file et_dot/prof/run1.py"""
import numpy as np

...

if __name__=='__main__':
    with Stopwatch(name="et init"):
        a = random_array()
        b = random_array()
    with Stopwatch(name="et dot "):
        dot(a,b)

    with Stopwatch(name="np init"):
        a = np.random.rand(1000)
        b = np.random.rand(1000)
    with Stopwatch(name="np dot "):
        np.dot(a,b)

    print('-*# done #*-')

Obviously, to run this script, we must first install Numpy (again as a development dependency):

(.venv) > poetry add numpy -D
Using version ^1.18.1 for numpy
Updating dependencies
Resolving dependencies... (1.5s)
Writing lock file
Package operations: 1 install, 0 updates, 0 removals
  - Installing numpy (1.18.1)
(.venv) >

Here are the results of the modified script:

(.venv) > python ./prof/run1.py
et init: 0.000252 s
et dot : 0.000219 s
np init: 7.8e-05 s
np dot : 3.2e-05 s
-*# done #*-
>

Obviously, Numpy does significantly better than our naive dot product implementation. The reasons for this improvement are:

  • Numpy arrays are contiguous data structures of floating point numbers, unlike Python’s list. Contiguous memory access is far more efficient (see the quick check after this list).
  • The loop over Numpy arrays is implemented in a low-level programming language. This allows full use of the processor’s hardware features, such as vectorization and fused multiply-add (FMA).
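
You can verify the first point with a quick check in the interpreter (not part of the tutorial scripts):

>>> import numpy as np
>>> a = np.random.rand(1000)
>>> a.dtype                    # homogeneous, fixed-size numeric elements
dtype('float64')
>>> a.flags['C_CONTIGUOUS']    # the elements are stored contiguously in memory
True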

Conclusion

There are three important generic lessons to be learned from this tutorial:

  1. Always start your projects with a simple and straightforward implementation which can easily be proven correct. Write test code for proving correctness.
  2. Time your code to understand which parts are time consuming and which not. Optimize bottlenecks first and do not waste time optimizing code that does not contribute significantly to the total runtime. Optimized code is typically harder to read and may become a maintenance issue.
  3. Before you write code, in this case our dot product implementation, spend some time searching the internet to see what is already available. Especially in the field of scientific and high performance computing there are many excellent libraries available which are hard to beat. Use your precious time for new stuff.

Tutorial 2: Binary extensions

Suppose for a moment that Numpy did not have a dot product implementation and that the implementation provided in Tutorial-1 is way too slow to be practical for your research project. Consequently, you are forced to accelerate your dot product code in some way or another. There are several approaches for this. Here are a number of interesting links covering them:

Most of these approaches do not require special support from Micc to get you going, and we encourage you to try out the High Performance Python series 1-3 for the ET-dot project. Two of the approaches discussed involve rewriting your code in Modern Fortran or C++ and generating a shared library that can be imported in Python just like any Python module. Such shared libraries are called binary extension modules. Constructing binary extension modules is by far the most scalable and flexible of all current acceleration strategies, as these languages are designed to squeeze the maximum performance out of a CPU. However, figuring out how to make this work is a bit of a challenge, especially in the case of C++.

This is in fact one of the main reasons why Micc was designed: facilitating the construction of binary extension modules and enabling the developer to create high performance tools with ease.

2.1 Binary extensions in Micc projects

Micc provides boilerplate code for binary extensions as well as some practical wrappers around top-notch tools for building binary extensions from Fortran and C++. Fortran code is compiled into a Python module using f2py (which comes with Numpy). For C++ we use Pybind11 and CMake.

Adding a binary extension is as simple as:

> micc add foo --f90   # add a binary extension 'foo' written in (Modern) Fortran
> micc add bar --cpp   # add a binary extension 'bar' written in C++

Note

For the micc add command to be valid, your project must have a package structure (see `Modules and packages`_).

Enter your own code in the generated source code files and execute :

(.venv) > micc-build

Note

The virtual environment must be activated to execute the micc-build command (see `Virtual environments`_).

If there are no syntax errors your binary extensions will be built, and you will be able to import the modules foo and bar in your project and use their subroutines and functions. Because foo and bar are submodules of your micc project, you must import them as:

import my_package.foo
import my_package.bar

# call foofun in my_package.foo
my_package.foo.foofun(...)

# call barfun in my_package.bar
my_package.bar.barfun(...)

where my_package is the name of the top package of your micc project.

Choosing between Fortran and C++ for binary extension modules

Here are a number of arguments that you may wish to take into account for choosing the programming language for your binary extension modules:

  • Fortran is a simpler language than C++.
  • It is easier to write efficient code in Fortran than in C++.
  • C++ is a much more expressive language.
  • C++ comes with a huge standard library, providing lots of data structures and algorithms that are hard to match in Fortran. If the standard library is not enough, there are also the highly recommended Boost libraries and many other domain specific libraries. There are also domain specific libraries in Fortran, but the amount differs by an order of magnitude at least.
  • With Pybind11 you can expose almost anything from the C++ side to Python, not just functions.
  • Modern Fortran is (imho) not as well documented as C++. Useful places to look for language features and idioms are:

In short, C++ provides many more possibilities, but it is not for the novice. In my own experience, I found that on projects of moderate complexity I progressed significantly faster using Fortran than C++, despite the fact that my knowledge of Fortran is quite limited compared to C++. However, your mileage may vary.

2.2 Building binary extensions from Fortran

Binary extension modules based on Fortran are called f90 modules. Micc uses the f2py tool to build these binary extension modules from Fortran. F2py is part of Numpy.

Note

To be able to add a binary extension module (as well as any other component supported by micc, such as Python modules or CLI applications) to a micc project, your project must have a package structure. This is easily checked by running the micc info command:

> micc info
Project ET-dot located at /home/bert/software/workspace/ET-dot
  package: et_dot
  version: 0.0.0
  structure: et_dot/__init__.py (Python package)
>

If it does, the structure line of the output will read as above. If, however, the structure line reads:

structure: et_dot.py (Python module)

you should convert it by running:

> micc convert-to-package --overwrite

See `Modules and packages`_ for details.

We are now ready to create an f90 module for a Fortran implementation of the dot product, say dotf, where the f, obviously, stands for Fortran:

> micc add dotf --f90
[INFO]           [ Adding f90 module dotf to project ET-dot.
[INFO]               - Fortran source in       ET-dot/et_dot/f90_dotf/dotf.f90.
[INFO]               - Python test code in     ET-dot/tests/test_f90_dotf.py.
[INFO]               - module documentation in ET-dot/et_dot/f90_dotf/dotf.rst (in restructuredText format).
[WARNING]            Dependencies added. Run \'poetry update\' to update the project\'s virtual environment.
[INFO]           ] done.

The output tells us where to enter the Fortran source code, the test code and the documentation. Enter the Fortran implementation of the dot product below in the Fortran source file ET-dot/et_dot/f90_dotf/dotf.f90 (using your favourite editor or an IDE):

function dotf(a,b,n)
  ! Compute the dot product of a and b
  !
    implicit none
  !-------------------------------------------------------------------------------------------------
    integer*4              , intent(in)    :: n
    real*8   , dimension(n), intent(in)    :: a,b
    real*8                                 :: dotf
  !-------------------------------------------------------------------------------------------------
  ! declare local variables
    integer*4 :: i
  !-------------------------------------------------------------------------------------------------
    dotf = 0.
    do i=1,n
        dotf = dotf + a(i) * b(i)
    end do
end function dotf

The output of the micc add dotf --f90 command above also shows a warning:

[WARNING]            Dependencies added. Run `poetry update` to update the project's virtual environment.

Micc is telling you that it added some dependencies to your project. In order to be able to build the binary extension dotf these dependencies must be installed in the virtual environment of our project by running poetry update.

> poetry update
Updating dependencies
Resolving dependencies... (2.5s)

Writing lock file


Package operations: 40 installs, 0 updates, 0 removals

  - Installing certifi (2019.11.28)
  - Installing chardet (3.0.4)
  - Installing idna (2.8)
  - Installing markupsafe (1.1.1)
  - Installing python-dateutil (2.8.1)
  - Installing pytz (2019.3)
  - Installing urllib3 (1.25.7)
  - Installing alabaster (0.7.12)
  - Installing arrow (0.15.4)
  - Installing babel (2.7.0)
  - Installing docutils (0.15.2)
  - Installing imagesize (1.1.0)
  - Installing jinja2 (2.10.3)
  - Installing pygments (2.5.2)
  - Installing requests (2.22.0)
  - Installing snowballstemmer (2.0.0)
  - Installing sphinxcontrib-applehelp (1.0.1)
  - Installing sphinxcontrib-devhelp (1.0.1)
  - Installing sphinxcontrib-htmlhelp (1.0.2)
  - Installing sphinxcontrib-jsmath (1.0.1)
  - Installing sphinxcontrib-qthelp (1.0.2)
  - Installing sphinxcontrib-serializinghtml (1.1.3)
  - Installing binaryornot (0.4.4)
  - Installing click (7.0)
  - Installing future (0.18.2)
  - Installing jinja2-time (0.2.0)
  - Installing pbr (5.4.4)
  - Installing poyo (0.5.0)
  - Installing sphinx (2.2.2)
  - Installing whichcraft (0.6.1)
  - Installing cookiecutter (1.6.0)
  - Installing semantic-version (2.8.3)
  - Installing sphinx-click (2.3.1)
  - Installing sphinx-rtd-theme (0.4.3)
  - Installing tomlkit (0.5.8)
  - Installing walkdir (0.4.1)
  - Installing et-micc (0.10.10)
  - Installing numpy (1.17.4)
  - Installing pybind11 (2.4.3)
  - Installing et-micc-build (0.10.10)

Note from the last lines in the output that micc-build, which is a companion of Micc that encapsulates the machinery that does the hard work of building the binary extensions, depends on pybind11, Numpy, and on micc itself. As a consequence, micc is now also installed in the project’s virtual environment. Therefore, when the project’s virtual environment is activated, the active micc is the one in the project’s virtual environment:

> source .venv/bin/activate
(.venv) > which micc
path/to/ET-dot/.venv/bin/micc
(.venv) >

At this point we might want to increment the minor component of the version string:

(.venv) > micc version -m
[INFO]           (ET-dot)> micc version (0.0.7) -> (0.1.0)

The binary extension module can now be built:

(.venv) > micc-build
[INFO] [ Building f90 module dotf in directory '/Users/etijskens/software/dev/workspace/ET-dot/et_dot/f90_dotf/build_'
...
[DEBUG]          >>> shutil.copyfile( 'dotf.cpython-37m-darwin.so', '/Users/etijskens/software/dev/workspace/ET-dot/et_dot/dotf.cpython-37m-darwin.so' )
[INFO] ] done.
[INFO] Check /Users/etijskens/software/dev/workspace/ET-dot/micc-build-f90_dotf.log for details.
[INFO] Binary extensions built successfully:
[INFO] - ET-dot/et_dot/dotf.cpython-37m-darwin.so
(.venv) >

This command produces a lot of output, most of which is rather uninteresting - except in the case of errors. At the end is a summary of all binary extensions that have been built, or failed to build. If the source file does not have any syntax errors, you will see a file like dotf.cpython-37m-darwin.so in directory ET-dot/et_dot:

(.venv) > ls -l et_dot
total 8
-rw-r--r--  1 etijskens  staff  720 Dec 13 11:04 __init__.py
drwxr-xr-x  6 etijskens  staff  192 Dec 13 11:12 f90_dotf/
lrwxr-xr-x  1 etijskens  staff   92 Dec 13 11:12 dotf.cpython-37m-darwin.so@ -> path/to/ET-dot/et_dot/f90_dotf/dotf.cpython-37m-darwin.so

Note

The suffix of the module file dotf.cpython-37m-darwin.so depends on the Python version you are using (here 3.7), and on your operating system (here MacOS).
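If you are curious which suffix applies to your platform, you can query it from Python itself using the standard library’s sysconfig module (on the platform used in this tutorial it prints .cpython-37m-darwin.so):

(.venv) > python -c "import sysconfig; print(sysconfig.get_config_var('EXT_SUFFIX'))"
.cpython-37m-darwin.so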

Since our binary extension is built, we can test it. Here is some test code. Enter it in file ET-dot/tests/test_f90_dotf.py:

# import the binary extension and rename the module locally as f90
import et_dot.dotf as f90
import numpy as np

def test_dotf_aa():
    a = np.array([0,1,2,3,4],dtype=np.float)
    expected = np.dot(a,a)
    a_dotf_a = f90.dotf(a,a)
    assert a_dotf_a==expected

The astute reader will notice the magic that is happening here: a is a numpy array, which is passed as is to our et_dot.dotf.dotf() function in our binary extension. An invisible wrapper function checks the types of the numpy arrays, retrieves pointers to the memory of the numpy arrays and feeds those pointers into our Fortran function; the result is stored in the Python variable a_dotf_a. If you look carefully at the output of micc-build, you will see information about the wrappers that f2py constructed.

Passing Numpy arrays directly to Fortran routines is extremely productive. Many useful Python packages use numpy for arrays, vectors, matrices, linear algebra, etc. Being able to pass Numpy arrays directly into your own number crunching routines relieves you from converting between array types. In addition, you can do the memory management of your arrays and their initialization in Python.

As you can see, we test the outcome of dotf against the outcome of numpy.dot(). We trust that outcome, but beware that this test may be susceptible to round-off error because the representation of floating point numbers in Numpy and in Fortran may differ slightly.
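Should round-off ever become an issue, a tolerance-based comparison is a simple remedy. Below is a sketch of such a test (it is not part of the generated test file and is not included in the pytest run shown next):

import numpy as np
import et_dot.dotf as f90

def test_dotf_random():
    """Compare dotf against numpy.dot using a relative tolerance."""
    a = np.random.rand(1000)                  # dtype float64, matching real*8
    expected = np.dot(a, a)
    result = f90.dotf(a, a)
    # allow a tiny relative difference due to round-off
    assert abs(result - expected) <= 1e-12 * abs(expected)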

Here is the outcome of pytest:

> pytest
================================ test session starts =================================
platform darwin -- Python 3.7.4, pytest-4.6.5, py-1.8.0, pluggy-0.13.0
rootdir: /Users/etijskens/software/dev/workspace/ET-dot
collected 8 items

tests/test_et_dot.py .......                                                   [ 87%]
tests/test_f90_dotf.py .                                                       [100%]

============================== 8 passed in 0.16 seconds ==============================
>

All our tests passed. Of course we can extend the tests in the same way as we did for the naive Python implementation in the previous tutorial. We leave that as an exercise to the reader.

Increment the version string and produce a tag:

(.venv) > micc version -p -t
[INFO]           (ET-dot)> micc version (0.1.0) -> (0.1.1)
[INFO]           Creating git tag v0.1.1 for project ET-dot
[INFO]           Done.

Note

If you put your subroutines and functions inside a Fortran module, as in:

MODULE my_f90_module
  implicit none
  contains
    function dot(a,b)
      ...
    end function dot
END MODULE my_f90_module

then the binary extension module will expose the Fortran module name my_f90_module which in turn exposes the function/subroutine names:

>>> import et_dot
>>> a = [1.,2.,3.]
>>> b = [2.,2.,2.]
>>> et_dot.dot(a,b)
AttributeError: module 'et_dot' has no attribute 'dot'
>>> et_dot.my_f90_module.dot(a,b)
12.0

If you are bothered by having to type et_dot.my_f90_module. every time, use this trick:

>>> import et_dot
>>> f90 = et_dot.my_f90_module
>>> f90.dot(a,b)
12.0
>>> fdot = et_dot.my_f90_module.dot
>>> fdot(a,b)
12.0

2.3 Building binary extensions from C++

To illustrate building binary extension modules from C++ code, let us also create a C++ implementation for the dot product. Such modules are called cpp modules. Analogously to our dotf module we will call the cpp module dotc, the c referring to C++.

Note

To add binary extension modules to a project, it must have a package structure. To check, you may run the micc info command and verify the structure line. If it mentions Python module, you must convert the structure by running micc convert-to-package --overwrite. See `Modules and packages`_ for details.

Use the micc add command to add a cpp module:

> micc add dotc --cpp
[INFO]           [ Adding cpp module dotc to project ET-dot.
[INFO]               - C++ source in           ET-dot/et_dot/cpp_dotc/dotc.cpp.
[INFO]               - module documentation in ET-dot/et_dot/cpp_dotc/dotc.rst (in restructuredText format).
[INFO]               - Python test code in     ET-dot/tests/test_cpp_dotc.py.
[WARNING]            Dependencies added. Run \'poetry update\' to update the project\'s virtual environment.
[INFO]           ] done.

The output tells you where to add the C++ source code, the test code and the documentation. First take care of the warning:

(.venv) > poetry update
Updating dependencies
Resolving dependencies... (1.7s)
No dependencies to install or update

Typically, there will be nothing to install, because micc-build was already installed when we added the Fortran module dotf (see 2.2 Building binary extensions from Fortran). Sometimes one of the packages you depend on may just have seen a new release and poetry will perform an upgrade:

(.venv) > poetry update
Updating dependencies
Resolving dependencies... (1.6s)
Writing lock file
Package operations: 0 installs, 1 update, 0 removals
  - Updating zipp (0.6.0 -> 1.0.0)
(.venv) >

Micc uses pybind11 to create Python wrappers for C++ functions. This is by far the most practical choice (see https://channel9.msdn.com/Events/CPP/CppCon-2016/CppCon-2016-Introduction-to-C-python-extensions-and-embedding-Python-in-C-Apps for a good overview of the topic). It has a lot of ‘automagical’ features, and it is a header-only C++ library, which effectively prevents installation problems. Boost.Python offers very similar features, but it is not header-only and its library depends on the Python version you want to use, so you need a different library for every Python version.

This is a good point to increment the minor component of the version string:

(.venv) > micc version -m
[INFO]           (ET-dot)> micc version (0.1.1) -> (0.2.0)

Enter this code in the C++ source file ET-dot/et_dot/cpp_dotc/dotc.cpp:

#include <pybind11/pybind11.h>
#include <pybind11/numpy.h>

double
dotc( pybind11::array_t<double> a
    , pybind11::array_t<double> b
    )
{
    auto bufa = a.request()
       , bufb = b.request()
       ;
 // verify dimensions and shape:
    if( bufa.ndim != 1 || bufb.ndim != 1 ) {
        throw std::runtime_error("Number of dimensions must be one");
    }
    if( (bufa.shape[0] != bufb.shape[0]) ) {
        throw std::runtime_error("Input shapes must match");
    }
 // provide access to raw memory
 // because the Numpy arrays are mutable by default, py::array_t is mutable too.
 // Below we declare the raw C++ arrays for x and y as const to make their intent clear.
    double const *ptra = static_cast<double const *>(bufa.ptr);
    double const *ptrb = static_cast<double const *>(bufb.ptr);

    double d = 0.0;
    for (size_t i = 0; i < bufa.shape[0]; i++)
        d += ptra[i] * ptrb[i];

    return d;
}

// describe what goes in the module
PYBIND11_MODULE(dotc, m)
{// optional module docstring:
    m.doc() = "pybind11 dotc plugin";
 // list the functions you want to expose:
 // m.def("exposed_name", function_pointer, "doc-string for the exposed function");
    m.def("dotc", &dotc, "The dot product of two arrays 'a' and 'b'.");
}

Obviously, the C++ source code is more involved than its Fortran equivalent in the previous section. This is because f2py is a program that performs clever introspection of the Fortran source code, whereas pybind11 is just a C++ template library. As such it is not capable of introspection, and the user is obliged to use the pybind11 API to access the arguments passed in by Python.

We can now build the module. Because we do not want to rebuild the dotf module we add -m dotc to the command line, to indicate that only module dotc must be built:

(.venv)> micc build -m dotc
 [INFO] [ Building cpp module 'dotc':
 [DEBUG]          [ > cmake -D PYTHON_EXECUTABLE=/Users/etijskens/software/dev/workspace/tmp/ET-dot/.venv/bin/python -D pybind11_DIR=/Users/etijskens/software/dev/workspace/tmp/ET-dot/.venv/lib/python3.7/site-packages/et_micc_build/cmake_tools -D CMAKE_BUILD_TYPE=RELEASE ..
 [DEBUG]              (stdout)
                        -- The CXX compiler identification is AppleClang 11.0.0.11000033
                        -- Check for working CXX compiler: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/c++
                        -- Check for working CXX compiler: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/c++ -- works
                        -- Detecting CXX compiler ABI info
                        -- Detecting CXX compiler ABI info - done
                        -- Detecting CXX compile features
                        -- Detecting CXX compile features - done
                        -- Found PythonInterp: /Users/etijskens/software/dev/workspace/tmp/ET-dot/.venv/bin/python (found version "3.7.5")
                        -- Found PythonLibs: /Users/etijskens/.pyenv/versions/3.7.5/lib/libpython3.7m.a
                        -- Performing Test HAS_CPP14_FLAG
                        -- Performing Test HAS_CPP14_FLAG - Success
                        -- Performing Test HAS_FLTO
                        -- Performing Test HAS_FLTO - Success
                        -- LTO enabled
                        -- Configuring done
                        -- Generating done
                        -- Build files have been written to: /Users/etijskens/software/dev/workspace/tmp/ET-dot/et_dot/cpp_dotc/_cmake_build
 [DEBUG]          ] done.
 [DEBUG]          [ > make
 [DEBUG]              (stdout)
                        Scanning dependencies of target dotc
                        [ 50%] Building CXX object CMakeFiles/dotc.dir/dotc.cpp.o
                        [100%] Linking CXX shared module dotc.cpython-37m-darwin.so
                        [100%] Built target dotc
 [DEBUG]          ] done.
 [DEBUG]          >>> os.remove(/Users/etijskens/software/dev/workspace/tmp/ET-dot/et_dot/cpp_dotc/dotc.cpython-37m-darwin.so)
 [DEBUG]          >>> shutil.copyfile( '/Users/etijskens/software/dev/workspace/tmp/ET-dot/et_dot/cpp_dotc/_cmake_build/dotc.cpython-37m-darwin.so', '/Users/etijskens/software/dev/workspace/tmp/ET-dot/et_dot/cpp_dotc/dotc.cpython-37m-darwin.so' )
 [DEBUG]          [ > ln -sf /Users/etijskens/software/dev/workspace/tmp/ET-dot/et_dot/cpp_dotc/dotc.cpython-37m-darwin.so /Users/etijskens/software/dev/workspace/tmp/ET-dot/et_dot/cpp_dotc/dotc.cpython-37m-darwin.so
 [DEBUG]          ] done.
 [INFO] ] done.
 [INFO]           Binary extensions built successfully:
 [INFO]           - /Users/etijskens/software/dev/workspace/tmp/ET-dot/et_dot/dotc.cpython-37m-darwin.so
 (.venv)   >

The output shows that first CMake is called, followed by make and the installation of the binary extension with a soft link. Finally, the modules that were built successfully and those that failed to build are listed.

As usual, the micc-build command produces a lot of output, most of which is rather uninteresting - except in the case of errors. If the source file does not have any syntax errors, and the build did not experience any problems, you will see a file like dotc.cpython-37m-darwin.so in directory ET-dot/et_dot:

(.venv) > ls -l et_dot
total 8
-rw-r--r--  1 etijskens  staff  1339 Dec 13 14:40 __init__.py
drwxr-xr-x  4 etijskens  staff   128 Dec 13 14:29 __pycache__/
drwxr-xr-x  7 etijskens  staff   224 Dec 13 14:43 cpp_dotc/
lrwxr-xr-x  1 etijskens  staff    93 Dec 13 14:43 dotc.cpython-37m-darwin.so@ -> /Users/etijskens/software/dev/workspace/tmp/ET-dot/et_dot/cpp_dotc/dotc.cpython-37m-darwin.so
lrwxr-xr-x  1 etijskens  staff    94 Dec 13 14:27 dotf.cpython-37m-darwin.so@ -> /Users/etijskens/software/dev/workspace/tmp/ET-dot/et_dot/f90_dotf/dotf.cpython-37m-darwin.so
drwxr-xr-x  6 etijskens  staff   192 Dec 13 14:43 f90_dotf/
(.venv) >

Note

The suffix of the module file dotc.cpython-37m-darwin.so depends on the Python version you are using, and on the operating system.

Although we haven’t tested dotc, this is a good point to increment the version string:

(.venv) > micc version -p
[INFO]           (ET-dot)> micc version (0.2.0) -> (0.2.1)

Here is the test code. It is almost exactly the same as that for the f90 module dotf, except for the module name. Enter the test code in ET-dot/tests/test_cpp_dotc.py:

import et_dot.dotc as cpp    # import the binary extension
import numpy as np

def test_dotc_aa():
    a = np.array([0,1,2,3,4],dtype=np.float)
    expected = np.dot(a,a)
    a_dotc_a = cpp.dotc(a,a)
    assert a_dotc_a==expected

The conversion of the Numpy arrays to C++ arrays is less magical here, as the user must provide code to access the underlying data from C++. This has the advantage of showing the mechanics of the conversion more clearly, but it also leaves more room for mistakes, and to beginners it may seem more complicated.

Finally, run pytest:

> pytest
================================ test session starts =================================
platform darwin -- Python 3.7.4, pytest-4.6.5, py-1.8.0, pluggy-0.13.0
rootdir: /Users/etijskens/software/dev/workspace/ET-dot
collected 9 items

tests/test_cpp_dotc.py .                                                       [ 11%]
tests/test_et_dot.py .......                                                   [ 88%]
tests/test_f90_dotf.py .                                                       [100%]

============================== 9 passed in 0.28 seconds ==============================

All our tests passed, which is a good reason to increment the version string and create a tag:

(.venv) > micc version -m -t
[INFO] Creating git tag v0.3.0 for project ET-dot
[INFO] Done.

2.4 Data type issues

An important point of attention when writing binary extension modules - and a common source of problems - is that the data types of the variables passed in from Python must match the data types of the Fortran or C++ routines.

Here is a table with the most relevant numeric data types in Python, Fortran and C++.

kind              Numpy/Python   Fortran      C++
unsigned integer  uint32         N/A          std::uint32_t
unsigned integer  uint64         N/A          std::uint64_t
signed integer    int32          integer*4    std::int32_t
signed integer    int64          integer*8    std::int64_t
floating point    float32        real*4       float
floating point    float64        real*8       double
complex           complex64      complex*8    std::complex<float>
complex           complex128     complex*16   std::complex<double>

F2py

F2py is very flexible with respect to data types. Between the Fortran routine and the Python call sits a wrapper function which translates the function call. If it detects that the data types on the Python side and on the Fortran side differ, the wrapper function is allowed to copy/convert the variable when passing it to the Fortran routine, and also when passing the result back to the Python caller. When the input/output variables are large arrays, such copy/conversion operations have a detrimental effect on performance, which is highly undesirable in an HPC context. Micc therefore runs f2py with the -DF2PY_REPORT_ON_ARRAY_COPY=1 option. This causes your code to produce a warning every time the wrapper decides to copy an array. Basically, this warning means that you have to modify your Python data structure to have the same data type as the Fortran source code, or vice versa.
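To illustrate, assuming the dotf extension from section 2.2 has been built: an array whose dtype matches the Fortran declaration (real*8, i.e. numpy.float64) is passed without copying, whereas a mismatching dtype triggers the copy warning.

import numpy as np
import et_dot.dotf as f90

a64 = np.ones(1000, dtype=np.float64)   # matches real*8: no copy needed
a32 = np.ones(1000, dtype=np.float32)   # does not match real*8: f2py copies/converts

f90.dotf(a64, a64)   # no conversion overhead
f90.dotf(a32, a32)   # works, but reports an array copy (F2PY_REPORT_ON_ARRAY_COPY)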

Returning large data structures

The result of a Fortran function and of a C++ function is always copied back to the Python variable that will hold it. As copying large data structures is detrimental to performance, this should be avoided. The solution is to write Fortran subroutines (or functions) and C++ functions that accept the result variable as an argument and modify it in place, so that the copy operation is avoided. Consider this example of a Fortran subroutine that computes the sum of two arrays:

subroutine add(a,b,sumab,n)
  ! Compute the sum of arrays a and b and overwrite array sumab with the result
    implicit none

    integer*4              , intent(in)    :: n
    real*8   , dimension(n), intent(in)    :: a,b
    real*8   , dimension(n), intent(inout) :: sumab

  ! declare local variables
    integer*4 :: i

    do i=1,n
        sumab(i) = a(i) + b(i)
    end do
end subroutine add

The crucial issue here is that the result array sumab has intent(inout). If you qualify the intent of sumab as in, you will not be able to overwrite it, whereas - surprisingly - qualifying it with intent(out) will force f2py to consider it as a left-hand side variable, which implies copying the result on returning.
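From the Python side you pre-allocate the result array and pass it in, so that nothing is copied on return. Below is a sketch, assuming the subroutine above has been added to the project as an f90 module named add (the module name is hypothetical):

import numpy as np
import et_dot.add as f90   # hypothetical f90 module containing the 'add' subroutine

a = np.random.rand(1000)
b = np.random.rand(1000)
sumab = np.empty_like(a)   # pre-allocate the result array in Python
f90.add(a, b, sumab)       # sumab is overwritten in place, no copy on return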

The code below does exactly the same, but uses a function that returns an error code rather than the result of the computation.

function add(a,b,sumab,n)
  ! Compute the sum of arrays a and b and overwrite array sumab with the result
    implicit none

    integer*4              , intent(in)    :: n
    integer*4                              :: add
    real*8   , dimension(n), intent(in)    :: a,b
    real*8   , dimension(n), intent(inout) :: sumab

  ! declare local variables
    integer*4 :: i

    do i=1,n
        sumab(i) = a(i) + b(i)
    end do

    add = ... ! set return value, e.g. an error code.

end function add

The same can be accomplished in C++:

#include <pybind11/pybind11.h>
#include <pybind11/numpy.h>

namespace py = pybind11;

void
add ( py::array_t<double> a
    , py::array_t<double> b
    , py::array_t<double> sumab
    )
{// request buffer description of the arguments
    auto buf_a = a.request()
       , buf_b = b.request()
       , buf_sumab = sumab.request()
       ;
    if( buf_a.ndim != 1
     || buf_b.ndim != 1
     || buf_sumab.ndim != 1 )
    {
        throw std::runtime_error("Number of dimensions must be one");
    }

    if( (buf_a.shape[0] != buf_b.shape[0])
     || (buf_a.shape[0] != buf_sumab.shape[0]) )
    {
        throw std::runtime_error("Input shapes must match");
    }
 // because the Numpy arrays are mutable by default, py::array_t is mutable too.
 // Below we declare the raw C++ arrays for a and b as const to make their intent clear.
    double const *ptr_a     = static_cast<double const *>(buf_a.ptr);
    double const *ptr_b     = static_cast<double const *>(buf_b.ptr);
    double       *ptr_sumab = static_cast<double       *>(buf_sumab.ptr);

    for (size_t i = 0; i < buf_a.shape[0]; i++)
        ptr_sumab[i] = ptr_a[i] + ptr_b[i];
}


PYBIND11_MODULE(add, m)
{// optional module docstring:
    m.doc() = "pybind11 add plugin";
 // list the functions you want to expose:
 // m.def("exposed_name", function_pointer, "doc-string for the exposed function");
    m.def("add", &add, "A function which adds two arrays 'a' and 'b' and stores the result in the third, 'sumab'.");
}

Here, care must be taken not to cast buf_sumab.ptr to const, since sumab must be writable.

2.5 Specifying compiler options for binary extension modules

[ Advanced Topic ] As we have seen, binary extension modules can be programmed in Fortran and C++. Micc provides convenient wrappers to build such modules. Fortran source code is transformed into a Python module using f2py, and C++ source code using Pybind11 and CMake. Obviously, in both cases there is a compiler under the hood doing the hard work. By default these tools use the compiler they find on the path, but you can also specify your favorite compiler.

Note

Compiler options are distinct for f2py modules and cpp modules.

Building a single module only

If you want to build a single binary extension module rather than all binary extension modules in the project, add the -m|--module option:
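For example, to build only my_module (a sketch; replace my_module with the name of your binary extension module):

(.venv) > micc-build -m my_module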

This will only build module my_module.

Performing a clean build

To perform a clean build, add the --clean flag to the micc build command:
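For example:

(.venv) > micc-build --clean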

This will remove the previous build directory as well as the binary extension module.

Controlling the build of f90 modules

To specify the Fortran compiler, e.g. the GNU fortran compiler:
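A sketch, assuming micc-build forwards f2py’s --f90exec option (see also the build options in section 5.3) and that gfortran is on your PATH:

(.venv) > micc-build --f90exec=$(which gfortran)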

Note that this is exactly how you would specify it when using f2py directly. You can specify the Fortran compiler options you want using the --f90flags option:
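For example, to pass extra flags to gfortran (the specific flags below are only an illustration):

(.venv) > micc-build --f90flags="-fcheck=all -fbacktrace"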

In addition, f2py (and micc-build for that matter) provides two extra options: --opt for specifying optimization flags, and --arch for specifying architecture-dependent optimization flags. These flags can be turned off by adding --noopt and --noarch, respectively. This can be convenient when exploring compile options. Finally, the --debug flag adds debug information during the compilation.

Micc-build also provides a --build-type option which accepts release and debug as values (case insensitive). Specifying debug is equivalent to --debug --noopt --noarch.

Note

ALL f90 modules are built with the same options. To specify separate options for a particular module use the -m|--module option.

Note

Although there are some commonalities between the compiler options of the various compilers, you will most probably have to change the compiler options when you change the compiler.

Controlling the build of cpp modules

The build of C++ modules can be fully controlled by modifying the module’s CMakeLists.txt file to your needs. Micc provides every cpp module with a template containing examples of frequently used CMake commands, commented out. These include the specification of:

  • compiler options
  • preprocessor macros
  • include directories
  • link directories
  • link libraries

You just need to uncomment them and provide the values you need:

# ...

# set compiler:
# set(CMAKE_CXX_COMPILER path/to/executable)

# Add compiler options:
# set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} <additional C++ compiler options>")

# Add preprocessor macro definitions:
# add_compile_definitions(
#     OPENFOAM=1912                     # set value
#     WM_LABEL_SIZE=$ENV{WM_LABEL_SIZE} # set value from environment variable
#     WM_DP                             # just define the macro
# )

# Add include directories
#include_directories(
#     path/to/dir1
#     path/to/dir2
# )

#...

CMake provides default build options for four build types: DEBUG, MINSIZEREL, RELEASE, and RELWITHDEBINFO.

  • CMAKE_CXX_FLAGS_DEBUG: -g
  • CMAKE_CXX_FLAGS_MINSIZEREL: -Os -DNDEBUG
  • CMAKE_CXX_FLAGS_RELEASE: -O3 -DNDEBUG
  • CMAKE_CXX_FLAGS_RELWITHDEBINFO: -O2 -g -DNDEBUG

The build type is selected by setting the CMAKE_BUILD_TYPE variable (default: RELEASE).

For convenience, micc-build provides a command line argument --build-type for specifying the build type.
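For example, to build with debug settings (a sketch):

(.venv) > micc-build --build-type debug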

Save and load build options to/from file

With the --save option you can save the current build options to a file in .json format. This acts on a per project basis. E.g.:
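A sketch (replace <my build options> with the options you want to save):

(.venv) > micc-build <my build options> --save build.json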

will save the <my build options> to the file build.json in every binary module directory (the .json extension is added if omitted). You can restrict this to a single module with the --module option (see above). The saved options can be reused in a later build as:
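The flag name below is an assumption; presumably the counterpart of --save is a load option along these lines:

(.venv) > micc-build --load build.json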

2.6 Documenting binary extension modules

For Python modules the documentation is automatically extracted from the doc-strings in the module. However, when it comes to documenting binary extension modules, this is not a good option. Ideally, the source files ET-dot/et_dot/f90_dotf/dotf.f90 and ET-dot/et_dot/cpp_dotc/dotc.cpp should document the Fortran functions and subroutines, and the C++ functions, respectively, rather than the Python interface. Yet, from the perspective of ET-dot being a Python project, the user is only interested in the documentation of the Python interface to those functions and subroutines. Therefore, micc requires you to document the Python interface in separate .rst files:

  • ET-dot/et_dot/f90_dotf/dotf.rst
  • ET-dot/et_dot/cpp_dotc/dotc.rst

Here are the contents, respectively, for ET-dot/et_dot/f90_dotf/dotf.rst:

Module et_dot.dotf
******************

Module :py:mod:`dotf` built from fortran code in :file:`f90_dotf/dotf.f90`.

.. function:: dotf(a,b)
   :module: et_dot.dotf

   Compute the dot product of *a* and *b* (in Fortran.)

   :param a: 1D Numpy array with ``dtype=numpy.float64``
   :param b: 1D Numpy array with ``dtype=numpy.float64``
   :returns: the dot product of *a* and *b*
   :rtype: ``numpy.float64``

and for ET-dot/et_dot/cpp_dotc/dotc.rst:

Module et_dot.dotc
******************

Module :py:mod:`dotc` built from C++ code in :file:`cpp_dotc/dotc.cpp`.

.. function:: dotc(a,b)
   :module: et_dot.dotc

   Compute the dot product of *a* and *b* (in C++.)

   :param a: 1D Numpy array with ``dtype=numpy.float64``
   :param b: 1D Numpy array with ``dtype=numpy.float64``
   :returns: the dot product of *a* and *b*
   :rtype: ``numpy.float64``

Note that the documentation must be entirely in .rst format (see restructuredText).

Build the documentation:

(.venv) > cd docs && make html
Already installed: click
Already installed: sphinx-click
Already installed: sphinx
Already installed: sphinx-rtd-theme
Running Sphinx v2.2.2
making output directory... done
WARNING: html_static_path entry '_static' does not exist
building [mo]: targets for 0 po files that are out of date
building [html]: targets for 7 source files that are out of date
updating environment: [new config] 7 added, 0 changed, 0 removed
reading sources... [100%] readme
looking for now-outdated files... none found
pickling environment... done
checking consistency... /Users/etijskens/software/dev/workspace/tmp/ET-dot/docs/apps.rst: WARNING: document isn't included in any toctree
done
preparing documents... done
writing output... [100%] readme
generating indices...  genindex py-modindexdone
highlighting module code... [100%] et_dot.dotc
writing additional pages...  search/Users/etijskens/software/dev/workspace/tmp/ET-dot/.venv/lib/python3.7/site-packages/sphinx_rtd_theme/search.html:20: RemovedInSphinx30Warning: To modify script_files in the theme is deprecated. Please insert a <script> tag directly in your theme instead.
  {{ super() }}
done
copying static files... ... done
copying extra files... done
dumping search index in English (code: en)... done
dumping object inventory... done
build succeeded, 2 warnings.

The HTML pages are in _build/html.

The documentation is built using make. The Makefile checks that the necessary components sphinx, click, sphinx-click and sphinx-rtd-theme are installed.

You can view the result in your favorite browser:

(.venv) > open _build/html/index.html

The file path is evident from the last line of the output above. This is what the result looks like (html):

_images/img2-1.png

Increment the version string:

(.venv) > micc version -M -t
[ERROR] Not a project directory (/Users/etijskens/software/dev/workspace/tmp/ET-dot/docs).
(.venv) > cd ..
(.venv) > micc version -M -t
[INFO]           (ET-dot)> micc version (0.3.0) -> (1.0.0)
[INFO]           Creating git tag v1.0.0 for project ET-dot
[INFO]           Done.

Note that we first got an error because we are still in the docs directory, and not in the project root directory.

Tutorial 3: Adding Python components

3.1 Adding a Python module

Just as one can add binary extension modules to a package, one can add python modules.

> micc add foo --py

[INFO]           [ Adding python module foo.py to project ET-dot.
[INFO]               - python source in    ET-dot/et_dot/foo.py.
[INFO]               - Python test code in ET-dot/tests/test_foo.py.
[INFO]           ] done.

This adds a Python sub-module to the package, and a test script. The documentation for the sub-module is extracted from doc-strings of the functions and classes in the sub-module.

As with micc create, the default structure is that of a simple module, i.e. ET-dot/et_dot/foo.py. If you want a package you can add the --package flag.

Testing the module

When adding a module foo, Micc automatically adds a test script for the new module: tests/test_foo.py. In this file you add tests for module foo.

Documenting the module

When adding a module foo, Micc automatically adds documentation entries in API.rst. Calling micc docs will automatically extract documentation from the doc-strings in your new module.

3.2 Adding a Python Command Line Interface

Command Line Interfaces are Python scripts that you want to be installed as executable programs when a user installs your package.

As an example, assume that we need quite often to read two arrays from file and compute their dot product, and that we want to execute this operation as:

> dot-files file1 file2
dot(file1,file2) = 123.456
>

Micc supports two kinds of CLIs based on click, a very practical tool for building Python CLIs. The first one is for CLIs that execute a single task, the second one for a command with sub-commands, like git or micc itself. The single task case is the default, so we can create it like this:

> micc app dot-files
[INFO]           [ Adding CLI dot-files without sub-commands to project ET-dot.
[INFO]               - Python source file ET-dot/et_dot/cli_dot_files.py.
[INFO]               - Python test code   ET-dot/tests/test_cli_dot_files.py.
[INFO]           ] done.

For a CLI with sub-commands one should add the flag --sub-commands.

The source code ET-dot/et_dot/cli_dot_files.py should be modified as:

# -*- coding: utf-8 -*-
"""Command line interface dot-files (no sub-commands)."""

import sys

import click
import numpy as np

from et_dot.dotf import dotf

@click.command()
@click.argument('file1')
@click.argument('file2')
@click.option('-v', '--verbosity', count=True
             , help="The verbosity of the CLI."
             , default=1
             )
def main(file1,file2,verbosity):
    """Command line interface dot-files.

    A 'hello' world CLI example.
    """
    a = np.genfromtxt(file1, dtype=np.float64, delimiter=',')
    b = np.genfromtxt(file2, dtype=np.float64, delimiter=',')
    ab = dotf(a,b)
    if verbosity>1:
        print(f"dot-files({file1},{file2}) = {ab}")
    else:
        print(ab)

if __name__ == "__main__":
    sys.exit(main())  # pragma: no cover

Here’s how to use it from the command line (without installing):

> source .venv/bin/activate
(.venv) > cat file1.txt
1,2,3,4,5
> cat file2.txt
2,2,2,2,2
(.venv) > python et_dot/cli_dot_files.py file1.txt file2.txt
30.0
(.venv) > python et_dot/cli_dot_files.py file1.txt file2.txt -vv
dot-files(file1.txt,file2.txt) = 30.0

Testing the application

When you add an application like dot-files, Micc automatically adds a test script tests/test_cli_dot_files.py where you can add your tests. Testing CLIs is a bit more complex than testing modules, but Click provides some tools for testing click applications. Here is the test code:

#!/usr/bin/env python
# -*- coding: utf-8 -*-

from click.testing import CliRunner

from et_dot.cli_dot_files import main

def test_main():
    runner = CliRunner()
    result = runner.invoke(main, ['file1.txt','file2.txt'])
    print(result.output)
    ab = float(result.output[0:-1])
    assert ab==30.0

Finally, we run pytest:

> pytest
================================= test session starts =================================
platform darwin -- Python 3.7.4, pytest-4.6.5, py-1.8.0, pluggy-0.13.0
rootdir: /Users/etijskens/software/dev/workspace/ET-dot
collected 10 items

tests/test_cli_dot_files.py .                                                   [ 10%]
tests/test_cpp_dotc.py .                                                        [ 20%]
tests/test_et_dot.py .......                                                    [ 90%]
tests/test_f90_dotf.py .                                                       [100%]

================================== 10 passed in 0.33 seconds ==========================

Documenting an application

When adding a CLI, Micc automatically adds documentation entries for it in APPS.rst. Calling micc docs will automatically extract documentation from the doc-string of the command, from the :param ...: entries of the click.argument decorators in that doc-string, and from the help parameters of the click.option decorators.

Tutorial 4: Version control and version management

4.1 Git support

When you create a new project, Micc immediately provides a local git repository for you and commits the initial files Micc set up for you. If you have a github account you can register it in the preferences file ~/.micc/micc.json, using the github_username entry:

{
...
, "github_username"  : {"default":"etijskens"
                       ,"text"   :"your github username"
                       }
...
}

Micc cannot create a remote github repository for you, but if you registered your github username in the preferences file, it will add a remote origin at https://github.com/<your_github_username>/<your_project_name>/, and try to push the files to the github repo. If you created the remote repository before you create the project, the new project is immediately pushed onto the remote origin. Otherwise, you get a warning that the remote repository does not yet exist. You can create the remote repository whenever you like and push your work onto it using the git CLI.

4.2 Version management

Version numbers are practical, even for a small software project used only by yourself. For larger projects, certainly when other users start using them, they become indispensable. When giving version numbers to a project, we highly recommend following the guidelines of Semantic Versioning 2.0. Such a version number consists of Major.minor.patch. According to semantic versioning you should increment the:

  • Major version when you make incompatible API changes,
  • minor version when you add functionality in a backwards compatible manner, and
  • patch version when you make backwards compatible bug fixes.

Micc sets a version number of 0.0.0 when it creates a project, and you can bump the version number at any time with the micc version command.

> micc info
Project ET-dot located at /Users/etijskens/software/dev/workspace/ET-dot
  package: et_dot
  version: 0.0.0
  structure: et_dot/__init__.py (Python package)
  contents:
    application cli_dot_files.py
    C++ module  cpp_dotc/dotc.cpp
    f90 module  f90_dotf/dotf.f90

To bump the patch component:

> micc version
Project (ET-dot) version (0.0.0)
> micc version --patch
[INFO]           bumping version (0.0.0) -> (0.0.1)

Again, with the short version of --patch, and verbose this time:

> micc -vv version -p
[DEBUG] start = 2019-10-16 13:18:16.995416
[INFO]           bumping version (0.0.1) -> (0.0.2)
[DEBUG]          . Updated (/Users/etijskens/software/dev/workspace/ET-dot/pyproject.toml)
[DEBUG]          . Updated (/Users/etijskens/software/dev/workspace/ET-dot/et_dot/__init__.py)
[DEBUG] stop  = 2019-10-16 13:18:17.261962
[DEBUG] spent = 0:00:00.266546

Here, you can see that micc updated the version number in ET-dot/pyproject.toml and ET-dot/et_dot/__init__.py.

To bump the minor component use the --minor or -m flag:

> micc version -m
[INFO]           bumping version (0.0.2) -> (0.1.0)

As you can see the patch component is reset to 0.

To bump the major component use the --major or -M flag:

> micc version -M
[INFO]           bumping version (0.1.0) -> (1.0.0)

As you can see the minor component (as well as the patch component) is reset to 0.

The micc version command also has a --tag flag that creates a git tag (see https://git-scm.com/book/en/v2/Git-Basics-Tagging) and tries to push it to the remote repository:

> micc -vv version -p --tag
[DEBUG] start = 2019-10-16 13:37:25.026161
[INFO]           bumping version (1.0.1) -> (1.0.2)
[DEBUG]          . Updated (/Users/etijskens/software/dev/workspace/ET-dot/pyproject.toml)
[DEBUG]          . Updated (/Users/etijskens/software/dev/workspace/ET-dot/et_dot/__init__.py)
[INFO]           Creating git tag v1.0.2 for project ET-dot
[DEBUG]          Running 'git tag -a v1.0.2 -m "tag version 1.0.2"'
[DEBUG]
[DEBUG]          Pushing tag v1.0.2 for project ET-dot
[DEBUG]          Running 'git push origin v1.0.2'
[DEBUG]          remote: Repository not found.
                   fatal: repository 'https://github.com/etijskens/ET-dot/' not found
[INFO]           Done.
[DEBUG] stop  = 2019-10-16 13:37:26.101959
[DEBUG] spent = 0:00:01.075798

If you created a remote github repository for your project and registered your github username in the preferences file, the tag is pushed to the remote origin.

Tutorial 5 - Publishing your code

Publishing your code is an easy way to make your code available to other users.

5.1 Publishing to the Python Package Index

For this we rely on poetry. If you do not have a PyPI account yet, create one, and run this command in your project directory, e.g. et-foo:
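A sketch of the command; poetry builds the distribution files and uploads them to PyPI, prompting for your PyPI credentials unless you configured them beforehand:

(.venv) > poetry publish --build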

Note

It is crucial that your project name is not already taken. For this reason, we recommend that

  1. before you create a project that you might want to publish, you check whether your project name is not already taken;
  2. immediately after your project is created, you publish it, so as to reserve the name.

Now everyone can install the package in his current Python environment as:

> pip install et-foo

5.2 Publishing packages with binary extension modules

Packages with binary extension modules are published in exactly the same way, that is, as a Python-only project. When you pip install a Micc project, the package directory will end up in the site-packages directory of the Python environment in which you install. The source code directories of the binary extension modules are installed with the package, but without the binary extensions themselves. These must be compiled locally. Fortunately, that happens automatically, at least if the binary extensions were added to the package by Micc. When Micc adds a binary extension to a project, two things happen:

  • a dependency on micc-build is added to the project, and
  • in the top-level module <package_name>/__init__.py a try-except block is added that tries to import the binary extension and in case of failure (ModuleNotFoundError) will attempt to build it using the machinery provided by micc-build. This will usually succeed, provided the necessary compilers are available.

As an example, let us create a project Foo with a binary extension module bar written in C++:

> micc -p Foo create
> cd Foo
> micc add bar --cpp

This creates the following Foo/foo/__init__.py:

# -*- coding: utf-8 -*-
"""
Package foo
===========

Top-level package for foo.
"""

__version__ = "0.0.0"

try:
    import foo.bar
except ModuleNotFoundError as e:
    # Try to build this binary extension:
    from pathlib import Path
    import click
    from et_micc_build.cli_micc_build import auto_build_binary_extension
    msg = auto_build_binary_extension(Path(__file__).parent, 'bar')
    if not msg:
        import foo.bar
    else:
        click.secho(msg, fg='bright_red')

def hello(who='world'):
    ...

If the first import foo.bar fails, the except block imports the method auto_build_binary_extension() and calls it with two arguments: the path to the package directory Foo/foo and the name of the binary extension module, bar. If the build succeeds, the msg string is empty and foo.bar is imported at last; otherwise the error message msg is printed.

5.3 Providing auto_build_binary_extension() with custom build parameters

The auto-build above will normally use the default build options, corresponding to -O3, which optimizes for speed. As the auto_build_binary_extension() method is called automatically, we cannot pass build options to it directly. Instead, the auto_build_binary_extension() method looks for a file Foo/foo/cpp_bar/build_options.<platform>.json, where <platform> is Darwin on MacOSX, Linux on Linux and Windows on Windows. If it exists, it should contain a dict with the build options to use.

Note

The build options files are OS specific:

  • On MacOSX : build_options.Darwin.json
  • On Linux : build_options.Linux.json
  • On Windows : build_options.Windows.json

f90 module build option specifications

All options available to the f2py command line application can be entered in the build options file. Pure flags, like --noopt, which are either present or not but have no value, are entered in the dictionary with value None. Below are some examples of commonly used f2py flags.

import json
from pathlib import Path
import platform

# project_path, package_name and module_name must refer to your own project
f2py = {
    '--f90exec' : 'f90 compiler executable',
    '--f90flags': 'f90 compiler flags',
    '--opt'     : 'f90 compiler optimization flags',
    '--arch'    : 'f90 compiler architecture specific compiler flags',
    '--noopt'   : None,  # neglect '--opt' contents
    '--noarch'  : None,  # neglect '--arch' contents
    '--debug'   : None,  # compile with debugging information
}
module_srcdir_path = Path(project_path) / package_name / f"f2py_{module_name}"
with (module_srcdir_path / f"build_options.{platform.system()}.json").open('w') as f:
    json.dump(f2py, f)

Note

The Python dictionary f2py is written to file in .json format, which is human readable. You can also construct it with an editor.

Cpp module build option specifications

For cpp binary extension modules the build tool is CMake. Here, the entries of the build options dict consist of any CMake variable and its desired value.

import json
from pathlib import Path
import platform

cmake = {
    'CMAKE_BUILD_TYPE' : 'RELEASE',
    # ... any other CMake variables and their values
}
module_srcdir_path = Path(project_path) / package_name / f"cpp_{module_name}"
with (module_srcdir_path / f"build_options.{platform.system()}.json").open('w') as f:
    json.dump(cmake, f)

5.4 Publishing your documentation on readthedocs.org

Publishing your documentation to Readthedocs relieves the users of your code from having to build the documentation themselves. Making it happen is very easy. First, make sure the git repository of your code is published on Github. Second, create a Readthedocs account if you do not already have one. Then, go to your Readthedocs page, go to your projects and hit import project. Fill in the fields, and every time you push commits to Github the documentation will be rebuilt automatically and published.

Note

Sphinx must be able to import your project in order to extract the documentation. If your code depends on Python modules other than the standard library, this will fail and the documentation will not be built. You can add the necessary dependencies to <your-project>/docs/requirements.txt.

Tutorial 6 - Using micc projects on the VSC clusters

This tutorial walks you through the differences between using micc to manage your Python+C++/Fortran projects on your local machine and on the VSC clusters. If you are already familiar with the use of HPC environments, the only essential part of this tutorial is section 6.2 Using Poetry on the cluster. Otherwise, it is recommended to go through the entire tutorial.

Note

This tutorial uses the Leibniz cluster of the University of Antwerp for the examples. The principles pertain, however, to all VSC clusters, and most probably also to other clusters using a module system for exposing their software stack.

Most differences between using your local machine or a cluster stem from the fact that a cluster, typically, uses a module system for making software available to the user on a login node (interactive mode) and to a compute node (batch mode). In addition, the cluster uses a scheduler that determines when your compute jobs are executed.

The tools we need are, typically:

  • a modern Python version. As Python 2.7 is officially discontinued, that would probably be 3.7 or later.
  • common Python packages for computing, like Numpy, scipy, matplotlib, …
  • compilers for C++ and/or Fortran, for compiling binary extensions.
  • CMake, as the build system for C++ binary extensions.
  • git, for version control, if we are developing code on the cluster.

6.1 Using modules

The cluster’s operating system exposes some of these tools, but they lag many versions behind and, although very reliable, they are not fit for high performance computing purposes.

As an example consider the GCC C++ compiler g++. Here is the g++ version exposed by the operating system (at the day of writing: August 2020):

> which g++
/usr/bin/g++
> g++ --version
g++ (GCC) 4.8.5 20150623 (Red Hat 4.8.5-39)
...

Still at the day of writing, the latest GCC release is 6 major versions ahead of that: 10.2! The OS g++ is very reliable for building operating system components, but it is not suited for building C++ binary extensions that must squeeze the last bit of performance out of the cluster’s hardware. Obviously, this old g++ cannot possibly be aware of modern hardware, and, consequently, cannot generate code that exploits all the modern hardware features introduced to improve the performance of scientific computations.

Similarly, the OS Python is 2.7.5, whereas 3.9 is almost released, and 2.7.x isn’t even officially supported anymore.

So as a rule of thumb:

Never use the tools provided by the operating system.

As the preinstalled modules are built by VSC specialists for optimal performance on the cluster hardware, this rule should be extended as:

Do not install your own tools (unless they are not performance critical, or, you are a specialist yourself).

If you need some software package, a library, a Python module, or, whatever, which is not available as a cluster module, especially, if it is performance critical, contact your local VSC team and they will build and install it for you (and all other users).

The VSC Team has installed many software packages ready to be used for high performance computing. In fact, they are built using the modern compilers and with optimal performance in mind. Contrary to what you are used to on your personal computer where installed software packages are immediately accessible, on the cluster an extra step must be taken to make installed packages accessible.

If you are unsure whether a command is provided by the operating system or not, use the linux which command:

> which g++
/usr/bin/g++

Typically, commands of the operating system are found in /usr/bin and should usually not be used for high performance computing. Commands provided by some cluster module are, typically, found in /apps/<vsc-site>/....

Note

The which cmd command shows the path to the first cmd on the PATH environment variable.

So, how do we get access to the commands we are supposed to use?

HPC packages are installed as modules and to make them accessible, they must be loaded. Loading a module means that your operating system environment is modified such that it can find the software’s executables, that is, the directories containing its executables are added to the PATH variable. In addition, other environment variables are adjusted or added to make everything work smoothly.

E.g., to use a recent version of git we load the git module:

> module load git
> which git
/apps/antwerpen/broadwell/centos7/git/2.13.3/bin/git
> git --version
git version 2.13.3

Before we loaded git, the which command would have shown:

> which git
/usr/bin/git
> git --version
git version 1.8.3.1

A much older version, indeed.

You can search for modules containing e.g. the word gcc (case insensitive):

> module spider gcc
...

If you know the package name, you can list the available versions with module av. Here are the available Python versions (the command is case insensitive):

> module av python/

which on Leibniz returns:

----------------------------------------------------- /apps/antwerpen/modules/centos7/software-broadwell/2019b -----------------------------------------------------
   Biopython/1.74-GCCcore-8.3.0-IntelPython3-2019b    Biopython/1.74-intel-2019b-Python-3.7.4 (D)    Python/3.7.4-intel-2019b (D)
   Biopython/1.74-intel-2019b-Python-2.7.16           Python/2.7.16-intel-2019b

----------------------------------------------------- /apps/antwerpen/modules/centos7/software-broadwell/2018b -----------------------------------------------------
   Python/2.7.15-intel-2018b    Python/3.6.8-intel-2018b    Python/3.7.0-intel-2018b    Python/3.7.1-intel-2018b

----------------------------------------------------- /apps/antwerpen/modules/centos7/software-broadwell/2018a -----------------------------------------------------
   Python/2.7.14-intel-2018a    Python/3.6.4-intel-2018a    Python/3.6.6-intel-2018a

----------------------------------------------------- /apps/antwerpen/modules/centos7/software-broadwell/2017a -----------------------------------------------------
   Biopython/1.68-intel-2017a-Python-2.7.13    pbs_python/4.6.0-intel-2017a-Python-2.7.13    Python/3.6.1-intel-2017a
   Biopython/1.68-intel-2017a-Python-3.6.1     Python/2.7.13-intel-2017a

  Where:
   D:  Default Module

If you need software that is not listed, request it at hpc@uantwerpen.be

Please mind the last line: if you need something that is not pre-installed, request it at hpc@uantwerpen.be.

You can unload a module:

> module unload git
> which git
/usr/bin/git

The current git command is that of the OS again.

You can unload all modules:

> module purge

To learn the details about the VSC clusters’ module system, consult Using the module system.

6.2 Using Poetry on the cluster

6.2.1 Installing Poetry

Poetry is, so far, not available as a cluster module, so you must install it yourself. The installation method recommended by the Poetry documentation is also applicable on the cluster (even though the system Python version is still 2.7.x):

> module purge
> curl -sSL https://raw.githubusercontent.com/python-poetry/poetry/master/get-poetry.py | python

The module purge command ensures that the system Python is used for the Poetry installation. This gives you a single Poetry installation that works for all Python versions you might want to use: internally, Poetry commands use the system Python, which is always available, while your projects can use any Python version that is made available by loading a cluster module, or that you installed yourself.
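
After the installation you may have to open a new shell, or put Poetry’s install location on your PATH, before the poetry command is found. The get-poetry.py installer places it under ~/.poetry by default, but the details may differ per installer version, so treat the path below as an assumption and check the installer’s output. A quick sanity check:

> source $HOME/.poetry/env    # only if poetry is not on your PATH yet (installer's default location)
> poetry --version
...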

6.2.2 Using pre-installed Python packages

As the cluster modules generally come with pre-installed Python packages (e.g. Numpy, Scipy, …) which are built for optimal performance in an HPC environment, we do not want poetry install to reinstall these packages in your project’s virtual environment. That would lead to suboptimal performance and waste disk space. Fortunately, there is a way to tell Poetry that it must use the pre-installed Python packages:

> mkdir -p ~/.cache/pypoetry/virtualenvs/.venv
> echo 'include-system-site-packages = true' > ~/.cache/pypoetry/virtualenvs/.venv/pyvenv.cfg

(If your project’s virtual environment is not named .venv, replace .venv in the commands above with its actual name.)
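
To verify that the virtual environment indeed picks up the pre-installed packages, you can check where a pre-installed package such as Numpy is imported from after running poetry install:

> poetry run python -c "import numpy; print(numpy.__file__)"

The printed path should point into the cluster module’s site-packages directory, not into your project’s virtual environment.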

6.3 Using micc on the cluster

First, we make sure to load a modern Python version for our project. The VSC clusters have many Python versions available, which come in different flavours depending on the toolchain that was used to build them. On Leibniz, e.g., we would load:

> module load leibniz/2019b     # makes all modules built with the intel-2019b toolchain available
> module load Python/3.7.4-intel-2019b

This module comes with a number of pre-installed Python packages, which you can list with:

> ll $(dirname $(which python))/../lib/python3.7/site-packages

The above Python/3.7.4-intel-2019b is a good choice. Usually, loading a Python module also makes the C++ and Fortran compilers available that were used to build that Python module. They are, obviously, needed for building binary extensions from C++ or Fortran code.
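
You can quickly check that the toolchain’s compilers are indeed on your PATH. The compiler names below are an assumption based on the Intel toolchain naming (icpc for C++, ifort for Fortran); for a GCC based toolchain you would check g++ and gfortran instead:

> which icpc ifort
...
> icpc --version
...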

In addition, Micc relies on a number of other software packages to do its work.

  • Git, our preferred version control system. The system git is a bit old, hence:

    > module purge
    > git --version
    git version 1.8.3.1 # this is the system git
    > module load git
    > git --version
    git version 2.13.3
    
  • For building binary extensions from C++ we need CMake, hence:

    > cmake --version
    cmake version 2.8.12.2 # this is the system CMake
    > module load leibniz/2019b
    > module load CMake
    > cmake --version
    cmake version 3.11.1
    
  • For building binary extensions from Fortran, we need f2py, which comes with Numpy. Hence, we need to load a cluster Python module with Numpy pre-installed (see 6.2.2 Using pre-installed Python packages). The Python module loaded above is fine for that; a quick check is sketched below.
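
A quick way to check that f2py is indeed available with the loaded Python module (if the f2py executable itself is not on your PATH, it can also be run as python -m numpy.f2py):

> python -c "import numpy.f2py; print(numpy.__version__)"
...
> f2py -v     # prints the Numpy version as well
...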