Welcome to micc’s documentation!

Micc


Micc is a Python project manager: it helps you organize your Python project from simple single file modules to fully fledged Python packages containing modules, sub-modules, apps and binary extension modules written in Fortran or C++. Micc organizes your project in a way that is considered good practice by a large part of the Python community.

  • Micc helps you create new projects. You can start small with a simple one-file package and add material as you go, such as:

    • Python sub-modules and sub-packages,
    • applications, also known as command line interfaces (CLIs).
    • binary extension modules written in C++ and Fortran. Boilerplate code is automatically added to build these binary extensions without having to go through all the details. This is, in fact, the foremost reason that got me started on this project: for High Performance Python it is essential to rewrite slow and time-consuming parts of a Python script or module in a language that is made for High Performance Computing. As figuring out how that can be done requires quite some effort, Micc was made to automate this part while maintaining flexibility.
    • Micc typically adds files containing example code to show you how to add your own functionality.
  • You can automatically extract documentation from the doc-strings of your files, and build html documentation that you can consult in your browser, or a .pdf documentation file.

  • With a little extra effort the generated html documentation is automatically published to readthedocs.

  • Micc helps you with version management and control.

  • Micc helps you with testing your code.

  • Micc helps you with publishing your code to e.g. PyPI, so that your colleagues can use your code by simply running:

    > pip install your_nifty_package
    

Credits

Micc does not do all of this by itself. For many things it relies on other strong open source tools, and it is therefore open source as well (MIT License). Here is a list of tools micc uses or cooperates with:

  • Pyenv: management of different Python versions.
  • Pipx for installation of CLIs in a system-wide way.
  • Poetry for dependency management, virtual environment management, packaging and publishing.
  • Git for version control.
  • CMake is used for building binary extension modules written in C++.

The above tools are not dependencies of Micc and must be installed separately. Then there are a number of python packages on which micc depends and which are automatically installed when poetry creates a virtual environment for a project.

  • Cookiecutter for creating boilerplate code from templates for all the parts that can be added to your project.
  • Python-semanticversion for managing version strings and dependency version constraints according to the Semver 2.0 specification.
  • Pytest for testing your code.
  • Click for a pythonic and intuitive definition of command-line interfaces (CLIs).
  • Sphinx to extract documentation from your project’s doc-strings.
  • Sphinx-click for extracting documentation from the click command descriptions.
  • F2py for transforming modern Fortran code into performant binary extension modules interfacing nicely with Numpy.
  • Pybind11 as the glue between C++ source code and performant binary extension modules, also interfacing nicely with Numpy.

Roadmap

These features are still on our wish list:

  • Deployment on the VSC clusters
  • Continuous integration (CI)
  • Code style, e.g. flake8 or black
  • Profiling

Installation

It is recommended to install micc system-wide with pipx.

> pipx install et-micc

Upgrading to a newer version is done as:

> pipx upgrade et-micc

To install micc in your current Python environment, run this command in your terminal:

> pip install et-micc

Debugging micc and micc-build

To test/debug micc or micc-build on a specific project, run:

(.venv)> path/to/et-micc/symlink-micc.sh

As indicated, the project's virtual environment must be activated. The current working directory is immaterial, though. This command replaces the package folders et_micc and et_micc_build in the project's virtual environment's site-packages folder with symbolic links to the project module directories et-micc/et_micc and et-micc-build/et_micc_build, so that any changes in those are immediately visible in the project you are working on.

If the project's virtual environment does not contain the package folders, you get a warning and the suggestion to first install them. Note that, unless micc-build is a dependency of your project (because it has binary extensions), micc is usually not in the site-packages folder (it is usually installed system-wide).

Productivity tip: put a symbolic link to symlink-micc.sh somewhere on the path.

Usage

To learn how to use micc, execute:

> micc --help

You are also encouraged to study the tutorials.

Applications (CLI)

micc

Micc command line interface.

All commands that change the state of the project produce some output that is sent to the console (taking verbosity into account). It is also sent to a logfile et_micc.log in the project directory. All output is always appended to the logfile. If you think the file has gotten too big, or you are no longer interested in the history of your project, you can specify the --clear-log flag to clear the logfile before any command is executed. In this way the command you execute is logged to an empty logfile.

See below for (sub)commands.

micc [OPTIONS] COMMAND [ARGS]...

Options

-v, --verbosity

The verbosity of the program output.

-p, --project-path <project_path>

The path to the project directory. The default is the current working directory.

--clear-log

If specified clears the project’s et_micc.log file.

--version

Show the version and exit.

add

Add a module or CLI to the project.

Parameters:name (str) – name of the CLI or module added.

If app==True: (add CLI application)

  • app_name is also the name of the executable when the package is installed.
  • The source code of the app resides in <project_name>/<package_name>/cli_<name>.py.

If py==True: (add Python module)

  • Python source in <name>.py or <name>/__init__.py, depending on the package flag.

If f90==True: (add f90 module)

  • Fortran source in f90_<name>/<name>.f90 for f90 binary extension modules.

If cpp==True: (add cpp module)

  • C++ source in cpp_<name>/<name>.cpp for cpp binary extension modules.
micc add [OPTIONS] NAME

Options

--app

Add a CLI.

--group

Add a CLI with a group of sub-commands rather than a single command CLI.

--py

Add a Python module.

--package

Add a Python module with a package structure rather than a module structure:

  • module structure = <module_name>.py
  • package structure = <module_name>/__init__.py

Default = module structure.

--f90

Add an f90 binary extension module (Fortran).

--cpp

Add a cpp binary extension module (C++).

-T, --templates <templates>

Ordered list of Cookiecutter templates, or a single Cookiecutter template.

--overwrite

Overwrite pre-existing files (without backup).

--backup

Make backup files (.bak) before overwriting any pre-existing files.

Arguments

NAME

Required argument

convert-to-package

Convert a Python module project to a package.

A Python module project has only a <package_name>.py file, whereas a Python package project has <package_name>/__init__.py and can contain submodules, such as Python modules, packages and applications, as well as binary extension modules.

This command also expands the package-general-docs template in this project, which adds AUTHORS.rst, HISTORY.rst and installation.rst to the documentation structure.

micc convert-to-package [OPTIONS]

Options

--overwrite

Overwrite pre-existing files (without backup).

--backup

Make backup files (.bak) before overwriting any pre-existing files.

create

Create a new project skeleton.

The project name is taken to be the last directory of the project_path. If this directory does not yet exist, it is created. If it does exist already, it must be empty.

The package name is derived from the project name, taking the PEP8 module naming rules into account:

  • all lowercase.
  • dashes '-' and spaces ' ' replaced with underscores '_'.
  • in case the project name has a leading number, an underscore is prepended '_'.

If project_path is a subdirectory of a micc project, micc refuses to continue, unless --allow-nesting is specified.

micc create [OPTIONS] [NAME]

Options

--publish

If specified, verifies that the package name is available on PyPI. If the result is False or inconclusive the project is NOT created.

-p, --package

Create a Python project with a package structure rather than a module structure:

  • package structure = <module_name>/__init__.py
  • module structure = <module_name>.py
--micc-file <micc_file>

The file containing the descriptions of the template parameters used in the Cookiecutter templates.

--python <python>

minimal python version for your project.

-d, --description <description>

Short description of your project.

-l, --lic <lic>

License identifier.

-T, --template <template>

Ordered list of Cookiecutter templates, or a single Cookiecutter template.

-n, --allow-nesting

If specified, allows nesting a project inside another project.

--module-name <module_name>

use this name for the module, rather than deriving it from the project name.

Arguments

NAME

Optional argument

info

Show project info.

  • file location
  • name
  • version number
  • structure (with -v)
  • contents (with -vv)

Use verbosity to produce more detailed info.

micc info [OPTIONS]

Options

--name

print the project name.

--version

print the project version.

mv

Rename or remove a component, i.e. an app (CLI) or a submodule.

Parameters:
  • cur_name – name of the component to be removed or renamed.
  • new_name – new name of the component. If empty, the component will be removed.
micc mv [OPTIONS] CUR_NAME [NEW_NAME]

Options

--silent

Do not ask for confirmation on deleting a component.

--entire-package

Replace all occurrences of <cur_name> in the entire package and in the tests directory.

--entire-project

Replace all occurrences of <cur_name> in the entire project.

Arguments

CUR_NAME

Required argument

NEW_NAME

Optional argument

tag

Create a git tag for the current version and push it to the remote repo.

micc tag [OPTIONS]

version

Modify or show the project’s version number.

micc version [OPTIONS]

Options

-M, --major

Increment the major version number component and set minor and patch components to 0.

-m, --minor

Increment the minor version number component and set the patch component to 0.

-p, --patch

Increment the patch version number component.

-r, --rule <rule>

Any semver 2.0 version string.

-t, --tag

Create a git tag for the new version, and push it to the remote repo.

-s, --short

Print the version on stdout.

-d, --dry-run

bumpversion --dry-run.

API

Package et-micc

Top-level package for et-micc.

Module et_micc.project

An OO interface to micc projects.

class et_micc.project.Project(options)[source]

An OO interface to micc projects.

Parameters:options (types.SimpleNamespace) – all options from the micc CLI.
add_app(db_entry)[source]

Add a console script (app, aka CLI) to the package.

add_auto_build_code(db_entry)[source]

Add auto build code for binary extension modules in __init__.py of the package.

add_cmd()[source]

Add some source file to the project.

This method dispatches to add_app(), add_python_module(), add_f90_module(), or add_cpp_module(), depending on the options.

add_cpp_module(db_entry)[source]

Add a cpp module to this project.

add_dependencies(deps)[source]

Add dependencies to the pyproject.toml file.

Parameters:deps (dict) – (package,version_constraint) pairs.
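
A hedged sketch of what adding (package, version_constraint) pairs to pyproject.toml amounts to, using the TomlFile class documented in Module et_micc.tomlfile below; the package names and constraints are made-up examples, not micc's actual implementation:

from et_micc.tomlfile import TomlFile

deps = {"numpy": "^1.17", "sympy": ">=1.5"}   # hypothetical (package, version_constraint) pairs

toml = TomlFile('pyproject.toml')             # run from a project directory
for package, constraint in deps.items():
    # assumes nested tables behave like dicts, as in the TomlFile example below
    toml['tool']['poetry']['dependencies'][package] = constraint
toml.save()
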
add_f90_module(db_entry)[source]

Add a f90 module to this project.

add_python_module(db_entry)[source]

Add a python sub-module or sub-package to this project.

app_exists(app_name)[source]

Test if there is already an app with name app_name in this project.

  • <package_name>/cli_<app_name>.py
Parameters:app_name (str) – app name
Returns:bool
cpp_module_exists(module_name)[source]

Test if there is already a cpp module with name module_name in this project.

Parameters:module_name (str) – module name
Returns:bool
create()[source]

Create a new project skeleton.

deserialize_db()[source]

Read file db.json into self.db.

error(msg)[source]

Print an error message msg and set the project’s exit_code.

f90_module_exists(module_name)[source]

Test if there is already a f90 module with name module_name in this project.

Parameters:module_name (str) – module name
Returns:bool
info_cmd()[source]

Output info on the project.

module_exists(module_name)[source]

Test if there is already a module with name module_name in this project.

This can be either a Python module, package, or a binary extension module.

Parameters:module_name (str) – module name
Returns:bool
module_to_package(module_py)[source]

Move file module.py to module/__init__.py.

Parameters:module_py (str|Path) – path to module.py
module_to_package_cmd()[source]

Convert a module project (module.py) to a package project (package/__init__.py).

mv_component()[source]

Rename or remove a component (sub-module, sub-package, Fortran module, C++ module, or app (CLI)).

py_module_exists(module_name)[source]

Test if there is already a python module with name module_name in the project at project_path.

Parameters:module_name (str) – module name
Returns:bool
py_package_exists(module_name)[source]

Test if there is already a python package with name module_name in the project at project_path.

Parameters:module_name (str) – module name
Returns:bool
replace_in_file(filepath, cur_name, new_name, contents_only=False)[source]

Replace <cur_name> with <new_name> in the filename and its contents.

serialize_db(db_entry=None, verbose=False)[source]

Write self.db to file db.json.

self.options is a SimpleNamespace object, which is not JSON serializable by default. This function takes care of that by converting to str where possible, and ignoring objects that do not need serialization, e.g. self.options.logger.
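
The general technique is sketched below; this is an illustrative minimal example, not micc's actual implementation:

import json
from types import SimpleNamespace

def namespace_to_jsonable(ns):
    """Return a dict of the namespace's attributes, converting values
    that json cannot handle (e.g. a logger object) to str."""
    jsonable = {}
    for key, value in vars(ns).items():
        try:
            json.dumps(value)            # already serializable?
            jsonable[key] = value
        except TypeError:
            jsonable[key] = str(value)   # fall back to the str representation
    return jsonable

options = SimpleNamespace(verbosity=2, project_path='/tmp/proj', logger=object())
print(json.dumps(namespace_to_jsonable(options), indent=4))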

tag_cmd()[source]

Create and push a version tag v<Major>.<minor>.<patch> for the current version.

version_cmd()[source]

Bump the version according to self.options.rule or show the current version if no rule is specified.

The version is stored in pyproject.toml in the project directory, and in __version__ variable of the top-level package, which is either in <package_name>.py, <package_name>/__init__.py, or in <package_name>/__version__.py.
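
As an illustration of where these two version strings live, a hedged sketch using the TomlFile class documented in Module et_micc.tomlfile below; the package name my_first_project is borrowed from the tutorials and assumed to be importable:

from et_micc.tomlfile import TomlFile

toml = TomlFile('pyproject.toml')            # run from the project directory
print(toml['tool']['poetry']['version'])     # version according to pyproject.toml

import my_first_project
print(my_first_project.__version__)          # version according to the package itself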

warning(msg)[source]

Print a warning message msg and set the project's exit_code.

Module et_micc.expand

Helper functions for dealing with Cookiecutter templates.

et_micc.expand.expand_templates(options)[source]

Expand a list of cookiecutter templates in directory project_path.

Expanding templates may require overwriting pre-existing files. Micc handles this situation in different ways:

  • If options.overwrite equals False the expansion will fail without overwriting any pre-existing files. The project is not modified. A warning is produced. This is the default. To continue, rerun the command with one of the two options below.
  • If options.overwrite equals True the expansion will overwrite any pre-existing files without backup, and produce a warning, listing the overwritten files.
  • If options.backup equals True pre-existing files will be backed up (.bak) before the new files are expanded. If anything goes wrong, you can inspect the backup files and correct the errors manually.
Parameters:options (types.SimpleNamespace) –

namespace object with options accepted by et_micc commands. Relevant attributes are

  • templates: ordered list of (paths to) cookiecutter templates that will be expanded as they appear. The template parameters are propagated from each template to the next.
  • verbosity
  • project_path: Path to the project on which the command operates.
  • template_parameters: extra template parameters not read from micc_file
et_micc.expand.get_preferences(micc_file)[source]

Get the preferences from micc_file.

(This function requires user interaction if no micc_file was provided!)

Parameters:micc_file (Path) – path to a json file.
et_micc.expand.get_template_parameters(preferences)[source]

Get the template parameters from the preferences.

Parameters:preferences (dict|Path) –
Returns:dict of (parameter name,parameter value) pairs.
et_micc.expand.resolve_template(template)[source]

Compose the absolute path of a template.

et_micc.expand.set_preferences(micc_file)[source]

Set the preferences in micc_file.

(This function requires user interaction!)

Parameters:micc_file (Path) – path to a json file.

Module et_micc.logger

Helper functions for logging.

class et_micc.logger.IndentingLogger(name, level=0)[source]

Custom Logger class for creating indented logs.

This is the class for the et_micc logger.

dedent()[source]

Decrease the indentation level.

Future log messages will shift to the left. The width of the shift is determined by the last call to indent().

indent(n=4)[source]

Increase the indentation level.

Future log messages will shift to the right by n spaces.

et_micc.logger.create_logger(path_to_log_file, filemode='a')[source]

Create a logger object for et_micc.

It will log to:

  • the console
  • file path_to_log_file. By default, log messages are appended to this file (filemode='a').
et_micc.logger.log(logfun=None, before='doing', after='done.', bracket=True)[source]

Print a message before and after executing the body of the contextmanager.

Parameters:
  • logfun (callable) – a function that can print a log message, e.g. print(), info().
  • before (str) – print this before body is executed
  • after (str) – print this after body is executed
  • bracket (bool) – append ' [' to before and prepend '] ' to after.

This works best with the IndentingLogger.
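
A hedged usage sketch, combining it with create_logger() documented above; the log file name and the messages are made-up examples:

from pathlib import Path
import et_micc.logger

logger = et_micc.logger.create_logger(Path('et_micc.log'))   # logs to the console and to the file
with et_micc.logger.log(logger.info, before='Adding module foo', after='done.'):
    pass   # the work being logged goes here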

et_micc.logger.logtime(project=None)[source]

Log start time, end time and duration of the task in the body of the context manager to the et_micc logger.

This logs on debug level. To see it in the console output you must pass -vv to et_micc.

Parameters:global_options (SimpleNamespace) – pass verbosity to the et_micc logger.
et_micc.logger.verbosity_to_loglevel(verbosity)[source]

Translate verbosity into a loglevel.

Parameters:verbosity (int) –


Module et_micc.utils

Utility functions for et_micc.

et_micc.utils.execute(cmds, logfun=None, stop_on_error=True, env=None, cwd=None)[source]

Executes a list of OS commands, and logs with logfun.

Parameters:
  • cmds (list) – list of OS commands (= list of list of str) or a single command (list of str).
  • logfun (callable) – a function to write output, typically logging.getLogger('et_micc').debug.
Returns:return code of the first failing command, or 0 if all commands succeed.
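
A usage sketch following the signature above; the OS commands themselves are arbitrary examples:

import logging
from et_micc.utils import execute

logger = logging.getLogger('et_micc')
cmds = [['git', 'init'], ['git', 'add', '.']]   # a list of OS commands (list of list of str)
returncode = execute(cmds, logfun=logger.debug, stop_on_error=True)
if returncode:
    print(f"first failing command returned {returncode}")
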
et_micc.utils.get_project_path(p)[source]

Look for a project directory in the parents of path p.

Parameters:p (Path) –
Returns:the nearest directory above p that is project directory.
Raise:RuntimeError if p is not inside a project directory.
et_micc.utils.in_directory(path)[source]

Context manager for changing the current working directory while the body of the context manager executes.
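
A usage sketch; the path is a made-up example, and restoring the original working directory afterwards is the assumed behaviour of such a context manager:

from et_micc.utils import in_directory

with in_directory('path/to/my_first_project'):
    # commands here run with the project directory as the current working directory
    ...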

et_micc.utils.insert_in_file(file, lines=[], before=False, startswith=None)[source]

Insert lines at a specific position in a <file>.

Parameters:
  • file (Path) – path to file in which to insert
  • lines (list) – list of lines to insert. If a line does not end with a newline, it is added.
  • before (bool) – insert before or after a reference line.
  • startswith (str) – find the reference line as the first line that starts with <startswith>. If no such line is found the text is inserted at the end.
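
A sketch following the parameters above; the file path, the inserted line and the reference line are made-up examples:

from pathlib import Path
from et_micc.utils import insert_in_file

# insert an import statement before the first line that starts with 'def '
insert_in_file(Path('my_first_project/my_first_project.py'),
               lines=['import numpy as np'],
               before=True,
               startswith='def ')
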
et_micc.utils.intersect(version_range_1, version_range_2)[source]

Compute the intersection of two version ranges

et_micc.utils.is_project_directory(path, project=None)[source]

Verify that the directory path is a project directory.

Parameters:
  • path (Path) – path to a directory.
  • project (Project) –

    if not None these variables are set:

    • project.project_name
    • project.package_name
    • project.pyproject_toml
Returns:

bool.

As a sufficient condition, we require that

  • there is a pyproject.toml file, exposing the project's name in ['tool']['poetry']['name'],
  • that there is a python package or module with that name, converted by pep8_module_name().
et_micc.utils.is_publishable(package_name, verbose=True)[source]

Is the name <package_name> available for publishing on PyPI?

This is achieved by running pip search <package_name> and examining the output. If <package_name> is in use, it will appear in the output.

Parameters:
  • package_name (str) – name of the package for which we want to verify the availability.
  • verbose (bool) – show the output of pip search <package_name> and the examination process.
Returns:

the answer as a bool, if the command pip search <package_name> was successful, and None otherwise (e.g. because of no connection).

et_micc.utils.operator_version(version_constraint_string)[source]

Split version_constraint_string in operator and version.

Returns:(str,semantic_version.Version)
et_micc.utils.pep8_module_name(name)[source]

Convert a module name to a PEP8 compliant module name.

  • lowercase
  • whitespace -> underscore
  • dash -> underscore
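
For example, applying the rules above (the input name is made up; the expected result follows from the listed rules):

from et_micc.utils import pep8_module_name

print(pep8_module_name('My-Nifty Module'))   # expected: 'my_nifty_module'
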
et_micc.utils.replace_in_file(file_to_search, look_for, replace_with)[source]

Replace the text look_for with replace_with in file file_to_search

et_micc.utils.validate_intersection(intersection)[source]

Test if the intersection is not empty.

Returns:bool

et_micc.utils.verify_project_name(project_name)[source]

Project names must start with a char, and contain only chars, digits, underscores and dashes.

Returns:bool
et_micc.utils.verify_project_structure(path, project=None)[source]

Verify that there is either a Python module <package_name>.py, or a package <package_name>/__init__.py (and not both).

Returns:a list with what was found. This list should have length 1. If its length is 0, neither module.py, nor module/__init__.py were found. If its length is 2, both were found.
et_micc.utils.version_constraint(version_range)[source]

Convert a version_range to a version constraint string.

et_micc.utils.version_range(version_constraint_string)[source]

Return interval [lower_bound,upper_bound[ as a tuple for a given version constraint.

Note that the lower bound is inclusive, but the upper bound is exclusive. If one of the bounds is None, then it is unbound in that direction.

Module et_micc.tomlfile

class et_micc.tomlfile.TomlFile(path)[source]

Read/write access to .toml files (pyproject.toml in particular).

Open a .toml file and read its content

The content is accessed by subscripting:

toml = TomlFile('path/to/toml')
# Read an item from the .toml file's content:
old_name = toml['tool']['poetry']['name']
# Modify an item in the .toml file's content (but not yet in the file):
toml['tool']['poetry']['name'] = 'new_name'
# Now modify the file with the modified content:
toml.save()
Parameters:path (str|Path) – path to the .toml file.
Raises:FileNotFoundError if the file does not exist.
exists()[source]

Does the .toml file exist?

path

Path object of the .toml file

save()[source]

Write the current content of the .toml file back to file.

Module et_micc.static_vars

A decorator for adding static variables to a function. See https://stackoverflow.com/questions/279561/what-is-the-python-equivalent-of-static-variables-inside-a-function

et_micc.static_vars.static_vars(**kwargs)[source]

Add static variables to a method.

To add the variable counter to foo() :

@static_vars(counter=0)
def foo():
    foo.counter += 1 # foo.counter is incremented on every call to foo

Development environment

Principles

This document aims at setting up a practical development environment for Python projects, allowing the integration of binary extension modules based on C++ or Fortran. Developing on a local machine, a desktop or a laptop, is often somewhat more practical than developing on the cluster. Typically, I start developing on my own machine until things are working well, and then I port the code to the cluster for further testing. I switch back and forth between both environments several times.

There are important differences in managing your environment on your local machine and on the cluster. They are described in detail in Tutorial-cluster.

Warning

Micc was designed for supporting HPC developers, and, consequently, with Linux systems in mind. We provide support for Linux (Ubuntu 19.10, CentOS 7.7) and macOS. Due to lack of human resources, it has not been tested on Windows, and no support is provided for it. However, WSL-2 may do the trick on Windows.

For Python development on your local machine, we highly recommend setting up your development environment as described in My Python Development Environment by Jacob Kaplan-Moss. We will assume that this is indeed the case for all tutorials here. In particular:

  • We use pyenv to manage different Python versions on our system (except for Anaconda or Miniconda Python distributions, where the Python version is naturally embedded in the conda virtual environment).
  • Pipx is used to install Python applications system-wide. If your projects depend on different Python versions it is a good idea to pipx install Micc, which we use for project management and for building binary extension modules.
  • Poetry is used to set up virtual environments for the projects we are working on, for managing their dependencies and for publishing them.
  • For building binary extension modules from C++ CMake must be available.
  • For Micc projects with binary extensions the necessary compilers (C++, Fortran) must be installed on the system.
  • As an IDE for Python/Fortran/C++ development we recommend:
    • Eclipse IDE for Scientific Computing with the PyDev plugin. This is an old-time favorite of mine, although the learning curve is a bit steep and the documentation could be better. Today, PyDev is beginning to lag behind for Python, but Eclipse is still very good for Fortran and C++.
    • PyCharm Community Edition. I only tried this one recently and was very soon convinced for Python development (I haven't gone back to Eclipse since). I currently have insufficient experience with Fortran and C++ to make recommendations.

Setting up your local Development environment - step by step

  1. Install pyenv: See Managing Multiple Python Versions With pyenv for common install instructions on macOS and Linux.

    If you're on Windows, consider using the fork pyenv-win. (Pyenv does not work on Windows outside the Windows Subsystem for Linux.)

  2. Install your favourite Python versions. E.g.:

    > pyenv install 3.8.0
    
  3. Install poetry. The recommended way for this is:

    > curl -sSL https://raw.githubusercontent.com/python-poetry/poetry/master/get-poetry.py | python
    

    This approach will give you one single system-wide Poetry installation, which will automatically pick up the current Python version in your environment. Note, that as of Poetry 1.0.0, Poetry will also detect conda virtual environments.

  4. Configure your poetry installation:

        > poetry config virtualenvs.in-project true
    
    This ensures that running poetry install in a project directory will create the
    project's virtual environment in its own root directory, rather than somewhere in
    the Poetry configuration directories, where it is less accessible. If you have
    several Poetry installations, they all use the same configuration.
    
  5. Install pipx

    > python -m pip install --user pipx
    > python -m pipx ensurepath
    

    Note

    This will use the Python version returned by pyenv version. Micc is certainly comfortable with Python 3.7 and 3.8.

  6. Install micc with pipx:

    > pipx install et-micc
      installed package et-micc 0.10.8, Python 3.8.0
      These apps are now globally available
        - micc
    done!
    

    Note

    micc will be run under the Python version with which pipx was installed.

    To upgrade micc to the newest version run:

    > pipx upgrade et-micc
    
  7. To upgrade to a newer version of a tool that you installed with pipx, use the upgrade command:

    > pipx upgrade et-micc
    et-micc is already at latest version 0.10.8 (location: /Users/etijskens/.local/pipx/venvs/et-micc)
    
  8. If you want to develop binary extensions in Fortran or C++, you will need a Fortran compiler or a C++ compiler, respectively. For C++ binary extensions, also CMake and make must be on your system PATH. You can download CMake directly from cmake.org.

    If you are on one of the VSC clusters, check “Tutorial 7 - Using micc projects on the VSC clusters”.

  9. Install an IDE. For many years I have been using Eclipse IDE for Scientific Computing with the PyDev plugin, but recently I became addicted to PyCharm Community Edition. Both are available for MacOS, Linux and Windows.

  10. Get a git account at github, install git if it is not pre-installed on your system, and configure it:

    > git config --global user.email "you@example.com"
    > git config --global user.name "Your Name"
    
  11. Create your first micc project. The very first time, you will be asked to set some default values that identify you as a micc user. Replace the preset values by your own preferences:

    > micc -p my-first-micc-project create
    your full name [Engelbert Tijskens]: carl morck
    your e-mail address [engelbert.tijskens@uantwerpen.be]: carl.mork@q-series.dk
    your github username (leave empty if you do not have) [etijskens]: cmorck
    the initial version number of a new project [0.0.0]:
    default git branch [master]:
    

    The last two entries are generally ok. If you later want to change the entries, you can simply edit the file ~/.et_micc/micc.json.

You should be good to go now.

Setting up your cluster Development environment - step by step

For details see Tutorial-cluster

  1. On the cluster you must manually select the software packages you want to use by loading modules with the module system. The module system provides access to the many pre-installed software packages - including Python versions - that are especially built for HPC purposes and optimal performance. They are generally much more performant than if you had built them yourself. It is, therefore, discouraged to install pipx to your own Python versions.

  2. Install poetry. The recommended way for this is:

    > curl -sSL https://raw.githubusercontent.com/python-poetry/poetry/master/get-poetry.py | /usr/bin/python
    

    (Make sure to use the system Python /usr/bin/python for this. Otherwise you will run into trouble selecting a Python version for your project.) This approach will give you one single system-wide Poetry installation, which will automatically pick up the current Python version in your environment.

  3. Configure your poetry installation:

        > poetry config virtualenvs.in-project true
    
    This ensures that running poetry install in a project directory will create the
    project's virtual environment in its own root directory, rather than somewhere in
    the Poetry configuration directories, where it is less accessible.
    
  4. For micc projects that are cloned from a git repository, we recommend installing micc as a development dependency of your project:

    > cd path/to/myproject
    > poetry add --dev et-micc
    

    If you want to create a new project with micc, you must install it first of course:

    > module load Python         # load your favourite Python module
    > pip install --user et-micc
    

    Without the --user flag, pip would try to install in the cluster's Python module, where you do not have access. The flag instructs pip to install in your home directory.

  5. If you want to develop binary extensions in Fortran or C++, you will need a Fortran compiler or a C++ compiler, respectively. In general, loading a Python module on the cluster automatically also makes the compilers available that were used to compile that Python version.

    For C++ binary extensions, also CMake must be on your system PATH:

    > module load CMake
    
  6. If you need a full IDE, you must use one of the graphical environments on the cluster (see https://vlaams-supercomputing-centrum-vscdocumentation.readthedocs-hosted.com/en/latest/access/access_and_data_transfer.html#gui-applications-on-the-clusters). Unfortunately, there are different GUI environments for the different VSC clusters. If you only want a graphical editor, you can use the Eclipse Remote System Explorer as a remote editor.

  7. Get a git account at github, install git if it is not pre-installed on your system, and configure it:

    > module load git                                   # for a more recent git version
    > git config --global user.email "you@example.com"
    > git config --global user.name "Your Name"
    
  8. Create your first micc project. The very first time, you will be asked to set some default values that identify you as a micc user. Replace the preset values by your own preferences:

    > micc -p my-first-micc-project create
    your full name [Engelbert Tijskens]: carl morck
    your e-mail address [engelbert.tijskens@uantwerpen.be]: carl.mork@q-series.dk
    your github username (leave empty if you do not have) [etijskens]: cmorck
    the initial version number of a new project [0.0.0]:
    default git branch [master]:
    

    The last two entries are generally ok. If you later want to change the entries, you can simply edit the file ~/.et_micc/micc.json.

You should be good to go now.

Productivity tip

Create a bash script to set the environment for your project consistently over time, e.g.:

#!/usr/bin/bash
module load git
module load CMake
# load my favourite python:
module load Python
cd path/to/myproject
# activate myproject's virtual environment:
source .venv/bin/activate

Frequently asked questions

Can I use Anaconda Python distributions?

Yes, both Micc and Poetry play well together with conda environments, as long as you installed the Anaconda Python distributions in your user space. So, on your local machine there is no problem. On a cluster installing your own conda distribution is not recommended because it uses a lot of disk space. If you cannot avoid it, install on the local scratch file system.

Using the Intel Distribution for Python, which is also conda based, did not work out at the time of writing (September 2020). Working around Poetry by manually creating conda virtual environments and managing dependencies manually failed.

Can I use micc on the (VSC) clusters?

Yes, see Tutorial 7 - Using micc projects on the VSC clusters.

Why doesn’t micc have rename and remove commands?

While renaming or removing a submodule would be a valuable addition, it is very complicated to retrieve all references to a submodule in a project correctly. See https://github.com/etijskens/et-micc/issues/29 and https://github.com/etijskens/et-micc/issues/32.

If refactoring is necessary, we strongly recommend creating a new project and subprojects with the correct names and manually copying stuff from the original to the refactored project.

Tutorials

Tutorial 1: Getting started with micc

Note

All tutorial sections start with the bare essentials, which should get you up and running. They are often followed by more detailed subsections that provide useful background information that is needed for intermediate or advanced usage. These sections have an explicit [intermediate] or [advanced] tag in the title, e.g. 1.1.1. Modules and packages [intermediate], and they are indented. Background sections can be skipped on first reading, but the user is encouraged to read them at some point. The tutorials are rather extensive as they are interlaced with much good-practice advice.

Micc wants to provide a practical interface to the many aspects of managing a Python project: setting up a new project in a standardized way, adding documentation, version control, publishing the code to PyPI, building binary extension modules in C++ or Fortran, dependency management, … For all these aspects there are tools available, yet I found myself struggling to get everything right and looking up the details each time. Micc is an attempt to wrap all the details by providing the user with a standardized yet flexible workflow for managing a Python project. Standardizing is a great way to increase productivity. For many aspects the tools used by Micc are completely hidden from the user, e.g. project setup, adding components, building binary extensions, … For other aspects Micc provides just the necessary setup for you to use other tools as you need them. Learning to use the following tools is certainly beneficial:

  • Poetry: for dependency management, virtual environment creation, and publishing the project to PyPI (and a lot more, if you like). Although extremely handy on a desktop machine or a laptop, it does not play well with the module system that is used on the VSC clusters for accessing applications. A workaround is provided in Tutorial 6.
  • Git: for version control. Its use is optional but highly recommended. See Tutorial 4 for some git coverage.
  • Pytest: for (unit) testing. Also optional and also highly recommended.
  • Sphinx: for building documentation. Optional but recommended.

The basic commands for these tools are covered in these tutorials.

1.1 Creating a project

Creating a new project is simple:

> micc create path/to/my_first_project

This creates a new project my_first_project in folder path/to. Note that the directory path/to/my_first_project must either not exist, or be empty.

Typically, the new project is created in the current working directory:

> cd path/to
> micc create my_first_project
[INFO]           [ Creating project (my_first_project):
[INFO]               Python module (my_first_project): structure = (my_first_project/my_first_project.py)
...
[INFO]           ] done.

After creating the project, we cd into the project directory because any further micc commands will then automatically act on the project in the current working directory:

> cd my_first_project

To apply a micc command to a project that is not in the current working directory see 1.2.1. The project path in micc.

The above command creates a project for a simple Python module, that is, the project directory will contain - among others - a file my_first_project.py which represents the Python module:

my_first_project          # the project directory
└── my_first_project.py   # the Python module, this is where your code goes

When some client code imports this module:

import my_first_project

the code in my_first_project.py is executed.

Note that the name of the Python module is (automatically) taken from the project name that you gave in the micc create command. If you want project and module names to differ from each other, check out the 1.1.2 What's in a name [intermediate] section.

The module project type above is suited for problems that can be solved with a single Python file (my_first_project.py in the above case). For more complex problems a package structure is more appropriate. To learn more about the use of Python modules vs packages, check out the 1.1.1. Modules and packages [intermediate] section below.

1.1.1. Modules and packages [intermediate]

A Python module is the simplest Python project we can create. It is meant for rather small projects that conveniently fit in a single (Python) file. More complex projects require a package structure. They are created by adding the --package flag on the command line:

> micc create my_first_project --package
[INFO]           [ Creating project (my_first_project):
[INFO]               Python package (my_first_project): structure = (my_first_project/my_first_project/__init__.py)
[INFO]               [ Creating git repository
                       ...
[INFO]               ] done.
[WARNING]            Run 'poetry install' in the project directory to create a virtual environment and install its dependencies.
[INFO]           ] done.

The output shows a different file structure of the project than for a module. Instead of the file my_first_project.py there is a directory my_first_project, containing a __init__.py file. So, the structure of a package project looks like this:

my_first_project          # the project directory
└── my_first_project      # the package directory
    └── __init__.py       # the file where your code goes

Typically, the package directory will contain several other Python files that together make up your Python package. When some client code imports a module with a package structure,

import my_first_project

it is the code in my_first_project/__init__.py that is executed. The my_first_project/__init__.py file is the equivalent of the my_first_project.py in a module structure.

The distinction between a module structure and a package structure is also important when you publish the module. When installing a Python package with a module structure, only my_first_project.py will be installed, while with the package structure the entire my_first_project directory will be installed.

If you created a project with a module structure and discover over time that its complexity has grown beyond the limits of a simple module, you can easily convert it to a package structure project at any time. First cd into the project directory and run:

> cd my_first_project
> micc convert-to-package
[INFO]           Converting Python module project my_first_project to Python package project.
[WARNING]        Pre-existing files that would be overwritten:
[WARNING]          /Users/etijskens/software/dev/workspace/p1/docs/index.rst
Aborting because 'overwrite==False'.
  Rerun the command with the '--backup' flag to first backup these files (*.bak).
  Rerun the command with the '--overwrite' flag to overwrite these files without backup.

Because we do not want to replace existing files inadvertently, this command will always fail, unless you add either the --backup flag, in which case micc makes a backup of all files it wants to replace, or the --overwrite flag, in which case those files will be overwritten. Micc will always produce a list of files it wants to replace. Unless you deliberately modified one of the files in the list, you can safely use --overwrite. If you did, use the --backup flag and manually copy the changes from the .bak file to the new file.

> micc convert-to-package --overwrite
Converting simple Python project my_first_project to general Python project.
[WARNING]        '--overwrite' specified: pre-existing files will be overwritten WITHOUT backup:
[WARNING]        overwriting /Users/etijskens/software/dev/workspace/ET-dot/docs/index.rst

If you want micc to create a project with a package structure, rather than the default module structure, you must append the --package flag (or -p) to the micc create command:

> micc create my_first_project --package

[INFO]           [ Creating project (my_first_project):
[INFO]               Python package (my_first_project): structure = (my_first_project/my_first_project/__init__.py)
...
[INFO]           ] done.

The output of the command clearly shows the package structure.

1.1.2 What’s in a name [intermediate]

The name you choose for your project has many consequences. Ideally, a project name is

  • descriptive
  • unique
  • short

Although one might think of even more requirements, such as being easy to type, satisfying these three is already hard enough. E.g. my_nifty_module may possibly be unique, but it is neither descriptive nor short. On the other hand, dot_product is descriptive, reasonably short, but probably not unique. Even my_dot_product is probably not unique, and, in addition, confusing to any user that might want to adopt your my_dot_product. A unique name - or at least a name that has not been taken before - becomes really important when you want to publish your code for others to use it. The standard place to publish Python code is the Python Package Index, where you find hundreds of thousands of projects, many of which are really interesting and of high quality. Even if there are only a few colleagues that you want to share your code with, you make their life (as well as yours) easier when you publish your my_nifty_module at PyPI. To install your my_nifty_module they will only need to type:

> pip install my_nifty_module

(The name my_nifty_module is not used so far, but nevertheless we recommend choosing a better name.) Micc will help you publish your work at PyPI with as little effort as possible, provided your name has not been used so far. Note that the micc create command has a --publish flag that checks if the name you want to use for your project is still available on PyPI, and, if not, refuses to create the project and asks you to use another name for your project.

As there are indeed hundreds of thousands of Python packages published on PyPI, finding a good name has become quite hard. Personally, I often use a simple and short descriptive name, prefixed by my initials, et-, which generally makes the name unique. It has the advantage that all my published modules are grouped in the PyPI listing.

Another point of attention is that although in principle project names can be anything supported by your OS file system, as they are just the name of a directory, micc insists that module and package names comply with the PEP8 module naming rules. Micc derives the package (or module) name from the project name as follows:

  • capitals are replaced by lower-case
  • hyphens '-' are replaced by underscores '_'

If the resulting module name is not PEP8 compliant, you get an informative error message:

> micc create 1proj
[ERROR]
The project name (1proj) does not yield a PEP8 compliant module name:
  The project name must start with char, and contain only chars, digits, hyphens and underscores.
  Alternatively, provide an explicit module name with the --module-name=<name>

The last line indicates that you can specify an explicit module name, unrelated to the project name. In that case PEP8 compliance is not checked. The responsibility then is all yours.

1.2 First steps in micc

1.2.1. The project path in micc

All micc commands accept the global --project-path=<path> parameter. Global parameters appear before the subcommand name. E.g. the command:

> micc --project-path path/to/my_first_project info
Project my_first_project located at path/to/my_first_project.
  package: my_first_project
  version: 0.0.0
  structure: my_first_project.py (Python module)

prints some info on the project at path/to/my_first_project. This can conveniently be abbreviated as:

> micc -p path/to/my_first_project info

Even the create command accepts the global --project-path=<path> parameter:

> micc -p path/to/my_second_project create

will create project my_second_project in the specified location. The command is identical to:

> micc create path/to/my_second_project

The default value for the project path is the current working directory, so:

> micc info

will print info about the project in the current working directory.

Hence, while working on a project, it is convenient to cd into the project directory and execute your micc commands from there, without the global --project-path=<path> parameter.

This approach works even with the micc create command. If you create an empty directory and cd into it, you can just run micc create and it will create the project in the current working directory, taking the project name from the name of the current working directory.

1.2.2 Virtual environments

Virtual environments enable you to quickly set up a Python environment that is isolated from the installed Python on your system. In this way you can easily cope with differing dependencies between your Python projects.

For a detailed introduction to virtual environments see Python Virtual Environments: A Primer.

When you are developing or using several Python projects it can become difficult for a single Python environment to satisfy all the dependency requirements of these projects simultaneously. Dependency conflicts can easily arise. Python promotes and facilitates code reuse and as a consequence Python tools typically depend on tens to hundreds of other modules. If toolA and toolB both need moduleC, but each requires a different version of it, there is a conflict because it is impossible to install two versions of the same module in a Python environment. The solution that the Python community has come up with for this problem is the construction of virtual environments, which isolate the dependencies of a single project in a single environment.

Creating virtual environments

Since Python 3.3 Python comes with a venv module for the creation of virtual environments:

> python -m venv my_virtual_environment

This creates a directory my_virtual_environment in the current working directory which is a complete isolated Python environment. The Python version in this virtual environment is the same as that of the python command with which the virtual environment was created. To use this virtual environment you must activate it:

> source my_virtual_environment/bin/activate
(my_virtual_environment) >

Activating a virtual environment modifies the command prompt to remind you constantly that you are working in a virtual environment. The virtual environment is based on the current Python - by preference set by pyenv. If you install new packages, they will be installed in the virtual environment only. The virtual environment can be deactivated by running

(my_virtual_environment) > deactivate
>

Creating virtual environments with Poetry

Poetry uses the above mechanism to manage virtual environments on a per-project basis, and can install all the dependencies of that project, as specified in the pyproject.toml file, using the install command. Since our project does not have a virtual environment yet, Poetry creates one, named .venv, and installs all dependencies in it. We first choose the Python version to use for the project:

> pyenv local 3.7.5
> python --version
Python 3.7.5
> which python
/Users/etijskens/.pyenv/shims/python

Next, use poetry to create the virtual environment and install all the dependencies specified in pyproject.toml:

> poetry install
Creating virtualenv et-dot in /Users/etijskens/software/dev/my_first_project/.venv
Updating dependencies
Resolving dependencies... (0.8s)

Writing lock file


Package operations: 10 installs, 0 updates, 0 removals

  - Installing pyparsing (2.4.5)
  - Installing six (1.13.0)
  - Installing atomicwrites (1.3.0)
  - Installing attrs (19.3.0)
  - Installing more-itertools (7.2.0)
  - Installing packaging (19.2)
  - Installing pluggy (0.13.1)
  - Installing py (1.8.0)
  - Installing wcwidth (0.1.7)
  - Installing pytest (4.6.6)
  - Installing ET-dot (0.0.0)

The installed packages are all dependencies of pytest, which we require for testing our code. The last package is ET-dot itself, which is installed in so-called development mode. This means that any changes in the source code are immediately visible in the virtual environment. Adding/removing dependencies is easily achieved by running poetry add some_module and poetry remove some_other_module. Consult the Poetry documentation for details.

If the virtual environment already exists, or if some virtual environment is activated (not necessarily that of the project itself - be warned), that virtual environment is reused and all installations pertain to that virtual environment.

To use the just created virtual environment of our project, we must activate it:

> source .venv/bin/activate
(.venv)> python --version
Python 3.7.5
(.venv) > which python
/Users/etijskens/software/dev/ET-dot/.venv/bin/python

The location of the virtual environment’s Python and its version are as expected.

Note

Whenever you see a command prompt like (.venv) > the local virtual environment of the project has been activated. If you want to try yourself, you must activate it too.

To deactivate the virtual environment, just run deactivate:

(.venv) > deactivate
> which python
/Users/etijskens/.pyenv/shims/python

The (.venv) notice disappears, and the active Python is no longer that of the virtual environment, but the Python specified by pyenv.

If something is wrong with a virtual environment, you can simply delete it:

> rm -rf .venv

and create it again. Sometimes it is necessary to delete the poetry.lock as well:

> rm poetry.lock

1.2.3 Modules and scripts

Micc always creates fully functional examples, complete with test code and documentation, so that you can inspect the files and see as much as possible how things are supposed to work. The my_first_project/my_first_project.py module contains a simple hello world method, called hello:

# -*- coding: utf-8 -*-
"""
Package my_first_project
========================

A 'hello world' example.
"""
__version__ = "0.0.0"


def hello(who='world'):
    """'Hello world' method."""
    result = "Hello " + who
    return result

The module can be used right away. Open an interactive Python session and enter the following commands:

> cd path/to/my_first_project
> source .venv/bin/activate
(.venv) > python
Python 3.8.0 (default, Nov 25 2019, 20:09:24)
[Clang 11.0.0 (clang-1100.0.33.12)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import my_first_project
>>> my_first_project.hello()
'Hello world'
>>> my_first_project.hello("student")
'Hello student'
>>>

Productivity tip

Using an interactive python session to verify that a module does indeed what you expect is a bit cumbersome. A quicker way is to modify the module so that it can also behave as a script. Add the following lines to my_first_project/my_first_project.py at the end of the file:

if __name__=="__main__":
   print(hello())
   print(hello("student"))

and execute it on the command line:

(.venv) > python my_first_project.py
Hello world
Hello student

The body of the if statement is only executed if the file is executed as a script. When the file is imported, it is ignored.

While working on a single-file project it is sometimes handy to put your tests in the body of if __name__=="__main__":, as below:

if __name__=="__main__":
   assert hello() == "Hello world"
   assert hello("student") == "Hello student"
   print("-*# success #*-")

The last line makes sure that you get a message that all tests went well if they did, otherwise an AssertionError will be raised. When you now execute the script, you should see:

(.venv) > python my_first_project.py
-*# success #*-

When you develop your code in an IDE like eclipse+pydev or PyCharm, you can even execute the file without having to leave your editor and switch to a terminal. You can quickly code, test and debug in a single window.

While this is a very productive way of developing, it is a bit on the quick and dirty side. If the module code and the tests become more involved, however, the file will soon become cluttered with test code, and a more scalable way to organise your tests is needed. Micc has already taken care of this.

1.2.4 Testing your code

Test driven development is a software development process that relies on the repetition of a very short development cycle: requirements are turned into very specific test cases, then the code is improved so that the tests pass. This is opposed to software development that allows code to be added that is not proven to meet requirements. The advantage of this is clear: the shorter the cycle, the smaller the code that is to be searched for bugs. This allows you to produce correct code faster, and in case you are a beginner, also speeds your learning of Python. Please check Ned Batchelder’s very good introduction to testing with pytest.

When micc creates a new project, or when you add components to an existing project, it immediately adds a test script for each component in the tests directory. The test script for the my_first_project module is in file my_first_project/tests/test_my_first_project.py. Let's take a look at the relevant section:

# -*- coding: utf-8 -*-
"""Tests for my_first_project package."""

import my_first_project

def test_hello_noargs():
    """Test for my_first_project.hello()."""
    s = my_first_project.hello()
    assert s=="Hello world"

def test_hello_me():
    """Test for my_first_project.hello('me')."""
    s = my_first_project.hello('me')
    assert s=="Hello me"

Tests like this are very useful to ensure that during development the changes to your code do not break things. There are many Python tools for unit testing and test driven development. Here, we use Pytest:

> pytest
=============================== test session starts ===============================
platform darwin -- Python 3.7.4, pytest-4.6.5, py-1.8.0, pluggy-0.13.0
rootdir: /Users/etijskens/software/dev/workspace/foo
collected 2 items

tests/test_foo.py ..                                                        [100%]

============================ 2 passed in 0.05 seconds =============================

The output shows some info about the environment in which we are running the tests, the current working directory (in this case the project directory), and the number of tests it collected (2). Pytest looks for test methods in all test_*.py or *_test.py files in the current directory. It accepts test-prefixed methods outside classes, and test-prefixed methods inside Test-prefixed classes, as test methods to be executed.
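
To make these collection rules concrete, here is a small, hypothetical test file (the names are made up for illustration only) showing which items pytest would pick up:

# tests/test_example.py (hypothetical file, for illustration only)

def test_collected():            # collected: module-level function with a 'test' prefix
    assert True

def helper():                    # NOT collected: no 'test' prefix
    return 42

class TestThings:                # inspected: class name has a 'Test' prefix
    def test_in_class(self):     # collected: 'test'-prefixed method in a 'Test'-prefixed class
        assert helper() == 42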

Note

Sometimes pytest discovers unintended test files or functions in other directories than the tests directory, leading to puzzling errors. It is therefore safe to instruct pytest to look only in the tests directory:

> pytest tests
...

If a test would fail you get a detailed report to help you find the cause of the error and fix it.

Debugging test code

When the report provided by pytest does not yield a clue on the cause of the failing test, you must use debugging and execute the failing test step by step to find out what is going wrong where. From the viewpoint of pytest, the files in the tests directory are modules. Pytest imports them and collects the test methods, and executes them. Micc also makes every test module executable using the technique described in 1.2.3 Modules and scripts. At the end of every test file you will find some extra code:

if __name__ == "__main__":
    the_test_you_want_to_debug = test_hello_noargs

    print("__main__ running", the_test_you_want_to_debug)
    the_test_you_want_to_debug()
    print('-*# finished #*-')

On the first line of the if __name__ == "__main__": body, the variable the_test_you_want_to_debug is set to the name of one of the test methods in the test file test_et_dot.py, here test_hello_noargs, which tests the hello() method that was in the module originally. The variable the_test_you_want_to_debug is now just another variable pointing to the very same function object as test_hello_noargs and behaves exactly the same (see Functions are first class objects). The next statement prints a message telling you that __main__ is running that test method, after which the test method is called through the the_test_you_want_to_debug variable. Finally, another message is printed to let you know that the script finished. Here is the output you get when running this test file as a script:

(.venv) > python tests/test_et_dot.py
__main__ running <function test_hello_noargs at 0x1037337a0>
-*# finished #*-

The execution of the test does not produce any output. Now you can use your favourite Python debugger to execute this script and step into the test_hello_noargs test method and from there into et_dot.hello to examine if everything goes as expected. Thus, to debug a failing test, you assign its name to the the_test_you_want_to_debug variable and debug the script.
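
For example, with the standard library debugger pdb - just one of many options, an IDE debugger works equally well - a minimal debugging session could look like this:

(.venv) > python -m pdb tests/test_et_dot.py
(Pdb) n         # 'n' executes the next statement, 's' steps into a function call
(Pdb) s
...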

1.2.5 Generating documentation [intermediate]

Documentation is extracted from the source code using Sphinx. It is almost completely generated automatically from the doc-strings in your code. Doc-strings are the text between triple double quote pairs in the examples above, e.g. """This is a doc-string.""". Important doc-strings are:

  • module doc-strings: at the beginning of the module. Provides an overview of what the module is for.
  • class doc-strings: right after the class statement: explains what the class is for. (Usually, the doc-string of the __init__ method is put here as well, as dunder methods (starting and ending with a double underscore) are not automatically considered by Sphinx.)
  • method doc-strings: right after a def statement.

According to pep-0287 the recommended format for Python doc-strings is reStructuredText. For example, a typical method doc-string looks like this:

def hello_world(who='world'):
    """Short (one line) description of the hello_world method.

    A detailed and longer description of the hello_world method.
    blablabla...

    :param str who: an explanation of the who parameter. You should
        mention its default value.
    :returns: a description of what hello_world returns (if relevant).
    :raises: which exceptions are raised under what conditions.
    """

Here, you can find some more examples.

Thus, if you take good care writing doc-strings, helpful documentation follows automatically.
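
For completeness, here is what a class doc-string in the same style could look like (a made-up example, not generated by micc):

class Greeter:
    """Short (one line) description of the Greeter class.

    A longer description of what the class is for. The doc-string of the
    __init__ method is usually folded in here, because Sphinx does not
    document dunder methods by default.

    :param str who: who to greet, defaults to 'world'.
    """
    def __init__(self, who='world'):
        self.who = who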

Micc sets up all the necessary components for documentation generation in the sub-directory et-dot/docs/. There, you find a Makefile that provides a simple interface to Sphinx. Here is the workflow to build the documentation:

> cd path/to/et-dot
> source .venv/bin/activate
(.venv) > cd docs
(.venv) > make html

The last line produces documentation in html format.

Let’s explain the steps

  1. cd into the project directory:

    > cd path/to/et-dot
    >
    
  2. Activate the project’s virtual environment:

    > source .venv/bin/activate
    (.venv) >
    
  3. cd into the docs subdirectory:

    (.venv) > cd docs
    (.venv) >
    

    Here, you will find the Makefile that does the work:

    (.venv) > ls -l
    total 80
    -rw-r--r--  1 etijskens  staff  1871 Dec 10 11:24 Makefile
    ...
    

To see a list of possible documentation formats, just run make without arguments:

(.venv) > make
Sphinx v2.2.2
Please use `make target' where target is one of
  html        to make standalone HTML files
  dirhtml     to make HTML files named index.html in directories
  singlehtml  to make a single large HTML file
  pickle      to make pickle files
  json        to make JSON files
  htmlhelp    to make HTML files and an HTML help project
  qthelp      to make HTML files and a qthelp project
  devhelp     to make HTML files and a Devhelp project
  epub        to make an epub
  latex       to make LaTeX files, you can set PAPER=a4 or PAPER=letter
  latexpdf    to make LaTeX and PDF files (default pdflatex)
  latexpdfja  to make LaTeX files and run them through platex/dvipdfmx
  text        to make text files
  man         to make manual pages
  texinfo     to make Texinfo files
  info        to make Texinfo files and run them through makeinfo
  gettext     to make PO message catalogs
  changes     to make an overview of all changed/added/deprecated items
  xml         to make Docutils-native XML files
  pseudoxml   to make pseudoxml-XML files for display purposes
  linkcheck   to check all external links for integrity
  doctest     to run all doctests embedded in the documentation (if enabled)
  coverage    to run coverage check of the documentation (if enabled)
(.venv) >

  1. To build documentation in html format, enter:

    (.venv) > make html
    ...
    (.venv) >
    

    This will generate the documentation in et-dot/docs/_build/html. Note that it is essential that this command executes in the project’s virtual environment. You can view the documentation in your favorite browser:

    (.venv) > open _build/html/index.html       # on macosx
    

    or:

    (.venv) > xdg-open _build/html/index.html   # on ubuntu
    

    (On the cluster this command will fail because there is no graphical environment and it cannot run an html browser.)

    Here is a screenshot:

    _images/im1-1.png

    If you expand the API tab on the left, you get to see the my_first_project module documentation, as it is generated from the doc-strings:

    _images/im1-2.png
  2. To build documentation in .pdf format, enter:

    (.venv) > make latexpdf
    

    This will generate the documentation in et-dot/docs/_build/latex/et-dot.pdf. Note that it is essential that this command executes in the project’s virtual environment. You can view it in your favorite pdf viewer:

    (.venv) > open _build/latex/et-dot.pdf      # on macosx
    

or:

(.venv) > xdg-open _build/latex/et-dot.pdf      # on ubuntu

Note

When building documentation by running the docs/Makefile, it is verified that the correct virtual environment is activated, and that the needed Python modules are installed in that environment. If not, they are first installed using pip install. These components do not become dependencies of the project. If needed, you can add dependencies using the poetry add command.

The boilerplate code for documentation generation is in the docs directory, just as if it were generated by hand using sphinx-quickstart. (In fact, it was generated using sphinx-quickstart, but then turned into a Cookiecutter template.) Modifying those files is not recommended, and only rarely needed. In addition, there are a number of .rst files with capitalized names in the project directory:

  • README.rst is assumed to contain an overview of the project,
  • API.rst describes the classes and methods of the project in detail,
  • APPS.rst describes the command line interfaces or apps added to your project,
  • AUTHORS.rst lists the contributors to the project,
  • HISTORY.rst describes the changes that were made to the code.

The .rst extension stands for reStructuredText. It is a simple and concise approach to text formatting.

If you add components to your project through micc, care is taken that the .rst files in the project directory and the docs directory are modified as necessary, so that Sphinx is able to find the doc-strings. Even for command line interfaces (CLIs, or console scripts) based on click, the documentation is generated neatly from the help strings of options and the doc-strings of the commands.
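
As a hypothetical illustration of that last point: in a click-based app like the one below - the names are made up and not part of the micc templates - the command doc-string and the help string of the option both end up in the generated documentation:

import click

@click.command()
@click.option('--who', default='world', help="Whom to greet.")
def main(who):
    """Command line interface that prints a greeting.

    This doc-string, together with the help string of the --who option,
    is picked up by sphinx-click and turned into documentation.
    """
    click.echo(f"Hello {who}")

if __name__ == '__main__':
    main()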

1.2.6 Version control [advanced]

Although version control is extremely important for any software project with a lifetime of more than a day, we mark it as an advanced topic as it does not affect the development itself. Micc facilitates version control by automatically creating a local git repository in your project directory. If you do not want to use it, you may ignore it or even delete it.

Git is a version control system that solves many practical problems related to the process of software development, independently of whether you are the only developer, or there is an entire team working on it from different places in the world. You can find more information about how micc uses git in Tutorial 4.

Let’s take a close look at the output of the micc create my_first_project command. The first line tells us that a project directory is being created:

[INFO]           [ Creating project (my_first_project):

The next line explains the structure of the project, module or package:

[INFO]               Python module (my_first_project): structure = (my_first_project/my_first_project.py)

Next we are informed that a local git repository is being created:

[INFO]               [ Creating git repository

Micc tries to push this local repository to a remote repository at https://github.com/yourgitaccount. If you did not create a remote git repository beforehand, this gives rise to some warnings:

[WARNING]                    > git push -u origin master
[WARNING]                    (stderr)
                             remote: Repository not found.
                             fatal: repository 'https://github.com/yourgitaccount/my_first_project/' not found

Micc is unable to push the local repo to github if the remote repo does not exist. The local repo is sufficient for many purposes, but the remote repo enables sharing your work with others and provides a backup of your work.

Finally, micc informs us that the tasks are finished.

[INFO]               ] done.
[INFO]           ] done.
>

Note that the name of the remote git repo is the project name, not the module name.

1.3 Miscellaneous

1.3.1 The license file [intermediate]

The project directory contains a LICENCE file, a text file describing the licence applicable to your project. You can choose between

  • MIT license (default),
  • BSD license,
  • ISC license,
  • Apache Software License 2.0,
  • GNU General Public License v3 and
  • Not open source.

The MIT license is a very liberal license and the default option. If you’re unsure which license to choose, you can use resources such as GitHub’s Choose a License.

You can select the license file when you create the project:

> cd some_empty_dir
> micc create --license BSD

Of course, the project depends in no way on the license file, so it can be replaced manually at any time by the license you desire.

1.3.2 The pyproject.toml file [intermediate]

The file pyproject.toml (located in the project directory) is the modern way to describe the build system requirements of a project (PEP 518). Although most of this file’s content is generated automatically by micc and poetry, some understanding of it is useful; consult https://poetry.eustace.io/docs/pyproject/ for details.

The pyproject.toml file is rather human-readable:

> cat pyproject.toml
[tool.poetry]
name = "ET-dot"
version = "1.0.0"
description = "<Enter a one-sentence description of this project here.>"
authors = ["Engelbert Tijskens <engelbert.tijskens@uantwerpen.be>"]
license = "MIT"

readme = 'README.rst'

repository = "https://github.com/etijskens/ET-dot"
homepage = "https://github.com/etijskens/ET-dot"

keywords = ['packaging', 'poetry']

[tool.poetry.dependencies]
python = "^3.7"
et-micc-build = "^0.10.10"

[tool.poetry.dev-dependencies]
pytest = "^4.4.2"

[tool.poetry.scripts]

[build-system]
requires = ["poetry>=0.12"]
build-backend = "poetry.masonry.api"

1.3.3 The log file micc.log [intermediate]

The project directory also contains a log file micc.log. All micc commands that modify the state of the project leave a trace in this file, so you can look up what happened to your project and when. Should you think that the log file has become too big, or just useless, you can delete it manually, or add the --clear-log flag before any micc subcommand to remove it first. If the subcommand alters the state of the project, the log file will then only contain the log messages from that last subcommand.

> ll micc.log
-rw-r--r--  1 etijskens  staff  34 Oct 10 20:37 micc.log

> micc --clear-log info
Project bar located at /Users/etijskens/software/dev/workspace/bar
  package: bar
  version: 0.0.0
  structure: bar.py (Python module)

> ll micc.log
ls: micc.log: No such file or directory

1.3.4 Adjusting micc to your needs [advanced]

Micc is based on a series of additive Cookiecutter templates which generate the boilerplate code. If you like, you can tweak these templates in the site-packages/et_micc/templates directory of your micc installation. If you installed micc with pipx, that is typically something like:

~/.local/pipx/venvs/et-micc/lib/pythonX.Y/site-packages/et_micc,

where pythonX.Y is the Python version you installed micc with.

1.4 A first real project

Let’s start with a simple problem: a Python module that computes the scalar product of two arrays, generally referred to as the dot product. Admittedly, this is not a very rewarding goal, as there are already many Python packages, e.g. Numpy, that solve this problem in an elegant and efficient way. However, because the dot product is such a simple concept in linear algebra, it allows us to illustrate the usefulness of Python as a language for High Performance Computing, as well as the capabilities of Micc.
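
For reference, the dot product of two arrays a and b of length n is defined as (in LaTeX notation, added here for clarity):

    \mathbf{a} \cdot \mathbf{b} = \sum_{i=0}^{n-1} a_i \, b_i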

First, set up a new project for this dot product, which I named ET-dot, ET being my initials. Not knowing beforehand how involved this project will become, we create a simple module project:

> micc -p ET-dot create
[INFO]           [ Creating project (ET-dot):
[INFO]               Python module (et_dot): structure = (ET-dot/et_dot.py)
[INFO]               [ Creating git repository
[WARNING]                    > git push -u origin master
[WARNING]                    (stderr)
                             remote: Repository not found.
                             fatal: repository 'https://github.com/etijskens/ET-dot/' not found
[INFO]               ] done.
[WARNING]            Run 'poetry install' in the project directory to create a virtual environment and install its dependencies.
[INFO]           ] done.
> cd ET-dot

As the output shows, the module name is derived from the project name and made compliant with the PEP8 module naming rules: et_dot. Next, we create a virtual environment for the project, with all the standard micc dependencies:

> poetry install
Creating virtualenv et-dot in /Users/etijskens/software/dev/workspace/tmp/ET-dot/.venv
Updating dependencies
Resolving dependencies... (0.8s)

Writing lock file


Package operations: 10 installs, 0 updates, 0 removals

  - Installing pyparsing (2.4.5)
  - Installing six (1.13.0)
  - Installing atomicwrites (1.3.0)
  - Installing attrs (19.3.0)
  - Installing more-itertools (8.0.2)
  - Installing packaging (19.2)
  - Installing pluggy (0.13.1)
  - Installing py (1.8.0)
  - Installing wcwidth (0.1.7)
  - Installing pytest (4.6.7)
  - Installing ET-dot (0.0.0)
>

Next, activate the virtual environment:

> source .venv/bin/activate
(.venv) >

Open the module file et_dot.py in your favourite editor and code a dot product method (naively) as follows:

# -*- coding: utf-8 -*-
"""
Package et_dot
==============
Python module for computing the dot product of two arrays.
"""
__version__ = "0.0.0"

def dot(a,b):
    """Compute the dot product of *a* and *b*.

    :param a: a 1D array.
    :param b: a 1D array of the same length as *a*.
    :returns: the dot product of *a* and *b*.
    :raises: ArithmeticError if ``len(a)!=len(b)``.
    """
    n = len(a)
    if len(b)!=n:
        raise ArithmeticError("dot(a,b) requires len(a)==len(b).")
    d = 0
    for i in range(n):
        d += a[i]*b[i]
    return d

We defined a dot() method with an informative doc-string that describes the parameters, the return value and the kind of exceptions it may raise.

We could use the dot method in a script as follows:

from et_dot import dot

a = [1,2,3]
b = [4.1,4.2,4.3]
a_dot_b = dot(a,b)

Note

This dot product implementation is naive for many reasons:

  • Python is very slow at executing loops, as compared to Fortran or C++.
  • The objects we are passing in are plain Python lists. A list is a very powerful data structure, with array-like properties, but it is not exactly an array. A list is in fact an array of pointers to Python objects, and therefore list elements can reference anything, not just a numeric value as we would expect from an array. With elements being pointers, looping over the array elements implies non-contiguous memory access, another source of inefficiency.
  • The dot product is a subject of Linear Algebra. Many excellent libraries have been designed for this purpose. Numpy should be your starting point because it is well integrated with many other Python packages. There is also Eigen, a C++ library for linear algebra, that is neatly exposed to Python by pybind11.

In order to verify that our implementation of the dot product is correct, we write a test. For this we open the file tests/test_et_dot.py. Remove the original tests, and add a new one:

import et_dot

def test_dot_aa():
    a = [1,2,3]
    expected = 14
    result = et_dot.dot(a,a)
    assert result==expected

Save the file, and run the test. Pytest will show a line for every test source file. On each such line a . will appear for every successful test, and an F for a failing test.

(.venv) > pytest
=============================== test session starts ===============================
platform darwin -- Python 3.7.4, pytest-4.6.5, py-1.8.0, pluggy-0.13.0
rootdir: /Users/etijskens/software/dev/workspace/ET-dot
collected 1 item

tests/test_et_dot.py .                                                      [100%]

============================ 1 passed in 0.08 seconds =============================
(.venv) >

Note

If the project’s virtual environment is not activated, the command pytest will generally not be found.

Great! Our test succeeded. Let’s increment the project’s version (-p is short for --patch, and requests incrementing the patch component of the version string):

(.venv) > micc version -p
[INFO]           (ET-dot)> micc version (0.0.0) -> (0.0.1)

Obviously, our test tests only one particular case. A clever way of testing is to focus on properties. From mathematics we know that the dot product is commutative. Let’s add a test for that.

import random

def test_dot_commutative():
    # create two arrays of length 10 with random float numbers:
    a = []
    b = []
    for _ in range(10):
        a.append(random.random())
        b.append(random.random())
    # do the test
    ab = et_dot.dot(a,b)
    ba = et_dot.dot(b,a)
    assert ab==ba

You can easily verify that this test works too. We increment the version string again:

(.venv) > micc version -p
[INFO]           (ET-dot)> micc version (0.0.1) -> (0.0.2)

There is however a risk in using arrays of random numbers. Maybe we were just lucky and got random numbers that satisfy the test by accident. Also, the test is no longer reproducible: the next time we run pytest we will get other random numbers, and maybe the test will fail. That would represent a serious problem: since we cannot reproduce the failing test, we have no way of finding out what went wrong. For random numbers we can fix the seed at the beginning of the test. Random number generators are deterministic, so fixing the seed makes the code reproducible. To increase coverage we put a loop around the test.

def test_dot_commutative_2():
    # Fix the seed for the random number generator of module random.
    random.seed(0)
    # choose array size
    n = 10
    # create two arrays of length n filled with zeros:
    a = n * [0]
    b = n * [0]
    # repetition loop:
    for r in range(1000):
        # fill a and b with random float numbers:
        for i in range(n):
            a[i] = random.random()
            b[i] = random.random()
        # do the test
        ab = et_dot.dot(a,b)
        ba = et_dot.dot(b,a)
        assert ab==ba

Again the test works. Another property of the dot product is that the dot product with a zero vector is zero.

def test_dot_zero():
    # Fix the seed for the random number generator of module random.
    random.seed(0)
    # choose array size
    n = 10
    # create two arrays of length n filled with zeros:
    a = n * [0]
    zero = n * [0]
    # repetition loop (the underscore is a placeholder for a variable that we do not use):
    for _ in range(1000):
        # fill a with random float numbers:
        for i in range(n):
            a[i] = random.random()
        # do the test
        azero = et_dot.dot(a,zero)
        assert azero==0

This test works too. Furthermore, the dot product with a vector of ones is the sum of the elements of the other vector:

def test_dot_one():
    # Fix the seed for the random number generator of module random.
    random.seed(0)
    # choose array size
    n = 10
    # create an array of length n filled with zeros, and one filled with ones:
    a = n * [0]
    one = n * [1.0]
    # repetition loop (the underscore is a placeholder for a variable that we do not use):
    for _ in range(1000):
        # fill a with random float numbers:
        for i in range(n):
            a[i] = random.random()
        # do the test
        aone = et_dot.dot(a,one)
        expected = sum(a)
        assert aone==expected

Success again. We are getting quite confident in the correctness of our implementation. Here is another test:

def test_dot_one_2():
    a1 = 1.0e16
    a   = [a1 ,1.0,-a1]
    one = [1.0,1.0,1.0]
    expected = 1.0
    result = et_dot.dot(a,one)
    assert result==expected

Clearly, it is a special case of the test above: the expected result is the sum of the elements in a, that is, 1.0. Yet it - unexpectedly - fails. Fortunately, pytest produces a readable report about the failure:

> pytest
================================= test session starts ==================================
platform darwin -- Python 3.7.4, pytest-4.6.5, py-1.8.0, pluggy-0.13.0
rootdir: /Users/etijskens/software/dev/workspace/ET-dot
collected 6 items

tests/test_et_dot.py .....F                                                      [100%]

======================================= FAILURES =======================================
____________________________________ test_dot_one_2 ____________________________________

    def test_dot_one_2():
        a1 = 1.0e16
        a   = [a1 , 1.0, -a1]
        one = [1.0, 1.0, 1.0]
        expected = 1.0
        result = et_dot.dot(a,one)
>       assert result==expected
E       assert 0.0 == 1.0

tests/test_et_dot.py:91: AssertionError
========================== 1 failed, 5 passed in 0.17 seconds ==========================
>

Mathematically, our expectations about the outcome of the test are certainly correct. Yet, pytest tells us it found that the result is 0.0 rather than 1.0. What could possibly be wrong? Well, our mathematical expectations are based on our - false - assumption that the elements of a are real numbers, most of which in decimal representation are characterised by an infinite number of digits. Computer memory being finite, however, Python (and for that matter all other programming languages) uses a finite number of bits to approximate real numbers. These numbers are called floating point numbers and their arithmetic is called floating point arithmetic. Floating point arithmetic has quite different properties than real number arithmetic. A floating point number in Python uses 64 bits, which yields approximately 15 representable digits. Observe the consequences of this in the Python statements below:

>>> 1.0 + 1e16
1e+16
>>> 1e16 + 1.0 == 1e16
True
>>> 1.0 + 1e16 == 1e16
True
>>> 1e16 + 1.0 - 1e16
0.0

There are several lessons to be learned from this:

  • The test does not fail because our code is wrong, but because our mind is used to reasoning about real number arithmetic, rather than floating point arithmetic rules. As the latter is subject to round-off errors, tests sometimes fail unexpectedly. Note that for comparing floating point numbers the standard library provides a math.isclose() method.
  • Another silent assumption by which we can be misled is in the random numbers. In fact, random.random() generates pseudo-random numbers in the interval [0,1[, which is quite a bit smaller than ]-inf,+inf[. No matter how often we run the test, the special case above that fails will never be encountered, which may lead to unwarranted confidence in the code.

So, how do we cope with the failing test? Here is a way using math.isclose():

import math

def test_dot_one_2():
    a1 = 1.0e16
    a   = [a1 , 1.0, -a1]
    one = [1.0, 1.0, 1.0]
    expected = 1.0
    result = et_dot.dot(a,one)
    # assert result==expected
    assert math.isclose(result, expected, abs_tol=10.0)

This is a reasonable solution if we accept that when dealing with numbers as big as 1e16, an absolute difference of 10 is negligible.

Another aspect that should be tested is the behavior of the code in exceptional circumstances. Does it indeed raise ArithmeticError if the arguments are not of the same length? Here is a test:

import pytest

def test_dot_unequal_length():
    a = [1,2]
    b = [1,2,3]
    with pytest.raises(ArithmeticError):
        et_dot.dot(a,b)

Here, pytest.raises() is a context manager that will verify that ArithmeticError is raised when its body is executed.
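
If you also want to inspect the exception that was raised, pytest.raises() can hand it to you. A minimal sketch (the message check assumes the error string we used in dot() above):

import pytest
import et_dot

def test_dot_unequal_length_message():
    with pytest.raises(ArithmeticError) as exc_info:
        et_dot.dot([1, 2], [1, 2, 3])
    # exc_info.value is the exception instance that was raised:
    assert "len(a)==len(b)" in str(exc_info.value)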

Note

For a detailed explanation of context managers, see https://jeffknupp.com/blog/2016/03/07/python-with-context-managers//

Note that you can easily make et_dot.dot() raise other exceptions, e.g. TypeError by passing in arrays of non-numeric types:

>>> et_dot.dot([1,2],[1,'two'])
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/etijskens/software/dev/workspace/ET-dot/et_dot.py", line 23, in dot
    d += a[i]*b[i]
TypeError: unsupported operand type(s) for +=: 'int' and 'str'
>>>

Note that it is not the product a[i]*b[i] for i=1 that is wreaking havoc, but the addition of its result to d.
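
You can verify this quickly in an interactive session: multiplying an int and a str is legal in Python (it repeats the string), but adding the result to a number is not:

>>> 2 * 'two'
'twotwo'
>>> 4 + 2 * 'two'
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unsupported operand type(s) for +: 'int' and 'str'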

At this point you might notice that even for a very simple and well defined function such as the dot product, the amount of test code easily exceeds the amount of tested code by a factor of 5 or more. This is not at all uncommon. As the tested code here is an isolated piece of code, you will probably leave it alone as soon as it passes the tests and you are confident in the solution. If at some point dot() were to fail, you should write a test that reproduces the error and improve the solution so that it passes that test.

When constructing software for more complex problems, there will very soon be many interacting components and running the tests after modifying one of the components will help you assure that all components still play well together, and spot problems as soon as possible.

At this point we want to produce a git tag of the project:

(.venv) > micc tag
[INFO] Creating git tag v0.0.7 for project ET-dot
[INFO] Done.

The tag is a label for the current code base of our project.

1.5 Improving efficiency

There are times when a correct solution - i.e. a code that solves the problem correctly - is sufficient. Very often, however, there are constraints on the time to solution, and the computing resources (number of cores and nodes, memory, ...) must be used efficiently. Especially in scientific computing and high performance computing, where compute tasks may run for many days using hundreds of compute nodes and resources are shared with many researchers, using the resources efficiently is of utmost importance.

However important efficiency may be, it is nevertheless a good strategy, when developing a new piece of code, to start out with a simple, even naive implementation in Python, neglecting all efficiency considerations, but focussing on correctness. Python has a reputation of being an extremely productive programming language. Once you have proven the correctness of this first version, it can serve as a reference solution to verify the correctness of later efficiency improvements. In addition, the analysis of this version can highlight the sources of inefficiency and help you focus your attention on the parts that really need it.

Timing your code

The simplest way to probe the efficiency of your code is to time it: write a simple script and record how long it takes to execute. Let us first look at the structure of a Python script.

Here’s a script (using the above structure) that computes the dot product of two long arrays of random numbers.

"""file et_dot/prof/run1.py"""
import random
from et_dot import dot

def random_array(n=1000):
    """Initialize an array with n random numbers in [0,1[."""
    # Below we use a list comprehension (a Python idiom for creating a list from an iterable object).
    a = [random.random() for i in range(n)]
    return a

if __name__=='__main__':
    a = random_array()
    b = random_array()
    print(dot(a,b))
    print('-*# done #*-')

We store this file, which we rather simply called run1.py, in a directory prof inside the project directory, where we intend to keep all our profiling work. You can execute the script from the command line (with the project directory as the current working directory):

(.venv) > python ./prof/run1.py
251.08238559724717
-*# done #*-

Note

As our script does not fix the random number seed, every run has a different outcome.

We are now ready to time our script. There are many ways to achieve this. Here is a particularly good introduction. The et-stopwatch project takes this a little further. We add it as a development dependency of our project:

(.venv) > poetry add et_stopwatch -D
Using version ^0.3.0 for et_stopwatch
Updating dependencies
Resolving dependencies... (0.2s)
Writing lock file
Package operations: 1 install, 0 updates, 0 removals
  - Installing et-stopwatch (0.3.0)
(.venv) >

Note

A development dependency is a package that is not needed for using the package at hand, but only needed for developing it.

Using the Stopwatch class to time pieces of code is simple:

"""file et_dot/prof/run1.py"""
from et_stopwatch import Stopwatch

...

if __name__=='__main__':
    with Stopwatch(message="init"):
        a = random_array()
        b = random_array()
    with Stopwatch(message="dot "):
        dot(a,b)
    print('-*# done #*-')

When the script is executed, the two print statements will print the duration of the initialisation of a and b and of the computation of the dot product of a and b. Finally, upon exit, the Stopwatch will print the total time.

(.venv) > python ./prof/run1.py
init: 0.000281 s
dot : 0.000174 s
-*# done #*-
>

Note that the initialization phase took longer than the computation. Random number generation is rather expensive.
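
If you prefer to stay with the standard library, the timeit module gives similar information. A minimal sketch (a hypothetical script, not generated by micc, reusing the random number initialisation from run1.py):

"""file prof/time1.py (hypothetical)"""
import random
import timeit
from et_dot import dot

a = [random.random() for _ in range(1000)]
b = [random.random() for _ in range(1000)]

# time 1000 calls of dot(a,b) and report the average time per call:
t = timeit.timeit(lambda: dot(a, b), number=1000)
print(f"dot: {t/1000:.6f} s per call")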

Comparing to Numpy

As said earlier, our implementation of the dot product is rather naive. If you want to become a good programmer, you should understand that you are probably not the first researcher in need of a dot product implementation. For most linear algebra problems, Numpy provides very efficient implementations. Below, the run1.py script is extended with timing results for the Numpy equivalent of our code.

"""file et_dot/prof/run1.py"""
import numpy as np

...

if __name__=='__main__':
    with Stopwatch(name="et init"):
        a = random_array()
        b = random_array()
    with Stopwatch(name="et dot "):
        dot(a,b)

    with Stopwatch(name="np init"):
        a = np.random.rand(1000)
        b = np.random.rand(1000)
    with Stopwatch(name="np dot "):
        np.dot(a,b)

    print('-*# done #*-')

Obviously, to run this script, we must first install Numpy (again as a development dependency):

(.venv) > poetry add numpy -D
Using version ^1.18.1 for numpy
Updating dependencies
Resolving dependencies... (1.5s)
Writing lock file
Package operations: 1 install, 0 updates, 0 removals
  - Installing numpy (1.18.1)
(.venv) >

Here are the results of the modified script:

(.venv) > python ./prof/run1.py
et init: 0.000252 s
et dot : 0.000219 s
np init: 7.8e-05 s
np dot : 3.2e-05 s
-*# done #*-
>

Obviously, Numpy does significantly better than our naive dot product implementation. The reasons for this improvement are:

  • Numpy arrays are contiguous data structures of floating point numbers, unlike Python’s list. Contiguous memory access is far more efficient.
  • The loop over Numpy arrays is implemented in a low-level programming language. This makes it possible to fully exploit the processor's hardware features, such as vectorization and fused multiply-add (FMA).

Conclusion

There are three important generic lessons to be learned from this tutorial:

  1. Always start your projects with a simple and straightforward implementation that can easily be proven correct. Write test code to prove its correctness.
  2. Time your code to understand which parts are time consuming and which are not. Optimize bottlenecks first and do not waste time optimizing code that does not contribute significantly to the total runtime. Optimized code is typically harder to read and may become a maintenance issue.
  3. Before you write code, in this case our dot product implementation, spend some time searching the internet to see what is already available. Especially in the field of scientific and high performance computing there are many excellent libraries available which are hard to beat. Use your precious time for new stuff.

Tutorial 2: Binary extensions

Suppose for a moment that Numpy did not have a dot product implementation and that the implementation provided in Tutorial 1 is way too slow to be practical for your research project. Consequently, you are forced to accelerate your dot product code in some way or another. There are several approaches for this. Here are a number of interesting links covering them:

Most of these approaches do not require special support from Micc to get you going, and we encourage you to go try out the High Performance Python series 1-3 for the ET-dot project. Two of the approaches discussed involve rewriting your code in Modern Fortran or C++ and generate a shared library that can be imported in Python just as any Python module. Such shared libraries are called binary extension modules. Constructing binary extension modules is by far the most scalable and flexible of all current acceleration strategies, as these languages are designed to squeeze the maximum of performance out of a CPU. However, figuring out how to make this work is a bit of a challenge, especially in the case of C++.

This is in fact one of the main reasons why Micc was designed: facilitating the construction of binary extension modules and enabling the developer to create high performance tools with ease.

2.1 Binary extensions in Micc projects

Micc provides boilerplate code for binary extensions as well as some practical wrappers around top-notch tools for building binary extensions from Fortran and C++. Fortran code is compiled into a Python module using f2py (which comes with Numpy). For C++ we use Pybind11 and CMake.

Adding a binary extension is as simple as:

> micc add foo --f90   # add a binary extension 'foo' written in (Modern) Fortran
> micc add bar --cpp   # add a binary extension 'bar' written in C++

Note

For the micc add command to be valid, your project must have a package structure (see `Modules and packages`_).

Enter your own code in the generated source code files and execute:

(.venv) > micc-build

Note

The virtual environment must be activated to execute the micc-build command (see `Virtual environments`_).

If there are no syntax errors your binary extensions will be built, and you will be able to import the modules foo and bar in your project and use their subroutines and functions. Because foo and bar are submodules of your micc project, you must import them as:

import my_package.foo
import my_package.bar

# call foofun in my_package.foo
my_package.foo.foofun(...)

# call barfun in my_package.bar
my_package.bar.barfun(...)

where my_package is the name of the top package of your micc project.

Choosing between Fortran and C++ for binary extension modules

Here are a number of arguments that you may wish to take into account for choosing the programming language for your binary extension modules:

  • Fortran is a simpler language than C++.
  • It is easier to write efficient code in Fortran than in C++.
  • C++ is a much more expressive language.
  • C++ comes with a huge standard library, providing lots of data structures and algorithms that are hard to match in Fortran. If the standard library is not enough, there are also the highly recommended Boost libraries and many other domain specific libraries. There are also domain specific libraries in Fortran, but their number differs by at least an order of magnitude.
  • With Pybind11 you can expose almost anything from the C++ side to Python, not just functions.
  • Modern Fortran is (imho) not as well documented as C++. Useful places to look for language features and idioms are:

In short, C++ provides many more possibilities, but it is not for the novice. In my own experience, working on projects of moderate complexity I progressed significantly faster using Fortran rather than C++, despite the fact that my knowledge of Fortran is quite limited compared to C++. However, your mileage may vary.

2.2 Building binary extensions from Fortran

Binary extension modules based on Fortran are called f90 modules. Micc uses the f2py tool to build these binary extension modules from Fortran. F2py is part of Numpy.
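
Just to show what micc-build wraps for you, a manual f2py build of a single Fortran source file (here dotf.f90, which we create below) would look roughly like this - you normally never have to run f2py yourself:

(.venv) > f2py -c dotf.f90 -m dotf
...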

Note

To be able to add a binary extension module (as well as any other component supported by micc, such as Python modules or CLI applications) to a micc project, your project must have a package structure. This is easily checked by running the micc info command:

> micc info
Project ET-dot located at /home/bert/software/workspace/ET-dot
  package: et_dot
  version: 0.0.0
  structure: et_dot/__init__.py (Python package)
>

If it does, the structure line of the output will read as above. If, however, the structure line reads:

structure: et_dot.py (Python module)

you should convert it by running:

> micc convert-to-package --overwrite

See `Modules and packages`_ for details.

We are now ready to create an f90 module for a Fortran implementation of the dot product, say dotf, where the f, obviously, stands for Fortran:

> micc add dotf --f90
[INFO]           [ Adding f90 module dotf to project ET-dot.
[INFO]               - Fortran source in       ET-dot/et_dot/f90_dotf/dotf.f90.
[INFO]               - Python test code in     ET-dot/tests/test_f90_dotf.py.
[INFO]               - module documentation in ET-dot/et_dot/f90_dotf/dotf.rst (in restructuredText format).
[WARNING]            Dependencies added. Run 'poetry update' to update the project's virtual environment.
[INFO]           ] done.

The output tells us where to enter the Fortran source code, the test code and the documentation. Enter the Fortran implementation of the dot product below in the Fortran source file ET-dot/et_dot/f90_dotf/dotf.f90 (using your favourite editor or an IDE):

function dotf(a,b,n)
  ! Compute the dot product of a and b
  !
    implicit none
  !-------------------------------------------------------------------------------------------------
    integer*4              , intent(in)    :: n
    real*8   , dimension(n), intent(in)    :: a,b
    real*8                                 :: dotf
  !-------------------------------------------------------------------------------------------------
  ! declare local variables
    integer*4 :: i
  !-------------------------------------------------------------------------------------------------
    dotf = 0.
    do i=1,n
        dotf = dotf + a(i) * b(i)
    end do
end function dotf

The output of the micc add dotf --f90 command above also shows a warning:

[WARNING]            Dependencies added. Run `poetry update` to update the project's virtual environment.

Micc is telling you that it added some dependencies to your project. In order to be able to build the binary extension dotf these dependencies must be installed in the virtual environment of our project by running poetry update.

> poetry update
Updating dependencies
Resolving dependencies... (2.5s)

Writing lock file


Package operations: 40 installs, 0 updates, 0 removals

  - Installing certifi (2019.11.28)
  - Installing chardet (3.0.4)
  - Installing idna (2.8)
  - Installing markupsafe (1.1.1)
  - Installing python-dateutil (2.8.1)
  - Installing pytz (2019.3)
  - Installing urllib3 (1.25.7)
  - Installing alabaster (0.7.12)
  - Installing arrow (0.15.4)
  - Installing babel (2.7.0)
  - Installing docutils (0.15.2)
  - Installing imagesize (1.1.0)
  - Installing jinja2 (2.10.3)
  - Installing pygments (2.5.2)
  - Installing requests (2.22.0)
  - Installing snowballstemmer (2.0.0)
  - Installing sphinxcontrib-applehelp (1.0.1)
  - Installing sphinxcontrib-devhelp (1.0.1)
  - Installing sphinxcontrib-htmlhelp (1.0.2)
  - Installing sphinxcontrib-jsmath (1.0.1)
  - Installing sphinxcontrib-qthelp (1.0.2)
  - Installing sphinxcontrib-serializinghtml (1.1.3)
  - Installing binaryornot (0.4.4)
  - Installing click (7.0)
  - Installing future (0.18.2)
  - Installing jinja2-time (0.2.0)
  - Installing pbr (5.4.4)
  - Installing poyo (0.5.0)
  - Installing sphinx (2.2.2)
  - Installing whichcraft (0.6.1)
  - Installing cookiecutter (1.6.0)
  - Installing semantic-version (2.8.3)
  - Installing sphinx-click (2.3.1)
  - Installing sphinx-rtd-theme (0.4.3)
  - Installing tomlkit (0.5.8)
  - Installing walkdir (0.4.1)
  - Installing et-micc (0.10.10)
  - Installing numpy (1.17.4)
  - Installing pybind11 (2.4.3)
  - Installing et-micc-build (0.10.10)

Note from the last lines in the output that micc-build, which is a companion of Micc that encapsulates the machinery that does the hard work of building the binary extensions, depends on pybind11, Numpy, and on micc itself. As a consequence, micc is now also installed in the project’s virtual environment. Therefore, when the project’s virtual environment is activated, the active micc is the one in the project’s virtual environment:

> source .venv/bin/activate
(.venv) > which micc
path/to/ET-dot/.venv/bin/micc
(.venv) >

We might want to increment the minor component of the version string by now:

(.venv) > micc version -m
[INFO]           (ET-dot)> micc version (0.0.7) -> (0.1.0)

The binary extension module can now be built:

(.venv) > micc-build
[INFO] [ Building f90 module dotf in directory '/Users/etijskens/software/dev/workspace/ET-dot/et_dot/f90_dotf/build_'
...
[DEBUG]          >>> shutil.copyfile( 'dotf.cpython-37m-darwin.so', '/Users/etijskens/software/dev/workspace/ET-dot/et_dot/dotf.cpython-37m-darwin.so' )
[INFO] ] done.
[INFO] Check /Users/etijskens/software/dev/workspace/ET-dot/micc-build-f90_dotf.log for details.
[INFO] Binary extensions built successfully:
[INFO] - ET-dot/et_dot/dotf.cpython-37m-darwin.so
(.venv) >

This command produces a lot of output, most of which is rather uninteresting - except in the case of errors. At the end is a summary of all binary extensions that have been built, or failed to build. If the source file does not have any syntax errors, you will see a file like dotf.cpython-37m-darwin.so in directory ET-dot/et_dot:

(.venv) > ls -l et_dot
total 8
-rw-r--r--  1 etijskens  staff  720 Dec 13 11:04 __init__.py
drwxr-xr-x  6 etijskens  staff  192 Dec 13 11:12 f90_dotf/
lrwxr-xr-x  1 etijskens  staff   92 Dec 13 11:12 dotf.cpython-37m-darwin.so@ -> path/to/ET-dot/et_dot/f90_dotf/dotf.cpython-37m-darwin.so

Note

The extension of the module dotf.cpython-37m-darwin.so will depend on the Python version (c.q. 3.7) you are using, and on your operating system (c.q. MacOS).
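
If you are unsure what suffix to expect on your platform, you can ask Python itself (a quick check, not something micc needs):

(.venv) > python -c "import sysconfig; print(sysconfig.get_config_var('EXT_SUFFIX'))"
.cpython-37m-darwin.so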

Since our binary extension is built, we can test it. Here is some test code. Enter it in file ET-dot/tests/test_f90_dotf.py:

# import the binary extension and rename the module locally as f90
import et_dot.dotf as f90
import numpy as np

def test_dotf_aa():
    a = np.array([0,1,2,3,4],dtype=np.float)
    expected = np.dot(a,a)
    a_dotf_a = f90.dotf(a,a)
    assert a_dotf_a==expected

The astute reader will notice the magic that is happening here: a is a numpy array, which is passed as is to our et_dot.dotf.dotf() function in our binary extension. An invisible wrapper function will check the types of the numpy arrays, retrieve pointers to the memory of the numpy arrays and feed those pointers into our Fortran function, the result of which is stored in a Python variable a_dotf_a. If you look carefully at the output of micc-build, you will see information about the wrappers that f2py constructed.
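
You can also inspect that wrapper from Python: f2py attaches a doc-string to the generated function describing the expected arguments (the exact text depends on the Numpy/f2py version):

>>> import et_dot.dotf as f90
>>> print(f90.dotf.__doc__)
dotf = dotf(a,b,[n])

Wrapper for ``dotf``.
...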

Passing Numpy arrays directly to Fortran routines is extremely productive. Many useful Python packages use numpy for arrays, vectors, matrices, linear algebra, etc. Being able to pass Numpy arrays directly into your own number crunching routines relieves you from converting between array types. In addition, you can do the memory management of your arrays and their initialization in Python.

As you can see, we test the outcome of dotf against the outcome of numpy.dot(). We trust that outcome, but beware that this test may be susceptible to round-off error because the representation of floating point numbers in Numpy and in Fortran may differ slightly.

Here is the outcome of pytest:

> pytest
================================ test session starts =================================
platform darwin -- Python 3.7.4, pytest-4.6.5, py-1.8.0, pluggy-0.13.0
rootdir: /Users/etijskens/software/dev/workspace/ET-dot
collected 8 items

tests/test_et_dot.py .......                                                   [ 87%]
tests/test_f90_dotf.py .                                                       [100%]

============================== 8 passed in 0.16 seconds ==============================
>

All our tests passed. Of course we can extend the tests in the same way as we did for the naive Python implementation in the previous tutorial. We leave that as an exercise to the reader.

Increment the version string and produce a tag:

(.venv) > micc version -p -t
[INFO]           (ET-dot)> micc version (0.1.0) -> (0.1.1)
[INFO]           Creating git tag v0.1.1 for project ET-dot
[INFO]           Done.

Note

If you put your subroutines and functions inside a Fortran module, as in:

MODULE my_f90_module
  implicit none
  contains
    function dot(a,b)
      ...
    end function dot
END MODULE my_f90_module

then the binary extension module will expose the Fortran module name my_f90_module which in turn exposes the function/subroutine names:

>>> import et_dot
>>> a = [1.,2.,3.]
>>> b = [2.,2.,2.]
>>> et_dot.dot(a,b)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: module 'et_dot' has no attribute 'dot'
>>> et_dot.my_f90_module.dot(a,b)
12.0

If you are bothered by having to type et_dot.my_f90_module. every time, use this trick:

>>> import et_dot
>>> f90 = et_dot.my_f90_module
>>> f90.dot(a,b)
12.0
>>> fdot = et_dot.my_f90_module.dot
>>> fdot(a,b)
12.0

2.3 Building binary extensions from C++

To illustrate building binary extension modules from C++ code, let us also create a C++ implementation for the dot product. Such modules are called cpp modules. Analogously to our dotf module we will call the cpp module dotc, the c referring to C++.

Note

To add binary extension modules to a project, it must have a package structure. To check, you may run the micc info command and verify the structure line. If it mentions Python module, you must convert the structure by running micc convert-to-package --overwrite. See `Modules and packages`_ for details.

Use the micc add command to add a cpp module:

> micc add dotc --cpp
[INFO]           [ Adding cpp module dotc to project ET-dot.
[INFO]               - C++ source in           ET-dot/et_dot/cpp_dotc/dotc.cpp.
[INFO]               - module documentation in ET-dot/et_dot/cpp_dotc/dotc.rst (in restructuredText format).
[INFO]               - Python test code in     ET-dot/tests/test_cpp_dotc.py.
[WARNING]            Dependencies added. Run 'poetry update' to update the project's virtual environment.
[INFO]           ] done.

The output tells you where to add the C++ source code, the test code and the documentation. First take care of the warning:

(.venv) > poetry update
Updating dependencies
Resolving dependencies... (1.7s)
No dependencies to install or update

Typically, there will be nothing to install, because micc-build was already installed when we added the Fortran module dotf (see 2.2 Building binary extensions from Fortran). Sometimes one of the packages you depend on may just have seen a new release and poetry will perform an upgrade:

(.venv) > poetry update
Updating dependencies
Resolving dependencies... (1.6s)
Writing lock file
Package operations: 0 installs, 1 update, 0 removals
  - Updating zipp (0.6.0 -> 1.0.0)
(.venv) >

Micc uses pybind11 to create Python wrappers for C++ functions. This is by far the most practical choice for this (see https://channel9.msdn.com/Events/CPP/CppCon-2016/CppCon-2016-Introduction-to-C-python-extensions-and-embedding-Python-in-C-Apps for a good overview of this topic). It has a lot of ‘automagical’ features, and it is a header-only C++ library, which effectively prevents installation problems. Boost.Python offers very similar features, but is not header-only and its library depends on the Python version you want to use - so you need a different library for every Python version you want to use.

This is a good point to increment the minor component of the version string:

(.venv) > micc version -m
[INFO]           (ET-dot)> micc version (0.1.1) -> (0.2.0)

Enter this code in the C++ source file ET-dot/et_dot/cpp_dotc/dotc.cpp:

#include <pybind11/pybind11.h>
#include <pybind11/numpy.h>

double
dotc( pybind11::array_t<double> a
    , pybind11::array_t<double> b
    )
{
    auto bufa = a.request()
       , bufb = b.request()
       ;
 // verify dimensions and shape:
    if( bufa.ndim != 1 || bufb.ndim != 1 ) {
        throw std::runtime_error("Number of dimensions must be one");
    }
    if( (bufa.shape[0] != bufb.shape[0]) ) {
        throw std::runtime_error("Input shapes must match");
    }
 // provide access to raw memory
 // because the Numpy arrays are mutable by default, py::array_t is mutable too.
 // Below we declare the raw C++ arrays for x and y as const to make their intent clear.
    double const *ptra = static_cast<double const *>(bufa.ptr);
    double const *ptrb = static_cast<double const *>(bufb.ptr);

    double d = 0.0;
    for (size_t i = 0; i < bufa.shape[0]; i++)
        d += ptra[i] * ptrb[i];

    return d;
}

// describe what goes in the module
PYBIND11_MODULE(dotc, m)
{// optional module docstring:
    m.doc() = "pybind11 dotc plugin";
 // list the functions you want to expose:
 // m.def("exposed_name", function_pointer, "doc-string for the exposed function");
    m.def("dotc", &dotc, "The dot product of two arrays 'a' and 'b'.");
}

Obviously the C++ source code is more involved than its Fortran equivalent in the previous section. This is because f2py is a program performing clever introspection into the Fortran source code, whereas pybind11 is nothing but a C++ template library. As such it is not capable of introspection and the user is obliged to use pybind11 for accessing the arguments passed in by Python.

We can now build the module. Because we do not want to rebuild the dotf module we add -m dotc to the command line, to indicate that only module dotc must be built:

(.venv)> micc build -m dotc
 [INFO] [ Building cpp module 'dotc':
 [DEBUG]          [ > cmake -D PYTHON_EXECUTABLE=/Users/etijskens/software/dev/workspace/tmp/ET-dot/.venv/bin/python -D pybind11_DIR=/Users/etijskens/software/dev/workspace/tmp/ET-dot/.venv/lib/python3.7/site-packages/et_micc_build/cmake_tools -D CMAKE_BUILD_TYPE=RELEASE ..
 [DEBUG]              (stdout)
                        -- The CXX compiler identification is AppleClang 11.0.0.11000033
                        -- Check for working CXX compiler: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/c++
                        -- Check for working CXX compiler: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/c++ -- works
                        -- Detecting CXX compiler ABI info
                        -- Detecting CXX compiler ABI info - done
                        -- Detecting CXX compile features
                        -- Detecting CXX compile features - done
                        -- Found PythonInterp: /Users/etijskens/software/dev/workspace/tmp/ET-dot/.venv/bin/python (found version "3.7.5")
                        -- Found PythonLibs: /Users/etijskens/.pyenv/versions/3.7.5/lib/libpython3.7m.a
                        -- Performing Test HAS_CPP14_FLAG
                        -- Performing Test HAS_CPP14_FLAG - Success
                        -- Performing Test HAS_FLTO
                        -- Performing Test HAS_FLTO - Success
                        -- LTO enabled
                        -- Configuring done
                        -- Generating done
                        -- Build files have been written to: /Users/etijskens/software/dev/workspace/tmp/ET-dot/et_dot/cpp_dotc/_cmake_build
 [DEBUG]          ] done.
 [DEBUG]          [ > make
 [DEBUG]              (stdout)
                        Scanning dependencies of target dotc
                        [ 50%] Building CXX object CMakeFiles/dotc.dir/dotc.cpp.o
                        [100%] Linking CXX shared module dotc.cpython-37m-darwin.so
                        [100%] Built target dotc
 [DEBUG]          ] done.
 [DEBUG]          >>> os.remove(/Users/etijskens/software/dev/workspace/tmp/ET-dot/et_dot/cpp_dotc/dotc.cpython-37m-darwin.so)
 [DEBUG]          >>> shutil.copyfile( '/Users/etijskens/software/dev/workspace/tmp/ET-dot/et_dot/cpp_dotc/_cmake_build/dotc.cpython-37m-darwin.so', '/Users/etijskens/software/dev/workspace/tmp/ET-dot/et_dot/cpp_dotc/dotc.cpython-37m-darwin.so' )
 [DEBUG]          [ > ln -sf /Users/etijskens/software/dev/workspace/tmp/ET-dot/et_dot/cpp_dotc/dotc.cpython-37m-darwin.so /Users/etijskens/software/dev/workspace/tmp/ET-dot/et_dot/cpp_dotc/dotc.cpython-37m-darwin.so
 [DEBUG]          ] done.
 [INFO] ] done.
 [INFO]           Binary extensions built successfully:
 [INFO]           - /Users/etijskens/software/dev/workspace/tmp/ET-dot/et_dot/dotc.cpython-37m-darwin.so
 (.venv)   >

The output shows that first CMake is called, followed by make, after which the binary extension is installed with a soft link. Finally, a list of the modules that were built successfully, and of the modules that failed to build, is printed.

As usual the micc build command produces a lot of output, most of which is rather uninteresting - except in the case of errors. If the source file does not have any syntax errors, and the build did not run into any problems, you will see a file like dotc.cpython-37m-darwin.so in the directory ET-dot/et_dot:

(.venv) > ls -l et_dot
total 8
-rw-r--r--  1 etijskens  staff  1339 Dec 13 14:40 __init__.py
drwxr-xr-x  4 etijskens  staff   128 Dec 13 14:29 __pycache__/
drwxr-xr-x  7 etijskens  staff   224 Dec 13 14:43 cpp_dotc/
lrwxr-xr-x  1 etijskens  staff    93 Dec 13 14:43 dotc.cpython-37m-darwin.so@ -> /Users/etijskens/software/dev/workspace/tmp/ET-dot/et_dot/cpp_dotc/dotc.cpython-37m-darwin.so
lrwxr-xr-x  1 etijskens  staff    94 Dec 13 14:27 dotf.cpython-37m-darwin.so@ -> /Users/etijskens/software/dev/workspace/tmp/ET-dot/et_dot/f2py_dotf/dotf.cpython-37m-darwin.so
drwxr-xr-x  6 etijskens  staff   192 Dec 13 14:43 f90_dotf/
(.venv) >

Note

The extension of the module dotc.cpython-37m-darwin.so will depend on the Python version you are using, and on the operating system.

Although we haven’t tested dotc, this is a good point to increment the version string:

(.venv) > micc version -p
[INFO]           (ET-dot)> micc version (0.2.0) -> (0.2.1)

Here is the test code. It is almost exactly the same as that for the f90 module dotf, except for the module name. Enter the test code in ET-dot/tests/test_cpp_dotc.py:

import et_dot.dotc as cpp    # import the binary extension
import numpy as np

def test_dotc_aa():
    a = np.array([0,1,2,3,4], dtype=np.float64)
    expected = np.dot(a,a)
    a_dotc_a = cpp.dotc(a,a)
    assert a_dotc_a==expected

The conversion between Numpy arrays and C++ arrays is less magical here, as the user must provide the code that converts the Python arguments to C++. This has the advantage of showing the mechanics of the conversion more clearly, but it also leaves more room for mistakes, and to beginners it may seem more complicated.

Finally, run pytest:

> pytest
================================ test session starts =================================
platform darwin -- Python 3.7.4, pytest-4.6.5, py-1.8.0, pluggy-0.13.0
rootdir: /Users/etijskens/software/dev/workspace/ET-dot
collected 9 items

tests/test_cpp_dotc.py .                                                       [ 11%]
tests/test_et_dot.py .......                                                   [ 88%]
tests/test_f90_dotf.py .                                                       [100%]

============================== 9 passed in 0.28 seconds ==============================

All our tests passed, which is a good reason to increment the version string and create a tag:

(.venv) > micc version -m -t
[INFO] Creating git tag v0.3.0 for project ET-dot
[INFO] Done.

2.4 Data type issues

An important point of attention when writing binary extension modules - and a common source of problems - is that the data types of the variables passed in from Python must match the data types of the Fortran or C++ routines.

Here is a table with the most relevant numeric data types in Python, Fortran and C++.

kind               Numpy/Python   Fortran      C++
unsigned integer   uint32         N/A          unsigned int
unsigned integer   uint64         N/A          unsigned long long int
signed integer     int32          integer*4    int
signed integer     int64          integer*8    long long int
floating point     float32        real*4       float
floating point     float64        real*8       double
complex            complex64      complex*8    std::complex<float>
complex            complex128     complex*16   std::complex<double>

F2py

F2py is very flexible with respect to data types. Between the Fortran routine and the Python call sits a wrapper function which translates the function call. If it detects that the data types on the Python side and the Fortran side differ, the wrapper function is allowed to copy/convert the variable when passing it to the Fortran routine, and also when passing the result back from the Fortran routine to the Python caller. When the input/output variables are large arrays, such copy/conversion operations can have a detrimental effect on performance, which is highly undesirable in HPC. Micc therefore runs f2py with the -DF2PY_REPORT_ON_ARRAY_COPY=1 option. This causes your code to produce a warning every time the wrapper decides to copy an array. Basically, this warning means that you should modify your Python data structures to have the same data type as the Fortran source code, or vice versa.
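
As an illustration, here is a small sketch (assuming the et_dot.dotf extension from the previous sections, whose arguments are real*8, i.e. numpy.float64, arrays):

import numpy as np
import et_dot.dotf as f90

a32 = np.array([1, 2, 3], dtype=np.float32)
a64 = np.array([1, 2, 3], dtype=np.float64)

f90.dotf(a64, a64)  # dtypes match: the arrays are passed to Fortran without copying
f90.dotf(a32, a32)  # dtype mismatch: the wrapper copies/converts the arrays to float64,
                    # and -DF2PY_REPORT_ON_ARRAY_COPY=1 makes it print a warning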

Returning large data structures

The result of a Fortran function and a C++ function is always copied back to the Python variable that will hold it. As copying large data structures is detrimental to performance, this should be avoided. The solution to this problem is to write Fortran functions or subroutines, and C++ functions, that accept the result variable as an argument and modify it in place, so that the copy operation is avoided. Consider this example of a Fortran subroutine that computes the sum of two arrays:

subroutine add(a,b,sumab,n)
  ! Compute the sum of arrays a and b and overwrite array sumab with the result
    implicit none

    integer*4              , intent(in)    :: n
    real*8   , dimension(n), intent(in)    :: a,b
    real*8   , dimension(n), intent(inout) :: sumab

  ! declare local variables
    integer*4 :: i

    do i=1,n
        sumab(i) = a(i) + b(i)
    end do
end subroutine add

The crucial issue here is that the result array sumab has intent(inout). If you qualify the intent of sumab as in you will not be able to overwrite it, whereas - surprisingly - qualifying it with intent(out) will force f2py to consider it as a left hand side variable, which implies copying the result on returning.

The code below does exactly the same but uses a function, not to return the result of the computation, but an error code.

function add(a,b,sumab,n)
  ! Compute the sum of arrays a and b and overwrite array sumab with the result
    implicit none

    integer*4              , intent(in)    :: n
    integer*4                              :: add   ! type of the function result
    real*8   , dimension(n), intent(in)    :: a,b
    real*8   , dimension(n), intent(inout) :: sumab

  ! declare local variables
    integer*4 :: i

    do i=1,n
        sumab(i) = a(i) + b(i)
    end do

    add = ... ! set return value, e.g. an error code.

end function add

The same can be accomplished in C++:

#include <pybind11/pybind11.h>
#include <pybind11/numpy.h>

namespace py = pybind11;

void
add ( py::array_t<double> a
    , py::array_t<double> b
    , py::array_t<double> sumab
    )
{// request buffer description of the arguments
    auto buf_a = a.request()
       , buf_b = b.request()
       , buf_sumab = sumab.request()
       ;
    if( buf_a.ndim != 1
     || buf_b.ndim != 1
     || buf_sumab.ndim != 1 )
    {
        throw std::runtime_error("Number of dimensions must be one");
    }

    if( (buf_a.shape[0] != buf_b.shape[0])
     || (buf_a.shape[0] != buf_sumab.shape[0]) )
    {
        throw std::runtime_error("Input shapes must match");
    }
 // because the Numpy arrays are mutable by default, py::array_t is mutable too.
 // Below we declare the raw C++ arrays for a and b as const to make their intent clear.
    double const *ptr_a     = static_cast<double const *>(buf_a.ptr);
    double const *ptr_b     = static_cast<double const *>(buf_b.ptr);
    double       *ptr_sumab = static_cast<double       *>(buf_sumab.ptr);

    for (size_t i = 0; i < buf_a.shape[0]; i++)
        ptr_sumab[i] = ptr_a[i] + ptr_b[i];
}


PYBIND11_MODULE({{ cookiecutter.module_name }}, m)
{// optional module doc-string
    m.doc() = "pybind11 {{ cookiecutter.module_name }} plugin"; // optional module docstring
 // list the functions you want to expose:
 // m.def("exposed_name", function_pointer, "doc-string for the exposed function");
    m.def("add", &add, "A function which adds two arrays 'a' and 'b' and stores the result in the third, 'sumab'.");
}

Here, care must be taken that when casting buf_sumab.ptr one does not cast to const.
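
From Python, both variants are used in the same way: pre-allocate the result array once and pass it in. A minimal sketch (the module name addab is hypothetical; it stands for whatever binary extension the add function above was built into):

import numpy as np
import et_dot.addab as addab  # hypothetical binary extension containing add()

a = np.array([1., 2., 3.])
b = np.array([4., 5., 6.])
sumab = np.empty_like(a)  # pre-allocate the result array

addab.add(a, b, sumab)    # the result is written into sumab in place, no copy on return
print(sumab)              # [5. 7. 9.]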

2.5 Specifying compiler options for binary extension modules

[ Advanced Topic ] As we have seen, binary extension modules can be programmed in Fortran and C++. Micc provides convenient wrappers to build such modules. Fortran source code is transformed to a python module using f2py, and C++ source using Pybind11 and CMake. Obviously, in both cases there is a compiler under the hood doing the hard work. By default these tools use the compiler they find on the path, but you may as well specify your favorite compiler.

Note

Compiler options are distinct for f2py modules and cpp modules.

Building a single module only

If you want to build a single binary extension module rather than all binary extension modules in the project, add the -m|--module option:
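
For example (assuming your project contains a binary extension module my_module):

(.venv) > micc build -m my_module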

This will only build module my_module.

Performing a clean build

To perform a clean build, add the --clean flag to the micc build command:
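
For example:

(.venv) > micc build --clean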

This will remove the previous build directory as well as the binary extension module.

Controlling the build of f90 modules

To specify the Fortran compiler, e.g. the GNU fortran compiler:
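
For example (a sketch; the option is forwarded to f2py, and the path to gfortran is just an illustration):

(.venv) > micc build --f90exec=/usr/local/bin/gfortran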

Note that this is exactly how you would have specified it using f2py directly. You can specify the Fortran compiler options you want using the --f90flags option:
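
For example (a sketch; the flags shown are typical gfortran options and just an illustration):

(.venv) > micc build --f90flags="-march=native -funroll-loops"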

In addition, f2py (and micc-build for that matter) provides two extra options: --opt for specifying optimization flags, and --arch for specifying architecture-dependent optimization flags. These flags can be turned off by adding --noopt and --noarch, respectively. This can be convenient when exploring compile options. Finally, the --debug flag adds debug information during the compilation.

Micc-build also provides a --build-type option which accepts release and debug as values (case insensitive). Specifying debug is equivalent to --debug --noopt --noarch.

Note

ALL f90 modules are built with the same options. To specify separate options for a particular module use the -m|--module option.

Note

Although there are some commonalities between the compiler options of the various compilers, you will most probably have to change the compiler options when you change the compiler.

Controlling the build of cpp modules

The build of C++ modules can be fully controlled by modifying the module’s CMakeLists.txt file to your needs. Micc provides every cpp module with a template containing examples of frequently used CMake commands, commented out. These include the specification of:

  • compiler options
  • preprocessor macros
  • include directories
  • link directories
  • link libraries

You just need to uncomment them and provide the values you need:

# ...

# set compiler:
# set(CMAKE_CXX_COMPILER path/to/executable)

# Add compiler options:
# set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} <additional C++ compiler options>")

# Add preprocessor macro definitions:
# add_compile_definitions(
#     OPENFOAM=1912                     # set value
#     WM_LABEL_SIZE=$ENV{WM_LABEL_SIZE} # set value from environment variable
#     WM_DP                             # just define the macro
# )

# Add include directories
#include_directories(
#     path/to/dir1
#     path/to/dir2
# )

#...

CMake provides default build options for four build types: DEBUG, MINSIZEREL, RELEASE, and RELWITHDEBINFO.

  • CMAKE_CXX_FLAGS_DEBUG: -g
  • CMAKE_CXX_FLAGS_MINSIZEREL: -Os -DNDEBUG
  • CMAKE_CXX_FLAGS_RELEASE: -O3 -DNDEBUG
  • CMAKE_CXX_FLAGS_RELWITHDEBINFO: -O2 -g -DNDEBUG

The build type is selected by setting the CMAKE_BUILD_TYPE variable (default: RELEASE).

For convenience, micc-build provides a command line argument --build-type for specifying the build type.

Save and load build options to/from file

With the --save option you can save the current build options to a file in .json format. This acts on a per project basis. E.g.:
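
A sketch of the expected invocation (where <my build options> stands for any of the build flags described above):

(.venv) > micc build <my build options> --save build.json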

will save <my build options> to the file build.json in every binary module directory (the .json extension is added if omitted). You can restrict this to a single module with the --module option (see above). The saved options can be reused in a later build as:
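
(a sketch, assuming the options were saved to build.json as above)

(.venv) > micc build --load build.json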

2.6 Documenting binary extension modules

For Python modules the documentation is automatically extracted from the doc-strings in the module. However, when it comes to documenting binary extension modules, this does not seem a good option. Ideally, the source files ET-dot/et_dot/f90_dotf/dotf.f90 and ET-dot/et_dot/cpp_dotc/dotc.cpp should document the Fortran functions and subroutines, and the C++ functions, respectively, rather than the Python interface. Yet, from the perspective of ET-dot being a Python project, the user is only interested in the documentation of the Python interface to those functions and subroutines. Therefore, micc requires you to document the Python interface in separate .rst files:

  • ET-dot/et_dot/f90_dotf/dotf.rst
  • ET-dot/et_dot/cpp_dotc/dotc.rst

Here are the contents, respectively, for ET-dot/et_dot/f90_dotf/dotf.rst:

Module et_dot.dotf
******************

Module :py:mod:`dotf` built from fortran code in :file:`f90_dotf/dotf.f90`.

.. function:: dotf(a,b)
   :module: et_dot.dotf

   Compute the dot product of *a* and *b* (in Fortran.)

   :param a: 1D Numpy array with ``dtype=numpy.float64``
   :param b: 1D Numpy array with ``dtype=numpy.float64``
   :returns: the dot product of *a* and *b*
   :rtype: ``numpy.float64``

and for ET-dot/et_dot/cpp_dotc/dotc.rst:

Module et_dot.dotc
******************

Module :py:mod:`dotc` built from C++ code in :file:`cpp_dotc/dotc.cpp`.

.. function:: dotc(a,b)
   :module: et_dot.dotc

   Compute the dot product of *a* and *b* (in C++.)

   :param a: 1D Numpy array with ``dtype=numpy.float64``
   :param b: 1D Numpy array with ``dtype=numpy.float64``
   :returns: the dot product of *a* and *b*
   :rtype: ``numpy.float64``

Note that the documentation must be entirely in .rst format (see restructuredText).

Build the documentation:

(.venv) > cd docs && make html
Already installed: click
Already installed: sphinx-click
Already installed: sphinx
Already installed: sphinx-rtd-theme
Running Sphinx v2.2.2
making output directory... done
WARNING: html_static_path entry '_static' does not exist
building [mo]: targets for 0 po files that are out of date
building [html]: targets for 7 source files that are out of date
updating environment: [new config] 7 added, 0 changed, 0 removed
reading sources... [100%] readme
looking for now-outdated files... none found
pickling environment... done
checking consistency... /Users/etijskens/software/dev/workspace/tmp/ET-dot/docs/apps.rst: WARNING: document isn't included in any toctree
done
preparing documents... done
writing output... [100%] readme
generating indices...  genindex py-modindexdone
highlighting module code... [100%] et_dot.dotc
writing additional pages...  search/Users/etijskens/software/dev/workspace/tmp/ET-dot/.venv/lib/python3.7/site-packages/sphinx_rtd_theme/search.html:20: RemovedInSphinx30Warning: To modify script_files in the theme is deprecated. Please insert a <script> tag directly in your theme instead.
  {{ super() }}
done
copying static files... ... done
copying extra files... done
dumping search index in English (code: en)... done
dumping object inventory... done
build succeeded, 2 warnings.

The HTML pages are in _build/html.

The documentation is built using make. The Makefile checks that the necessary components sphinx, click, sphinx-click and sphinx-rtd-theme are installed.

You can view the result in your favorite browser:

(.venv) > open _build/html/index.html

The file path is evident from the last line of the output above. This is what the result looks like (html):

(screenshot of the generated html documentation: _images/img2-1.png)

Increment the version string:

(.venv) > micc version -M -t
[ERROR] Not a project directory (/Users/etijskens/software/dev/workspace/tmp/ET-dot/docs).
(.venv) > cd ..
(.venv) > micc version -M -t
[INFO]           (ET-dot)> micc version (0.3.0) -> (1.0.0)
[INFO]           Creating git tag v1.0.0 for project ET-dot
[INFO]           Done.

Note that we first got an error because we are still in the docs directory, and not in the project root directory.

Tutorial 3: Adding Python components

3.1 Adding a Python module

Just as one can add binary extension modules to a package, one can add python modules.

> micc add foo --py

[INFO]           [ Adding python module foo.py to project ET-dot.
[INFO]               - Python source in    ET-dot/et_dot/foo.py.
[INFO]               - Python test code in ET-dot/tests/test_foo.py.
[INFO]           ] done.

This adds a Python sub-module to the package, and a test script. The documentation for the sub-module is extracted from doc-strings of the functions and classes in the sub-module.

As with micc create, the default structure is that of a simple module, i.e. ET-dot/et_dot/foo.py. If you want a package structure, add the --package flag.

Testing the module

When adding a module foo, Micc automatically adds a test script for the new module: tests/test_foo.py. In this file you add tests for module foo.
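
For example, a minimal sketch (what you actually test depends, of course, on what you put in foo.py):

# ET-dot/tests/test_foo.py
import et_dot.foo

def test_foo_imports():
    # smoke test: the new sub-module can be imported
    assert et_dot.foo is not None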

Documenting the module

When adding a module foo, Micc automatically adds documentation entries in API.rst. Calling micc docs will automatically extract documentation from the doc-strings in your new module.

3.2 Adding a Python Command Line Interface

Command Line Interfaces are Python scripts that you want to be installed as executable programs when a user installs your package.

As an example, assume that we need quite often to read two arrays from file and compute their dot product, and that we want to execute this operation as:

> dot-files file1 file2
dot(file1,file2) = 123.456
>

Micc supports two kinds of CLIs based on click, a very practical tool for building Python CLIs. The first one is for CLIs that execute a single task, the second one for a command with sub-commands, like git or micc itself. The single task case is the default, so we can create it like this:

> micc app dot-files
[INFO]           [ Adding CLI dot-files without sub-commands to project ET-dot.
[INFO]               - Python source file ET-dot/et_dot/cli_dot_files.py.
[INFO]               - Python test code   ET-dot/tests/test_cli_dot_files.py.
[INFO]           ] done.

For a CLI with sub-commands one should add the flag --sub-commands.
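
For example (with a hypothetical CLI name my-cli):

> micc app my-cli --sub-commands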

The source code ET-dot/et_dot/cli_dot_files.py should be modified as:

# -*- coding: utf-8 -*-
"""Command line interface dot-files (no sub-commands)."""

import sys

import click
import numpy as np

from et_dot.dotf import dotf

@click.command()
@click.argument('file1')
@click.argument('file2')
@click.option('-v', '--verbosity', count=True
             , help="The verbosity of the CLI."
             , default=1
             )
def main(file1,file2,verbosity):
    """Command line interface dot-files.

    A 'hello' world CLI example.
    """
    a = np.genfromtxt(file1, dtype=np.float64, delimiter=',')
    b = np.genfromtxt(file2, dtype=np.float64, delimiter=',')
    ab = dotf(a,b)
    if verbosity>1:
        print(f"dot-files({file1},{file2}) = {ab}")
    else:
        print(ab)

if __name__ == "__main__":
    sys.exit(main())  # pragma: no cover

Here’s how to use it from the command line (without installing):

> source .venv/bin/activate
(.venv) > cat file1.txt
1,2,3,4,5
> cat file2.txt
2,2,2,2,2
(.venv) > python et_dot/cli_dot_files.py file1.txt file2.txt
30.0
(.venv) > python et_dot/cli_dot_files.py file1.txt file2.txt -vv
dot-files(file1.txt,file2.txt) = 30.0

Testing the application

When you add an application like dot-files, Micc automatically adds a test script tests/test_cli_dot_files.py where you can add your tests. Testing CLIs is a bit more complex than testing modules, but Click provides some tools for this (see "Testing click applications" in the Click documentation). Here is the test code:

#!/usr/bin/env python
# -*- coding: utf-8 -*-

from click.testing import CliRunner

from et_dot.cli_dot_files import main

def test_main():
    runner = CliRunner()
    result = runner.invoke(main, ['file1.txt','file2.txt'])
    print(result.output)
    ab = float(result.output[0:-1])
    assert ab==30.0

Finally, we run pytest:

> pytest
================================= test session starts =================================
platform darwin -- Python 3.7.4, pytest-4.6.5, py-1.8.0, pluggy-0.13.0
rootdir: /Users/etijskens/software/dev/workspace/ET-dot
collected 10 items

tests/test_cli_dot_files.py .                                                   [ 10%]
tests/test_cpp_dotc.py .                                                        [ 20%]
tests/test_et_dot.py .......                                                    [ 90%]
tests/test_f90_dotf.py .                                                       [100%]

================================== 10 passed in 0.33 seconds ==========================

Documenting an application

When adding a CLI, Micc automatically adds documentation entries for it in APPS.rst. Calling micc docs will automatically extract documentation from the doc-string of the command, from the :param ...: entries in that doc-string describing the click.argument parameters, and from the help parameters of the click.option decorators.

Tutorial 4: Version control and version management

4.1 Git support

When you create a new project, Micc immediately provides a local git repository for you and commits the initial files Micc set up for you. If you have a github account you can register it in the preferences file ~/.micc/micc.json, using the github_username entry:

{
...
, "github_username"  : {"default":"etijskens"
                       ,"text"   :"your github username"
                       }
...
}

Micc cannot create a remote github repository for you, but if you registered your github username in the preferences file, it will add a remote origin at https://github.com/etijskens/<your_project_name>/, and try to push the files to the github repo. If you have created the remote repository before you create the project, the new project will be immediately pushed onto the remote origin. Otherwise, you get a warning that the remote repository does not yet exist. You can create the remote repository whenever you like and push your work onto the remote repository using the git CLI.

4.2 Version management

Version numbers are practical, even for a small software project used only by yourself. For larger projects, certainly when other users start using them, they become indispensable. When giving version numbers to a project, we highly recommend to follow the guidelines Semantic Versioning 2.0. Such a version number consists of Major.minor.patch. According to semantic versioning you should increment the:

  • Major version when you make incompatible API changes,
  • minor version when you add functionality in a backwards compatible manner, and
  • patch version when you make backwards compatible bug fixes.

Micc sets a version number of 0.0.0 when it creates a project, and you can bump the version number at any time with the micc version command.

> micc info
Project ET-dot located at /Users/etijskens/software/dev/workspace/ET-dot
  package: et_dot
  version: 0.0.0
  structure: et_dot/__init__.py (Python package)
  contents:
    application cli_dot_files.py
    C++ module  cpp_dotc/dotc.cpp
    f90 module  f90_dotf/dotf.f90

To bump the patch component:

> micc version
Project (ET-dot) version (0.0.0)
> micc version --patch
[INFO]           bumping version (0.0.0) -> (0.0.1)

Again, with the short version of --patch and verbose this time, :

> micc -vv version -p
[DEBUG] start = 2019-10-16 13:18:16.995416
[INFO]           bumping version (0.0.1) -> (0.0.2)
[DEBUG]          . Updated (/Users/etijskens/software/dev/workspace/ET-dot/pyproject.toml)
[DEBUG]          . Updated (/Users/etijskens/software/dev/workspace/ET-dot/et_dot/__init__.py)
[DEBUG] stop  = 2019-10-16 13:18:17.261962
[DEBUG] spent = 0:00:00.266546

Here, you can see that micc updated the version number in ET-dot/pyproject.toml and ET-dot/et_dot/__init__.py.

To bump the minor component use the --minor or -m flag:

> micc version -m
[INFO]           bumping version (0.0.2) -> (0.1.0)

As you can see the patch component is reset to 0.

To bump the major component use the --major or -M flag:

> micc version -M
[INFO]           bumping version (0.1.0) -> (1.0.0)

As you can see the minor component (as well as the patch component) is reset to 0.

The micc version command also has a --tag flag that creates a git tag (see https://git-scm.com/book/en/v2/Git-Basics-Tagging) and tries to push it to the remote repository:

> micc -vv version -p --tag
[DEBUG] start = 2019-10-16 13:37:25.026161
[INFO]           bumping version (1.0.1) -> (1.0.2)
[DEBUG]          . Updated (/Users/etijskens/software/dev/workspace/ET-dot/pyproject.toml)
[DEBUG]          . Updated (/Users/etijskens/software/dev/workspace/ET-dot/et_dot/__init__.py)
[INFO]           Creating git tag v1.0.2 for project ET-dot
[DEBUG]          Running 'git tag -a v1.0.2 -m "tag version 1.0.2"'
[DEBUG]
[DEBUG]          Pushing tag v1.0.2 for project ET-dot
[DEBUG]          Running 'git push origin v1.0.2'
[DEBUG]          remote: Repository not found.
                   fatal: repository 'https://github.com/etijskens/ET-dot/' not found
[INFO]           Done.
[DEBUG] stop  = 2019-10-16 13:37:26.101959
[DEBUG] spent = 0:00:01.075798

If you created a remote github repository for your project and registered your github username in the preferences file, the tag is pushed to the remote origin.

Tutorial 5 - Publishing your code

Publishing your code is an easy way to make your code available to other users.

5.1 Publishing to the Python Package Index

For this we rely on poetry. If you do not have a PyPI account, create one and run this command in your project directory, e.g. et-foo:
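
A sketch of the publishing step (poetry will prompt for your PyPI credentials unless you configured them beforehand):

> poetry publish --build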

Note

It is crucial that your project name is not already taken. For this reason, we recommend that

  1. before you create a project that you might want to publish, you check whether your project name is not already taken, and
  2. immediately after your project is created, you publish it, so as to reserve the name forever.

Now everyone can install the package in his current Python environment as:

> pip install et-foo

5.2 Publishing packages with binary extension modules

Packages with binary extension modules are published in exactly the same way, that is, as a Python-only project. When you pip install a Micc project, the package directory will end up in the site-packages directory of the Python environment in which you install. The source code directories of the binary extension modules are installed with the package, but without the binary extensions themselves. These must be compiled locally. Fortunately that happens automatically, at least if the binary extensions were added to the package by Micc. When Micc adds a binary extension to a project, two things happen:

  • a dependency on micc-build is added to the project, and
  • in the top-level module <package_name>/__init__.py a try-except block is added that tries to import the binary extension and in case of failure (ModuleNotFoundError) will attempt to build it using the machinery provided by micc-build. This will usually succeed, provided the necessary compilers are available.

As an example, let us create a project foo with a binary extension module bar written in C++:

> micc -p Foo create
> cd Foo
> micc add bar --cpp

This creates the file Foo/foo/__init__.py:

# -*- coding: utf-8 -*-
"""
Package foo
===========

Top-level package for foo.
"""

__version__ = "0.0.0"

try:
    import foo.bar
except ModuleNotFoundError as e:
    # Try to build this binary extension:
    from pathlib import Path
    import click
    from et_micc_build.cli_micc_build import auto_build_binary_extension
    msg = auto_build_binary_extension(Path(__file__).parent, 'bar')
    if not msg:
        import foo.bar
    else:
        click.secho(msg, fg='bright_red')

def hello(who='world'):
    ...

If the first import foo.bar fails, the except block imports the function auto_build_binary_extension() and calls it, passing as arguments the path to the package directory Foo/foo and the name of the binary extension module bar. If the build succeeds, the msg string is empty and foo.bar is finally imported; otherwise the error message msg is printed.

5.3 Providing auto_build_binary_extension() with custom build parameters

The auto-build above will normally use the default build options, corresponding to -O3, which optimizes for speed. As the auto_build_binary_extension() method is called automatically, there are few opportunities to set build options. The auto_build_binary_extension() method will look for a file Foo/foo/cpp_bar/build_options.<platform>.json, where <platform> is Darwin on MacOSX, Linux on Linux, and Windows on Windows. If it exists, it should contain a dict with the build options to use.

Note

The build options files are OS specific:

  • On MacOSX : build_options.Darwin.json
  • On Linux : build_options.Linux.json
  • On Windows : build_options.Windows.json

f90 module build option specifications

All options available to the f2py command line application can be entered in the build options file. Pure flags, like --noopt, which are either present or absent but have no value, are entered in the dictionary with value None. Below are some examples of frequently used f2py flags.

import json
from pathlib import Path
import platform

# project_path, package_name and module_name are placeholders: set them to the
# values of your own project before running this snippet.
f2py = {
    '--f90exec' : 'f90 compiler executable',
    '--f90flags': 'f90 compiler flags',
    '--opt'     : 'f90 compiler optimization flags',
    '--arch'    : 'f90 compiler architecture specific compiler flags',
    '--noopt'   : None,  # neglect '--opt' contents
    '--noarch'  : None,  # neglect '--arch' contents
    '--debug'   : None,  # compile with debugging information
}
module_srcdir_path = Path(project_path) / package_name / f"f2py_{module_name}"
with (module_srcdir_path / f"build_options.{platform.system()}.json").open('w') as f:
    json.dump(f2py, f)

Note

The Python dictionary f2py is written to file in .json format, which is human readable. You can also construct it with an editor.

Cpp module build option specifications

For cpp binary extension modules the build tool is CMake. Here, the entries of the build options dict consist of any CMake variable and its desired value.

import json
from pathlib import Path
import platform

# project_path, package_name and module_name are placeholders: set them to the
# values of your own project before running this snippet.
cmake = {
    'CMAKE_BUILD_TYPE' : 'RELEASE',
    # ... any other CMake variables and their values
}
module_srcdir_path = Path(project_path) / package_name / f"cpp_{module_name}"
with (module_srcdir_path / f"build_options.{platform.system()}.json").open('w') as f:
    json.dump(cmake, f)

5.4 Publishing your documentation on readthedocs.org

Publishing your documentation to Readthedocs relieves the users of your code from having to build the documentation themselves. Making it happen is very easy. First, make sure the git repository of your code is published on Github. Second, create a Readthedocs account if you do not already have one. Then, go to your Readthedocs page, go to your projects and hit import project. Fill in the fields, and every time you push commits to Github the documentation will be rebuilt automatically and published.

Note

Sphinx must be able to import your project in order to extract the documentation. If your code depends on Python modules other than the standard library, this will fail and the documentation will not be built. You can add the necessary dependencies to <your-project>/docs/requirements.txt.

Tutorial 6 - Using micc projects on the VSC clusters

This tutorial walks you through the differences of using micc to manage your Python+C++/Fortran projects on the VSC clusters. If you are already familiar with the use of HPC environments, the only relevant part of this tutorial is section 6.2 Using Poetry on the cluster. Otherwise, it is recommended to go through the entire tutorial.

Note

This tutorial uses the Leibniz cluster of the University of Antwerp for the examples. The principles pertain, however, to all VSC clusters, and most probably also to other clusters using a module system for exposing their software stack.

Most differences between using your local machine or a cluster stem from the fact that a cluster typically uses a module system for making software available to the user on a login node (interactive mode) and to a compute node (batch mode). In addition, the cluster uses a scheduler that determines when your compute jobs are executed.

The tools we need are, typically:

  • a modern Python version. As Python 2.7 is officially discontinued, that would probably be 3.7 or later.
  • common Python packages for computing, like Numpy, scipy, matplotlib, …
  • compilers for C++ and/or Fortran, for compiling binary extensions.
  • CMake, as the build system for C++ binary extensions.
  • git, for version control, if we are developing code on the cluster.

6.1 Using modules

The cluster’s operating system exposes some of these tools, but, they lag many versions behind and, although very reliable, they are not fit for high performance computing purposes.

As an example consider the GCC C++ compiler g++. Here is the g++ version exposed by the operating system (at the time of writing: August 2020):

> which g++
/usr/bin/g++
> g++ --version
g++ (GCC) 4.8.5 20150623 (Red Hat 4.8.5-39)
...

Still at the time of writing, the latest GCC version is 6 major versions ahead of that: 10.2! The OS g++ is very reliable for building operating system components, but it is not suited for building C++ binary extensions that must squeeze the last bit of performance out of the cluster’s hardware. Obviously, this old g++ cannot possibly be aware of modern hardware, and, consequently, cannot generate code that exploits all the modern hardware features introduced for improving the performance of scientific computations.

Similarly, the OS Python is 2.7.5, whereas 3.9 is almost released, and 2.7.x isn’t even officially supported anymore.

So as a rule of thumb:

Never use the tools provided by the operating system.

As the preinstalled modules are built by VSC specialists for optimal performance on the cluster hardware, this rule should be extended as:

Do not install your own tools (unless they are not performance critical, or, you are a specialist yourself).

If you need some software package, a library, a Python module, or, whatever, which is not available as a cluster module, especially, if it is performance critical, contact your local VSC team and they will build and install it for you (and all other users).

The VSC Team has installed many software packages ready to be used for high performance computing. In fact, they are built using the modern compilers and with optimal performance in mind. Contrary to what you are used to on your personal computer where installed software packages are immediately accessible, on the cluster an extra step must be taken to make installed packages accessible.

If you are unsure whether a command is provided by the operating system or not, use the linux which command:

> which g++
/usr/bin/g++

Typically, commands of the operating system are found in /usr/bin and should usually not be used for high performance computing. Commands provided by some cluster module are, typically, found in /apps/<vsc-site>/....

Note

The which cmd command shows the path to the first cmd on the PATH environment variable.

So, how do we get access to the commands we are supposed to use?

HPC packages are installed as modules and, to make them accessible, they must be loaded. Loading a module means that your operating system environment is modified such that it can find the software’s executables, that is, the directories containing its executables are added to the PATH variable. In addition, other environment variables are adjusted or added to make everything work smoothly.

E.g., to use a recent version of git we load the git module:

> module load git
> which git
/apps/antwerpen/broadwell/centos7/git/2.13.3/bin/git
> git --version
git version 2.13.3

Before we loaded git, the which command would have shown:

> which git
/usr/bin/git
> git --version
git version 1.8.3.1

A much older version, indeed.

You can search for modules containing e.g. the word gcc (case insensitive):

> module spider gcc
...

If you know the package name, you can list the available versions with module av. Here are the available Python versions (the command is case insensitive):

> module av python/

which on Leibniz returns:

----------------------------------------------------- /apps/antwerpen/modules/centos7/software-broadwell/2019b -----------------------------------------------------
   Biopython/1.74-GCCcore-8.3.0-IntelPython3-2019b    Biopython/1.74-intel-2019b-Python-3.7.4 (D)    Python/3.7.4-intel-2019b (D)
   Biopython/1.74-intel-2019b-Python-2.7.16           Python/2.7.16-intel-2019b

----------------------------------------------------- /apps/antwerpen/modules/centos7/software-broadwell/2018b -----------------------------------------------------
   Python/2.7.15-intel-2018b    Python/3.6.8-intel-2018b    Python/3.7.0-intel-2018b    Python/3.7.1-intel-2018b

----------------------------------------------------- /apps/antwerpen/modules/centos7/software-broadwell/2018a -----------------------------------------------------
   Python/2.7.14-intel-2018a    Python/3.6.4-intel-2018a    Python/3.6.6-intel-2018a

----------------------------------------------------- /apps/antwerpen/modules/centos7/software-broadwell/2017a -----------------------------------------------------
   Biopython/1.68-intel-2017a-Python-2.7.13    pbs_python/4.6.0-intel-2017a-Python-2.7.13    Python/3.6.1-intel-2017a
   Biopython/1.68-intel-2017a-Python-3.6.1     Python/2.7.13-intel-2017a

  Where:
   D:  Default Module

If you need software that is not listed, request it at hpc@uantwerpen.be

Please mind the last line: if you need something that is not pre-installed, request it at hpc@uantwerpen.be.

You can unload a module:

> module unload git
> which git
/usr/bin/git

The current git command is that of the OS again.

You can unload all modules:

> module purge

To learn the details about the VSC clusters’ module system, consult Using the module system.

6.2 Using Poetry on the cluster

6.2.1 Installing Poetry

Poetry is, so far, not available as a cluster module. You must install it yourself. The installation method recommended by the poetry documentation is also applicable on the cluster (even when the system Python version is still 2.7.x):

> module purge
> curl -sSL https://raw.githubusercontent.com/python-poetry/poetry/master/get-poetry.py | python

The module purge command ensures that the system Python is used for the Poetry installation. This allows you to have a single poetry installation that works for all Python versions that you might want to use. So, internally, poetry commands use the system Python, which is always available, but your projects can use any Python version that is made available by loading a cluster module, or that you installed yourself.

6.2.2 Using pre-installed Python packages

As the cluster modules generally come with pre-installed Python packages which are built to achieve optimal performance in an HPC environment (e.g. Numpy, Scipy, …), we do not want poetry install to reinstall these packages in your project’s virtual environment. That would lead to suboptimal performance and waste disk space. Fortunately, there is a way to tell Poetry that it must use the pre-installed Python packages:

> mkdir -p ~/.cache/pypoetry/virtualenvs/.venv
> echo 'include-system-site-packages = true' > ~/.cache/pypoetry/virtualenvs/.venv/pyvenv.cfg

(If the name of your project’s virtual environment is not .venv, replace it with the name of your project’s virtual environment).

6.3 Using micc on the cluster

First, we make sure to load a modern Python version for our project. The VSC clusters have many Python versions available, and come in different flavours, depending on the toolchain that was used to build them. On Leibniz, e.g., we would load:

> module load leibniz/2019b     # exposes all modules compiled with the intel-2019b toolchain
> module load Python/3.7.4-intel-2019b

This module comes with a number of pre-installed Python packages which you can inspect using:

> ll $(dirname $(which python))/../lib/python3.7/site-packages

The above Python/3.7.4-intel-2019b is a good choice. Usually, loading a Python module will automatically also make the C++ and Fortran compilers available that were used to compile that Python module. They are, obviously, needed for building binary extensions from C++ and Fortran code.

In addition, Micc relies on a number of other software packages to do its work.

  • Git, our preferred version control system. The system git is a bit old, hence:

    > module purge
    > git --version
    git version 1.8.3.1 # this is the system git
    > module load git
    > git --version
    git version 2.13.3
    
  • For building binary extensions from C++ we need CMake, hence:

    > cmake --version
    cmake version 2.8.12.2 # this is the system CMake
    > module load leibniz/2019b
    > module load CMake
    > cmake --version
    cmake version 3.11.1
    
  • For building binary extensions from Fortran we need f2py, which is provided by Numpy. Hence, we need to load a cluster Python module with Numpy pre-installed (see 6.2.2 Using pre-installed Python packages). The Python module loaded above is fine for that.

Authors

Development

So far, Micc is a one-man effort:

Feel free to submit issues (bugs, feature requests, questions) at

We do our best to respond in a timely manner.

History

This section summarizes all my steps on the way to a working micc, including dead-ends.

https://sdss-python-template.readthedocs.io/en/latest/

Using poetry on the cluster

Installed poetry with pip install poetry. poetry install for ET-dot seems to update e.g. Numpy (it is also found in .venv’s site-packages), but import numpy nevertheless uses the pre-installed version, which is almost what we want, except for Numpy appearing in the site-packages, which wastes disk space.

It must still be verified how we can install poetry just once for several python versions. Presumably, the system python (2.7.5 on Leibniz) is not sufficient. I judged this wrong. I installed poetry on Leibniz, using the recommended method:

> curl -sSL https://raw.githubusercontent.com/python-poetry/poetry/master/get-poetry.py | python

with the system python, and I was able to run poetry install on a project.

v0.10.25

It’s been a while since this file has been updated…

v0.9.6 (2019-11-21)

  • Keeping the version numbers of micc and micc-build consistent is error-prone. Each has to know the other’s version number because dependencies must be set. Et-micc-build depends on et-micc, so et-micc-build/pyproject.toml must know the version number of et-micc. On the other hand, when et-micc adds a binary extension to a project it must add the dependency on et-micc-build to that project’s pyproject.toml file, and thus et-micc must know the version number of et-micc-build. Bumpversion has a lot more up its sleeve than we used so far.

    If they have the same version number, it becomes much simpler.

v0.9.5 (2019-11-19)

v0.8.0 (2019-11-11)

  • MICC exists on PyPI
  • project renamed to et-micc

v0.7.5 (2019-11-11)

v0.7.4 (2019-10-23)

  • Github issue #3
  • created branch issue_3
  • implemented a workaround in micc dev-install.

v0.7.3 (2019-10-22)

  • micc --save/load: add .json extension if omitted, and test for filepath.
  • tutorial
  • micc build --clean: also removes the binary extension, not only the build directory

v0.7.2 (2019-10-22)

  • issue #6 micc build --save/load <build options file>.json
  • issue #5 micc build --clean option
  • fixed setting compile options in cpp modules.

v0.7.1 (2019-10-21)

Issue #2: providing f2py command line arguments through CMake involves a lot of detail, while in fact there is no reason to use CMake to call f2py in the first place. Instead we try to use the numpy.f2py module, which works well. Wish there were a python interface to cmake…

v0.7.0 (2019-10-18)

We are writing our issues on github now, and we are working inside a separate git branch for every issue. Merging implies a version bump.

  • more intuitive micc subcommands

v0.6.8 (2019-10-17)

  • fixed bug in micc app documentation generation
  • wip on tutorial
  • bug fix in micc app
  • if github_username is empty do not add remote origin
  • add --license option to micc create
  • simplified test template.
  • bug fix --clear-log

v0.6.6 (2019-10-10)

  • refactored logging tools into logging.py
  • fixed indentation with loglevel
  • wip tutorial-1
  • micc create now creates a module structure by default. In this way the interface of micc create and micc module is the same.
  • factored out template helpers in expand.py
  • some code cleanup
  • documentation
  • log warnings if executing commands fails (rather than debug messages) so the user sees them always

v0.6.5 (2019-10-08)

  • indentation in logs
  • log time of a command
  • bug fix in template expansion when adding a second application

v0.6.4 (2019-10-05)

  • improve documentation templates
  • replaced --structure option with --module flag and --package flag

v0.6.3 (2019-10-04)

  • started tutorial writing. They will help the user to learn how to use micc, and help me in improving the functionality of micc.

Using micc myself encouraged me to

  • command line interfaces are best non-interactive, i.e. they’d better not ask for extra information. If information is missing, they’d better quit with a message. So,

    • don’t ask for project name,
    • don’t ask for short description

    This approach allows the cli to be used in scripts, and it eases the testing.

    Note that this principle should make us revisit the micc.json template parameter approach. see also issue #12, which was closed, but not entirely satisfactory.

  • modify micc create as to create a project in an empty directory.

  • Fixed: return codes from commands are now propagated as exit code of the application.

  • Fixed: issue #12.

  • Fixed: issue #23.

v0.6.2 (2019-10-02)

  • So far, we added the dependencies manually, by editing pyproject.toml. We should normally add dependencies through poetry. That seems to work well, though you need to be connected to the internet for this. There was one issue: sphinx-click adds pytest="^4.4.2" as a dependency. This prevents the use of pytest ^5.2.0. Conclusion: do not add pytest as a dependency. In addition, sphinx-click depends on sphinx, but seems to request sphinx-2.1.2 rather than sphinx-2.2.0. Of course it slips in with sphinx-click.
  • While trying to run the micc tests with poetry 1.0.0b1 installed, the tests fail because this poetry version depends on cleo-7.5.0 rather than cleo-6.8.0. Conclusion: we do not want our code to depend on poetry, or any of its dependencies except for its command line interface because it is too volatile.
  • fixed issue #15
  • fixed issue #16
  • fixed issue #17
  • fixed issue #18
  • fixed issue #19
  • fixed issue #20
  • fixed issue #21

micc now no longer needs poetry and can run in a conda environment.

  • poetry install is problematic and should not be used in a conda environment.
  • poetry build seems to work well and the Makefile commands for installing and reinstalling locally work well too.

v0.6.1 (2019-09-27)

  • modified dependencies of micc. Created empty conda environment:

    > make reinstall
    > pytest tests
    

All tests succeed.

v0.6.0 (2019-09-27)

  • looks as most things are working smoothly now…, so we move to 0.6.0
  • code review by codacy

v0.5.17 (2019-09-26)

modified assertions

v0.5.16 (2019-09-25)

  • adding a micc info command that lists information on the project
  • improved the documentation.

v0.5.15 (2019-09-25)

It would be nice to have an automatic way of generating documentation from our click apps. However, the output of micc --help is not in .rst format… There is an interesting stackoverflow issue on documenting click commands with sphinx (see the answer on sphinx-click); sphinx-click works like a charm.

v0.5.14 (2019-09-25)

  • Improved logging, added function micc.utils.get_micc_logger() to get the logger from everywhere, as well as a function micc.utils.create_logger() to create a new logger.
  • Added a function for executing and logging a list of commands.
  • Added a micc command for documentation generation
  • improved testing
  • improved output of micc version
  • refactoring: the project_path is now a global_option accepted by all subcommands

this changes the commands:

> micc create foo

becomes

> micc -p path/to/foo create

v0.5.13 (2019-09-23)

Improving tests. I am still rather uncomfortable with testing CLIs. Most of the code is tested by manual inspection. This unlucky patch number is for improving this situation.

  • step 1: study the click documentation on testing. Interesting topic: File system isolation, which runs a scenario in an empty directory, e.g. /private/var/folders/rt/7h5lk6c955db20y1rzf1rjz00000gn/T/tmpa12gc_p9. Good idea, but the location of that directory is a bit hard to trace. (Note that a temporary directory inside micc doesn’t work, because micc refuses to create a project inside another project; a flag --allow-nesting was added for this feature.)

refactoring: still not happy with the --simple argument for micc

  • how does python refer to its project structure? module.py vs module/__init__.py

    • module.py is called a module
    • module/__init__.py is called a package

    Thus, our modules may be modules or sub-level packages, and our package is actually the top-level package, which may contain other lower-level packages, as well as modules.

  • fixed issue #13 [feature] os.path -> pathlib
  • fixed issue #12 common items in micc.json files
  • fixed issue #14 [feature] --allow-nesting flag for creating projects
  • implemented #6 [feature] decomposition
  • fixed issue #10 micc files are part of the template

v0.5.12 (2019-09-19)

Add logging to micc commands as we tried out in micc build

v0.5.11 (2019-09-18)

#2 [feature] simple python project: add a --simple flag to micc create to create a simple (= unnested) python module <package_name>.py instead of the nested <package_name>/__init__.py. After a bit of thinking, the most practical approach would be to make a copy of the cookiecutter template micc-package, rename directory {{cookiecutter.package_name}} to src and rename file {{cookiecutter.package_name}}/__init__.py to src/{{cookiecutter.module_name}}.py. The big advantage of that is that the directory structure is almost completely the same. Other approaches would need to relocate the docs folder, the tests folder and a bunch of other files. This will most certainly limit the number of changes that is necessary.

However, this breaks the test tests/test_<module_name>.py, as the module is in src and it cannot be imported without adding src to sys.path. To cure the problem we can add a test to see if the project is simple and add src to sys.path if it is. The less code, the better, I think, thus I replace src/<module_name>.py with <module_name>.py and remove the src directory.

This works fine. The next consideration that comes to my mind is that, as a simple python package grows, it might be useful to be able to convert it to a general python package. To do that, the missing .rst files must be added, with their references to cookiecutter variables correctly replaced. Currently, however, the cookiecutter variables aren’t stored in the project. Moreover, cookiecutter doesn’t support filters to use part of a template, so we must copy that part into a separate cookiecutter template. Hence, this is the plan:

  • we remove the docs directory from the micc-package template and from the micc-package-simple template.
  • we create a micc-package-simple-docs template with the .rst files from micc-package-simple
  • and a micc-package-general-docs template with the .rst files that must be added or overwritten from micc-package

Moreover, we split of the common parts of micc-package and micc-package-simple into

  • micc-package, common parts
  • micc-package-general, general package specific parts
  • micc-package-simple, simple package specific parts

we also change the names of all cookiecutter templates from micc-<whatever> to template-<whatever>. Thus, when a general package is created, we must use the templates (in this order)

  • template-package
  • template-package-general
  • template-package-simple-docs
  • template-package-general-docs

and when a simple package is created, we must use the templates (in this order)

  • template-package
  • template-package-simple
  • template-package-simple-docs

Then converting a simple python project to a general python project is simple:

  • add directory <package_name> to the project,
  • copy <package_name>.py to <package_name>/__init__.py.
  • add the template template-package-general-docs

v0.5.10 (2019-09-09)

Fixed issue #4:

  • added command micc build [-s|--soft-link], builds all binary projects (C++ and f2py)
  • builders removed from Makefile

Fixing issue #4 raised a problem: the environment of subprocess is not the same as that in which the micc script is run. In particular, which python returns /usr/bin/python. As a consequence pybind11 is not found. After using subprocess.run( ... , env=my_env, ...) things are working (with the caveat that subprocess picks up the environment in which it is running. If run in eclipse, e.g., it will pick up the environment from which eclipse was started. As a consequence it may pick up the system python and not find pybind11.)

v0.5.9 (2019-09-06)

fixed issue #1

v0.5.8 (2019-09-06)

fixed issue #3 for cpp modules

v0.5.7 (2019-09-06)

fixed issue #3 for f2py modules

v0.5.6 (2019-07-19)

in v0.5.5, we could do:

> micc module bar
> micc module bar --f2py
> micc module bar --cpp

without error, in arbitrary order. However, the f2py module and the C++ module will generate the same bar.<something>.so file in the package folder, which is obviously wrong. In addition the .so files will be in conflict with bar.py.

This version fixes this problem by verifying that the module name is not already in use. If it is, an exception is raised.

v0.5.5 (2019-07-19)

  • add cookiecutter template for C++ modules.

We need:

  • a C++ compiler (icpc or g++). On my mac i’d like to use GCC installed with brew install gcc, or the g++ that comes with xcode. On the cluster I prefer Intel’s icpc.
  • a Boost.Python library for the python in the environment. On the cluster there are modules with Boost.Python, such as the Intel Python distribution. Building Boost with Python isn’t particularly easy, so let’s see what Anaconda cloud provides.
    • boost-cpp v1.70, which has 1.76M downloads, so it is certainly worth trying. bad luck: does not provide Boost.Python. A pity, I always like to have the latest Boost version.
    • libboost: boost v1.67, not the most recent but certainly ok, unfortunately also no Boost.Python.
    • py-boost (https://anaconda.org/anaconda/py-boost): boost v1.67 with libboost_python.dylib and libboost_python37.dylib. We used this before.
  • The include files numpy_boost.hpp and numpy_boost_python.hpp are copied to the <package_name>/numpy_boost directory. At some time they can be moved to a fixed place in the micc distribution. The user will probably never touch these files anyway.

For compiling the module we have several options:

  • a build script, as for the f2pymodules,
  • a separate makefile, or
  • suitable targets in the project makefile.

For building we use the make system from et/make. For the time being, we copy all makefiles into the <package_name>/make directory. In the end they could be distributed with micc, together with numpy_boost.hpp and numpy_boost_python.hpp. Note that the locations of the make and numpy_boost directories are hardcoded as ./make and ./numpy_boost, so that make must be run in the package directory, for the time being. That went fine; however, the issue with py-boost on MacOS is still there. See et/numpy_boost/test_numpy_boost/macosx_anaconda_py-boost.md. The problem could be fixed by reverting to python 3.6 using the conda environment created as:

> conda create -n py-boost python=3.7 py-boost
> conda install py-boost python=3.6.4 -c conda-forge

as suggested here (https://github.com/pybind/pybind11/issues/1579). Make sure to rerun the last line after installing a package. This fix doesn’t seem to work anymore (2019.07.12). Python-3.6.4 is no longer available… Hence we look further:

[At this point i checked that the test code works on Leibniz using the Intel Python distribution: it does! So the problem is the same as before, but the solution does no longer work.]

I gave up… Started compiling Boost.Python myself. Download the boost version you like, and uncompress

> cd ~/software/boost_1_70_0
> ./bootstrap.sh --with-python=`which python` --with-libraries="python"
> ./b2 toolset=darwin stage

Got errors like:

...failed darwin.compile.c++ bin.v2/libs/python/build/darwin-4.2.1/release/python-3.6/threading-multi/visibility-hidden/tuple.o...
darwin.compile.c++ bin.v2/libs/python/build/darwin-4.2.1/release/python-3.6/threading-multi/visibility-hidden/str.o
In file included from libs/python/src/str.cpp:4:
In file included from ./boost/python/str.hpp:8:
In file included from ./boost/python/detail/prefix.hpp:13:
./boost/python/detail/wrap_python.hpp:57:11: fatal error: 'pyconfig.h' file not found
# include <pyconfig.h>
          ^~~~~~~~~~~~
1 error generated.

    "g++"   -fvisibility-inlines-hidden -fPIC -m64 -O3 -Wall -fvisibility=hidden -dynamic -gdwarf-2 -fexceptions -Wno-inline
    -DBOOST_ALL_NO_LIB=1 -DBOOST_PYTHON_SOURCE -DNDEBUG  -I"." -I"/Users/etijskens/miniconda3/envs/pp2017/include/python3.6m"
    -c -o "bin.v2/libs/python/build/darwin-4.2.1/release/python-3.6/threading-multi/visibility-hidden/str.o"
    "libs/python/src/str.cpp"

The -I option specifies the wrong Python location! I had to remove the Python configuration in ~/user-config.jam.

Still got “pyconfig.h not found” errors:

...failed darwin.compile.c++ bin.v2/libs/python/build/darwin-4.2.1/release/python-3.7/threading-multi/visibility-hidden/list.o...
darwin.compile.c++ bin.v2/libs/python/build/darwin-4.2.1/release/python-3.7/threading-multi/visibility-hidden/long.o
In file included from libs/python/src/long.cpp:5:
In file included from ./boost/python/long.hpp:8:
In file included from ./boost/python/detail/prefix.hpp:13:
./boost/python/detail/wrap_python.hpp:57:11: fatal error: 'pyconfig.h' file not found
# include <pyconfig.h>
          ^~~~~~~~~~~~
1 error generated.

    "g++"   -fvisibility-inlines-hidden -fPIC -m64 -O3 -Wall -fvisibility=hidden -dynamic -gdwarf-2
    -fexceptions -Wno-inline  -DBOOST_ALL_NO_LIB=1 -DBOOST_PYTHON_SOURCE -DNDEBUG  -I"."
   -I"/Users/etijskens/miniconda3/envs/ws2/include/python3.7" -c -o
   "bin.v2/libs/python/build/darwin-4.2.1/release/python-3.7/threading-multi/visibility-hidden/long.o"
   "libs/python/src/long.cpp"

The -I option specifies /Users/etijskens/miniconda3/envs/ws2/include/python3.7, whereas on my Mac it is called /Users/etijskens/miniconda3/envs/ws2/include/python3.7m. A soft link python3.7 pointing to python3.7m solves the problem. Alternatively, edit the project-config.jam file and replace the using python : line with (on a single line, I guess):

using python : 3.7 : /Users/etijskens/miniconda3/envs/ws2/bin/python \
             : /Users/etijskens/miniconda3/envs/ws2/include/python3.7m \
             : /Users/etijskens/miniconda3/envs/ws2/lib ;

Now Boost.Python builds fine. The libraries are in stage/lib:

> ll stage/lib
total 13952
-rw-r--r--  1 etijskens  staff   935152 Jul 12 12:23 libboost_numpy37.a
-rwxr-xr-x  1 etijskens  staff    73852 Jul 12 12:23 libboost_numpy37.dylib*
-rw-r--r--  1 etijskens  staff  5750432 Jul 12 12:23 libboost_python37.a
-rwxr-xr-x  1 etijskens  staff   374980 Jul 12 12:23 libboost_python37.dylib*

In /Users/etijskens/miniconda3/envs/ws2/lib create soft links to both libraries:

> cd /Users/etijskens/miniconda3/envs/ws2/lib
> ln -s path/to/boost_1_70_0/stage/lib/libboost_python37.dylib
> ln -s path/to/boost_1_70_0/stage/lib/libboost_numpy37.dylib

In /Users/etijskens/miniconda3/envs/ws2/include, if there is no boost subdirectory, create a soft link to the Boost header directory:

> cd /Users/etijskens/miniconda3/envs/ws2/include
> ln -s path/to/boost_1_70_0/boost

If there is already a boost subdirectory:

> cd /Users/etijskens/miniconda3/envs/ws2/include/boost
> ln -s path/to/boost_1_70_0/boost/python.hpp
> ln -s path/to/boost_1_70_0/boost/python

… On Linux this works fine. On my Mac, however, I keep running into problems.

While googling for a solution, I came across pybind11. These sections in the readme made me particularly curious:

  • pybind11 is a lightweight header-only library that exposes C++ types in Python and vice versa, mainly to create Python bindings of existing C++ code.
  • Think of this library as a tiny self-contained version of Boost.Python with everything stripped away that isn’t relevant for binding generation. Without comments, the core header files only require ~4K lines of code and depend on Python (2.7 or 3.x, or PyPy2.7 >= 5.7) and the C++ standard library. This compact implementation was possible thanks to some of the new C++11 language features (specifically: tuples, lambda functions and variadic templates). Since its creation, this library has grown beyond Boost.Python in many ways, leading to dramatically simpler binding code in many common situations.
  • Find some examples of its use in Using pybind11

Works like a charm. It comes with a cross-platform CMake build system that works out of the box. You must put a soft link to the pybind11 repository in the project directory for the add_subdirectory(pybind11) statement to work. (In section “6.3.3 find_package vs. add_subdirectory” of the pybind11 manual, it is stated that this can be overcome with find_package(pybind11 REQUIRED), which finds the pybind11 installed in the current Python environment out of the box, and is a good reason to conda|pip install pybind11 rather than check it out from GitHub.) That relieves me from maintaining the make stuff I wrote. With a little bit of CMake code for the f2py modules, everything becomes much better streamlined and the et/make system is no longer needed.

Had some trouble making it work on Leibniz. CMake’s FindPythonInterp relies on python-config to pick up the location of the current Python executable. Unfortunately, the Python environment I was using only defines python3-config, not python-config, so that soft link still had to be added.

Must now figure out how to deal with numpy arrays… There is some interesting information in section “11.6 Eigen” of the pybind11 manual, allowing numpy arrays to be passed by reference and used as Eigen matrices on the C++ side. Using Eigen instead of Boost.MultiArray has the additional advantage that dense and sparse linear algebra routines are also available.

Set Boost_INCLUDE_DIR and Eigen3_INCLUDE_DIR in pybind11/CMakeLists.txt. Now Eigen tests are run as well and no failures observed.

The fact that CMake does so well makes me wonder if I shouldn’t use CMake for the f2py modules as well, instead of a shell script that will fail on Windows. Scikit-build contains a FindF2py.cmake. The tool, however, lets f2py figure out the Fortran and C compiler it should use, and does not add any optimisation flags.

There is a slight catch in using CMake: the filename CMakeLists.txt is fixed, as “fixed” as in “impossible to change” (see this). That requires us to have a separate directory for each cpp module we want to add. We did not have that restriction with f2py modules; for reasons of consistency, we might change that. Module directory names should then probably be set as f2py_<module_name> and cpp_<module_name> rather than the other way around, so that modules of the same type appear consecutively in a directory listing. So far, Python modules added through micc module my_module appear as my_module.py in the package directory.

The CMake version is working. F2py autoselects the compilers (Fortran and C). We further accomplished:

  • get rid of the soft link to pybind11 by referring to its installation directory in CMakeLists.txt. Fixed! It finds the pybind11 installed in the current Python environment out of the box (see the pybind11 documentation, Build systems/Building with CMake/find_package vs add_subdirectory). Done.
  • get rid of the directory cmake_f2py_tools by referring to its location inside micc in CMakeLists.txt (we do not want endless copies of these, nor soft links). How can I install the micc CMake files so I can find them in the same way as find_package(micc CONFIG REQUIRED)? Playing the trick of pybind11 to install files into /path/to/my_conda_environment/share/pybind11 seems to be non-trivial (https://github.com/pybind/pybind11/issues/1628). Let’s see if we can work around this. We can indeed easily work around it: the micc.utils module now has a function path_to_cmake_tools() that returns the path to the cmake_tools directory (using __file__); a sketch follows below. This path is added to the template parameters (before they are exported to cookiecutter.json). Then Cookiecutter knows the path and can insert it in the CMakeLists.txt. Simple, and no loose ends. Done.
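
A minimal sketch of such a helper, assuming the cmake_tools directory sits next to utils.py inside the installed micc package:

from pathlib import Path

def path_to_cmake_tools():
    """Return the absolute path to micc's cmake_tools directory.

    Sketch only: it relies on the __file__ of the module it lives in
    (micc.utils), so the returned path follows the micc installation.
    """
    return str(Path(__file__).resolve().parent / 'cmake_tools')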

The approach of exposing the micc/cmake_tools directory to the CMakeLists.txt of an f2py module through utils.path_to_cmake_tools() is a bit static, as it hardcodes the path of the micc version that was used to create the module. If that path changes, because that micc version is moved or because development continues in another Python environment, the build will fail. Instead we rely on the CMake variable ${PYTHON_SITE_PACKAGES} and append /micc/cmake_tools to it.
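
From the Python side, the directory that ${PYTHON_SITE_PACKAGES} must resolve to can be checked as follows (a diagnostic only, not part of micc):

import sysconfig

# site-packages of the current environment; micc's CMake files are then
# expected under <site-packages>/micc/cmake_tools.
print(sysconfig.get_paths()['purelib'])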

Now add the cookiecutter templates for C++ modules.

Tested f2py and cpp modules and their test scripts.

v0.5.4 (2019-07-10)

  • add cookiecutter template for fortran modules with f2py. We need:
    • f2py, comes with Numpy
    • a fortran compiler
    • a C compiler
    • what can be provided out of the box by conda?
    • support for this on the clusters

I followed this advice: f2py-running-fortran-code-in-python, and installed gcc from Homebrew with brew install gcc. Inside the conda environment I created soft links to gcc, g++ and gfortran.

There is an issue with Fortran arguments of type real*16, which become long_double instead of long double in the <modulename>module.c wrapper file. The issue can be circumvented by editing that file and running f2py_local.sh a second time. The issue occurred in gcc 7.4.0, 8.3.0 and 9.1.0; switching to the gcc provided by Xcode does not help either. However, adding -Dlong_double="long double" to the f2py command line options solves the problem nicely. :)
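
A sketch of where that option goes on an f2py command line, with illustrative file and module names, wrapped in subprocess because that is how micc runs its build commands:

import subprocess

# Compile compute.f90 into an extension module 'compute', mapping f2py's
# 'long_double' back to the C type 'long double'; the single argv element
# below is equivalent to -Dlong_double="long double" on the shell command line.
subprocess.run(
    ['f2py', '-c', 'compute.f90', '-m', 'compute',
     '-Dlong_double=long double'],
    check=True,
)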

I typically had different bash scripts for running f2py, one for building locally and one for each cluster. It would be nice if a single script would do, picking up the right compiler from the environment in which it is run, as well as setting the correct compiler options. There may be several f2py modules, so there will be a different script for every f2py module: f2py_<module_name>.sh. Preferably <module_name> ends with f90. The module name also appears inside the script. The script looks for ifort, and if absent, for gfortran in the environment. It uses gcc for compiling the C wrappers and for f2py. If one of the components is missing, the script exits with a non-zero error code and an error message. The makefile can call:

for s in f2py_*.sh; do ./${s}; done

Do we want a Fortran module or not? The Fortran module complicates things, as it appears as a namespace inside the Python module:

# a) with a fortran module:
# import the python module (built from compute_f90_a.f90) which lives
# in the proj_f2py package:
import proj_f2py.compute_f90_a as python_module_a
# create an alias for the fortran module inside that python module, which
# is called 'f90_module'. The fortran module behaves like any other member
# of the python module.
f90 = python_module_a.f90_module

# b) without a fortran module:
# import the python module (built from compute_f90_b.f90);
# this does not have a fortran module inside.
import proj_f2py.compute_f90_b as python_module_b

Documenting Fortran modules with Sphinx is problematic. There exists a Sphinx extension sphinx-fortran, but this presumably works only with Sphinx versions older than 1.8, and it is not actively maintained/developed, which is a pity imho. As an alternative we include a file <project_name>/<package_name>/<module_f2py>.rst which has a suitable template for adding the documentation. As we actually want to document a Python module (built from Fortran code with f2py), we expect the user to enter documentation for the wrapper functions, not for the pure Fortran functions. The latter goes in the <project_name>/<package_name>/<module_f2py>.f90 file, but is not exposed in the project documentation.

v0.5.3 (2019-07-09)

  • check for overwriting files (we must specify overwrite_if_exists for cookiecutter because it already reports an error if just the directories exist; adding files to existing directories is not supported out of the box). The more components one can add, the higher the chance of a name clash, with files being overwritten. We do not want this to happen. We propose that micc should fail when files would be overwritten, and that the command then be rerun with a --force option.
    • Maybe, we can monkey patch this problem in cookiecutter. No success.
    • Create a tree of directories and files to be created and check against the pre-existing tree. Seems complicated.
    • Create the tree to be added in a temporary directory which does not yet exist, and then check for collisions. That seems feasible (sketched below).
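
A sketch of that third approach, with illustrative names: let cookiecutter expand the template into a temporary directory, then refuse to copy anything that would overwrite an existing file unless --force was given:

import shutil
from pathlib import Path

def copy_without_overwriting(tmp_root, project_root, force=False):
    tmp_root, project_root = Path(tmp_root), Path(project_root)
    new_files = [p for p in tmp_root.rglob('*') if p.is_file()]
    collisions = [p.relative_to(tmp_root) for p in new_files
                  if (project_root / p.relative_to(tmp_root)).exists()]
    if collisions and not force:
        raise FileExistsError('would overwrite: {}'.format(collisions))
    for p in new_files:
        dst = project_root / p.relative_to(tmp_root)
        dst.parent.mkdir(parents=True, exist_ok=True)  # adding to existing dirs is fine
        shutil.copy2(p, dst)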

v0.5.2 (2019-07-09)

  • add option --f2py to micc module ...

v0.5.1 (2019-07-09)

  • micc create ... must write a .gitignore file and other configuration files. Adding modules or apps does not change these.
  • Cookiecutter template micc-module-f2py added, no code to use it yet

v0.5.0 (2019-07-04)

  • Fixed poetry issue #1182

v0.4.0 (2019-06-11)

  • First functional working version with
    • micc create
    • micc app
    • micc module
    • micc version
    • micc tag

v0.2.5 (2019-06-11)

  • git support
    • git init in micc create
    • micc tag

v0.2.4 (2019-06-11)

  • Makefile improvements:
    • documentation
    • tests
    • install/uninstall
    • install-dev/uninstall-dev

v0.2.3 (2019-06-11)

  • Using pyproject.toml, instead of the flawed setup.py
  • Proper local install and uninstall. By local we mean: not installing from PyPI. We had that in et/backbone using pip, but pip uses setup.py, which we want to avoid. There is no pyproject.toml file so far…

Moving away from setup.py and going down the pyproject.toml road, we can choose between poetry and flit.

Although I am having some trouble with reusing some poetry code, I have the impression that it is better developed and has a more active community (more watchers, downloads, commits, …).

A pyproject.toml was added (using poetry init to generate it). The first issue is how to automatically transfer the version number to our Python project. This is a good post about that.

  • using pkg_resources implies a dependency on setuptools: no go.
  • using tomlkit for reading the pyproject.toml file implies that the pyproject.toml file must be included in the distribution of the package. Since pyproject.toml is completely unnecessary for the functioning of the module, we’d rather not do that. So, we settle on copying the version string from pyproject.toml to the Python package (i.e. duplicating it). This is basically the same strategy as used by bumpversion.
  • the command poetry version … allows modifying the version string in pyproject.toml. In principle we can recycle that code. However, we could not get it to work properly (see https://github.com/sdispater/poetry/issues/1182). This could probably be circumvented by creating my own fork of poetry.
    • it is simple to write a hack around this: read the file into a string, replace the version line, and write it back (a sketch follows after this list). This preserves the formatting, but in the unlikely case that there is another version string in some toml table the result will be incorrect.
    • the toml package is much simpler than tomlkit and does not cause these problems, but it does not preserve the formatting of the file.
  • poetry itself uses a separate __version__.py file in the package, containing nothing but __version__ = "M.m.p". This is imported in __init__.py as from .__version__ import __version__. This makes transferring the version from pyproject.toml to __version__.py easy.
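
A sketch of the string-replace hack mentioned above (file name and version are illustrative):

from pathlib import Path

def set_version(pyproject='pyproject.toml', new_version='0.2.4'):
    path = Path(pyproject)
    lines = path.read_text().splitlines(keepends=True)
    for i, line in enumerate(lines):
        if line.startswith('version ='):
            lines[i] = 'version = "{}"\n'.format(new_version)
            break  # only the first 'version =' line; see the caveat above
    path.write_text(''.join(lines))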

Let’s first check if we can achieve a proper local install with poetry … Install a package:

> poetry build
> pip install dist/<package>-<version>-py3-none-any.whl

Uninstall:

> pip uninstall <package>

This seems to do the trick:

> pip install -e <project_dir>

To install a dev package, use:

> pip install --editable <project_dir>

Uninstall:

> rm -r $(find . -name '*.egg-info')

But take care, uninstalling like this:

> pip uninstall <package>

removed the source files. See this post.

v0.1.21 (2019-06-11)

first working version

v0.0.0 (2019-06-06)

Start of development.

Open Issues

#25

#24 [feature] add indentation to logging

> micc -p foo create

[INFO] Creating project (foo):
       Python package (foo): structure = (foo/foo/__init__.py)
[INFO] Creating git repository
[INFO] ... done.

[INFO] ... done.

should look like:

> micc -p foo create

[INFO] Creating project (foo):
       Python package (foo): structure = (foo/foo/__init__.py)
       ...
[INFO]     Creating git repository
           ...
[INFO]     done.
[INFO] done.
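
One possible way to obtain that nesting, purely illustrative and not micc's actual implementation, is a context manager that tracks the indentation level:

from contextlib import contextmanager

_indent = [0]

def info(message):
    print('[INFO] ' + '    ' * _indent[0] + message)

@contextmanager
def log_section(message):
    info(message)
    _indent[0] += 1
    try:
        yield
    finally:
        _indent[0] -= 1
        info('done.')

# produces the nested '[INFO] ... done.' lines shown above
with log_section('Creating project (foo):'):
    with log_section('Creating git repository'):
        pass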

#22 [issue] building cpp module fails when pybind11 is pip-installed

(At least in a conda environment.) It works, however, when pybind11 is conda-installed.

DEBUG    micc-build-cpp_mod4.log:utils.py:295  (stderr)
                                              CMake Error at CMakeLists.txt:14 (find_package):
                                                Could not find a package configuration file provided by "pybind11" with any
                                                of the following names:

                                                  pybind11Config.cmake
                                                  pybind11-config.cmake

                                                Add the installation prefix of "pybind11" to CMAKE_PREFIX_PATH or set
                                                "pybind11_DIR" to a directory containing one of the above files.  If
                                                "pybind11" provides a separate development package or SDK, be sure it has
                                                been installed.

Building an f2py module also fails for the same reason. Conda installs the missing files in ~/miniconda3/envs/micc/share/cmake/pybind11.
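
A quick diagnostic from Python to see where pybind11 ended up and whether the CMake config files are present (the share/cmake/pybind11 location is the one mentioned above):

import os
import pybind11

print(pybind11.get_include())  # header location, available for pip and conda installs
print(pybind11.__file__)       # where the pybind11 Python package itself lives

# a conda install additionally ships the CMake config files under the prefix:
prefix = os.environ.get('CONDA_PREFIX', '')
print(os.path.exists(os.path.join(prefix, 'share', 'cmake', 'pybind11')))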

#5 [feature] packaging and deployment, and the use of poetry in general

The Python Packaging User Guide has a section Packaging binary extensions, with two interesting subsections: Publishing binary extensions and Cross-platform wheel generation with scikit-build. Thus, it seems we are on the right track, as scikit-build is based on CMake, which we also use for building our f2py and C++ binary modules. Unfortunately, the section Publishing binary extensions is broken, as it mentions “For interim guidance on this topic, see the discussion in this issue.” There, two interesting links are found:

Note that the setup.py approach can become rather convoluted: https://github.com/zeromq/pyzmq/blob/master/setup.py.

The scikit-build approach is mentioned as well.

These approaches require the user who installs your package to also build the binary Python extensions (f2py, C++), which may be challenging on Windows.

The Python Packaging User Guide suggests AppVeyor, a CI solution, for providing Windows support: https://packaging.python.org/guides/supporting-windows-using-appveyor/.

How does poetry deal with binary extensions?

issue #8 cookiecutter.json files

These files are written in the template directories of the micc installation. If micc happens to be installed in a location where the user has no write access, micc will not work.

Closed Issues

#1 [bug] FileExistsError in micc module

in v0.5.6:

The commands:

> micc module --f2py <module_name>
> micc module --cpp <module_name>

generate:

FileExistsError: [Errno 17] File exists: '<package_name>/tests'

if the flag --overwrite is not specified. This behavior is incorrect: an exception must only be raised if existing files would be overwritten, not when a new file is added to an existing directory.

#3 [feature] add useful example code to templates

Put more useful example code in

  • cpp_{{cookiecutter.module_name}}/{{cookiecutter.module_name}}.cpp -> added in v0.5.7.
  • f2py_{{cookiecutter.module_name}}/{{cookiecutter.module_name}}.f90

as well as in the corresponding test files.

v0.5.10

#4 [bug] build commands for f2py and cpp modules

<package_name>/Makefile contains wrong builder for f2py modules and no builder for cpp modules.

Running the CMake build from the command line:

> cd <package_name>/cpp_<module_name>
> mkdir build_
> cd build_
> cmake -DCMAKE_BUILD_TYPE=Release ..
> make

Then, either copy the .so file to <package_name>, or make a soft link.

A simple package (feature #2) should have simple documentation, and complete documentation when converted to a full-blown package.

feature #11 add log files to micc build

Controlling the output with verbose is not sufficient: if one of the build commands fails we want to print all output for building that module, and that is hard to control with verbose.

issue #9 prohibit creation of a micc project under another project

This implies asserting that none of the parent directories of the output directory is a project directory (in micc_create_simple and micc_create_general).
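
A sketch of such an assertion, assuming a directory counts as a project directory when it contains a pyproject.toml file:

from pathlib import Path

def assert_not_inside_project(output_dir):
    """Raise if any parent of output_dir is itself a project directory."""
    output_dir = Path(output_dir).resolve()
    for parent in output_dir.parents:
        if (parent / 'pyproject.toml').exists():
            raise RuntimeError('cannot create a micc project below {}'.format(parent))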

issue #7 cookiecutter.json files are temporary

While working on issue #2 I realized that these are in fact temporary files, which do not belong in the template directories (although cookiecutter requires them). It is better to remove these files when cookiecutter is done.

v0.5.11

#2 [feature] simple python project

Add a --simple flag to micc create to create a simple (i.e. un-nested) Python module <package_name>.py instead of the nested <package_name>/__init__.py. A simple package should be convertible to a normal package.

v0.5.13

issue #12 common items in micc.json files

While working on issue #2 I realized that there are now several micc.json files with common items which are in fact copies. We need either a single micc.json or a way of isolating the common parts in a single file. Fixed by itself: if there are multiple templates, every new template adds its parameters to the original.

#13 [feature] os.path -> pathlib

More obvious manipulation of file paths.
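
For example (illustrative names), the same path built with os.path and with pathlib:

import os.path
from pathlib import Path

project_path, package_name, module_name = 'foo', 'foo', 'bar'

# os.path style
module_py = os.path.join(project_path, package_name, module_name + '.py')

# pathlib style: composable with '/' and easier to read
module_py = Path(project_path) / package_name / (module_name + '.py')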

#14 [feature] add flag for nesting a project inside another project

mainly for running tests.

#6 [feature] decomposition

Maybe it is useful to limit the number of files in the cookiecutter templates. For now, even the simplest project contains 11 .rst files. For a beginner that may be too much to grasp. Maybe it is useful to start with a README.rst only and have a micc doc [options] command that adds documentation topics one at a time:

> micc doc --authors
> micc doc --changelog|-c # or
> micc doc --history|-h
> micc doc --api|-a
> micc doc --installation|-i

This is perhaps useful, but rather more complicated, e.g. if we first create a package with several modules (python, f2py, cpp) and only then start to add documentation. That is a more complicated situation, one in which errors are easily made, and it is more difficult to maintain.

issue #10 micc files are part of the template

So they better live there.

v0.6.2

#16 [issue] poetry 1.0.0b1 uses different cleo than 0.12.17

This breaks our code for retrieving the current version number.

#15 [issue] using poetry bumping the version in pyproject.toml

Currently we do this by using poetry’s source code (importing it). As poetry recommends a single, system-wide installation of poetry, this adds an extra dependency (i.e. poetry itself) on top of that single installation. There is no way of guaranteeing that both versions are the same. Ideally, we would rely only on the system version of poetry.

#19 [issue] avoiding poetry

Poetry is currently used for bumping versions only. As poetry does not play well with conda (see issue #18), relying on poetry (and therefore adding it as a dependency of micc) is a ticking time bomb: when poetry is there, users will use it, even if it is documented that it does not work. So I propose to use bump(2)version for bumping versions by default. This will always work well with Anaconda Python versions.


#21 [issue] micc create

Raises an exception (cannot create project inside another project) when run from a project directory with a relative path, e.g.:

~/path/to/micc/ > micc -p ../foo create

fails when given the path ../foo, although this would be created in ~/path/to/foo.

#20 [issue] install dependencies in current conda environment

As poetry build works, and pip install dist/<wheel> too, even in a conda Python environment, this is no longer an issue.

#18 [issue] two tools for reading/writing toml files

Currently, we are using both tomlkit and toml for reading and writing toml files. Better stick to one.

#17 [issue] poetry 1.0.0b1 does not play well with conda

Poetry made me a virtual environment for micc, based on miniconda3’s active Python version (which was 3.7.3). However, it did not pick up the correct Python standard library (it used 3.6.whatever instead), obviously a nightmare. Thus, if we want to use poetry, we must use a non-conda Python, or, if we want to use conda Python versions, we must refrain from poetry. Solved: only poetry install does not work well with a conda Python environment; poetry build does fine.

#23 [issue] remove interactivity from micc app and micc module

Behavior is now steered with flags. The default behavior is to abort if pre-existing files would be overwritten.

TODO

  • check for the presence of a global CMake, or add it as a dev dependency in pyproject.toml.

  • allow Fortran binary extensions to be built and installed with CMake, just as for C++ binary extensions. This provides a more uniform interface.

  • pybind11 v2.6.1 depends on a more recent version of CMake (v3.4).

  • update tutorial 7

  • move LINKS.rst to a separate git repo (it changes too often)

  • add a remove command for removing a component of a micc project see https://github.com/etijskens/et-micc/issues/32

  • add a rename command for renaming a component of a micc project or the project itself. see https://github.com/etijskens/et-micc/issues/29

  • Put makefile targets into micc commands and remove micc/makefile? This makes a more uniform interface. Use subprocess or lrcmd for this?

  • Fortran compiler options (while using f2py): is -O3 enough?

  • allow for multiple Fortran/C++ source files?

  • how to add external projects for f2py and C++ modules (include files, libraries)?

    • this was fixed for C++ by using the CMake framework
  • check cppimport

  • undo micc app ...?

  • undo micc module ...?

  • check if project_name already exists on readthedocs or PyPI. If it does, abort and print a message that suggests to use a different name, or to create the project anyway by using --force.

  • remove the dependency on toml in favor of tomlkit, which comes with poetry (now that poetry issue #1182 is fixed).

  • regression tests

  • Reflect on “do we really need poetry?” (see below).

Poetry considerations

  • What are we using poetry for?
  • Do we really need it?
  • Maybe we should wait a bit for poetry to mature, before we start building our micc project around it.
  • Maybe we should decouple micc and poetry?
  • Maybe we should still use setup.py rather than poetry because it is well established?

There is a poetry issue on poetry + Anaconda Python: Any plans to make it work with anaconda python?. Locally, we are completely relying on Anaconda Python. Consequently, I do not feel completely comfortable with poetry, but it is very actively developed.

Anaconda Python used to be very convenient, but maybe the standard python+pip+virtualenv is good enough today? One advantage Anaconda Python still has is that its numpy provides well-aligned numpy arrays, which favors vectorization.

So far we use poetry for:

  • building wheels (which are used for installing and publishing): poetry build, typically inside the Makefile. However, I haven’t figured out how to handle, e.g., f2py modules and C++ modules.

  • poetry.console.commands.VersionCommand for updating version strings

  • we are not using
    • poetry install to create a virtual environment
    • poetry run ... to run code in that virtual environment
  • We could use poetry install to create a virtual environment and point to it in Eclipse/PyDev, so that we always run our code in that environment.

  • tests should probably be run as:

    > poetry run pytest tests/test*
    

Development plan

What do we actually need?

  • a standardized development environment

    • click : for command line interfaces
    • sphinx : for documentation
    • pytest : for running tests
    • flake8 : for assuring PEP 8 compatibility
    • cookiecutter : if we want something based on existing templates
    • tox ?
    • poetry?
  • a standardized way of creating projects for packages and apps.

  • automation of project management tasks, e.g. CI, publishing, …

This package was inspired by Cookiecutter.

Inspiration for the project templates came from:

Interesting posts:

  • Here is a particularly readable and concise text about packaging: Current State of Python Packaging - 2019 (Pycoder’s Weekly #372, June 11, by Stefano Borini). The bottom line is: use poetry. After reading (just part of) the documentation I concluded that poetry solves a lot of project management issues in an elegant way. I am likely to become addicted :).
  • version numbers: adhere to Semantic Versioning

Think big, start small…

Maybe it is a good idea to get everything going locally + github, and add features such as:

  • readthedocs,
  • publishing to pypi,
  • travis,
  • pyup,
  • …,

incrementally.

Indices and tables