Data Side of Life

pytest and fixtures

Alice

Unit tests are an important part of software development. In Python they can be approached from different angles and with different libraries. Here I want to show everything I wish I had known before I started working with the pytest library, fixtures and parametrization. Fixtures. Fixtures are pieces of code that are used to set up a […]
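As a rough illustration of the two ideas named in this excerpt, here is a minimal sketch of a pytest fixture and a parametrized test. The names `make_numbers`, `numbers`, `test_sum_of_numbers` and `test_square` are made up for this example and do not come from the post itself.

```python
import pytest


def make_numbers():
    """Plain helper (hypothetical) holding the set-up logic."""
    return [1, 2, 3]


@pytest.fixture
def numbers():
    # pytest calls this function and passes its return value to every
    # test that lists `numbers` as an argument.
    return make_numbers()


def test_sum_of_numbers(numbers):
    # `numbers` is injected by pytest from the fixture above.
    assert sum(numbers) == 6


@pytest.mark.parametrize(
    "value, expected",
    [(2, 4), (3, 9), (-1, 1)],
)
def test_square(value, expected):
    # pytest runs this test once per (value, expected) pair.
    assert value ** 2 == expected
```

Running `pytest` on a file with this content would collect four test cases: one for the fixture-based test and three for the parametrized one.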

py-test abstract classes

In certain scenarios there is a need to create abstract classes. They can be useful as base classes that provide a layout for the different implementations of inheriting concrete versions. A class is called abstract when it contains at least one abstract method. Such a method lacks an implementation and provides only a declaration. Due to that, abstract classes […]
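A small sketch of that definition using the standard `abc` module. The classes `Shape` and `Square` are hypothetical and only serve to illustrate the idea.

```python
from abc import ABC, abstractmethod


class Shape(ABC):
    """Abstract base class: it declares `area` but does not implement it."""

    @abstractmethod
    def area(self) -> float:
        """Declared here, implemented by concrete subclasses."""

    def describe(self) -> str:
        # Regular methods may rely on the abstract ones.
        return f"{type(self).__name__} with area {self.area():.1f}"


class Square(Shape):
    """Concrete version: it provides the missing implementation."""

    def __init__(self, side: float) -> None:
        self.side = side

    def area(self) -> float:
        return self.side ** 2
```

With these definitions `Square(2).describe()` works as usual, while calling `Shape()` directly raises a `TypeError`, because `area` has no implementation on the base class.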

Alice
Tags: poetry, Python, torch

The Poetry library for Python makes it easy to track package dependencies within a project. It creates a snapshot of the environment in the form of a lockfile, containing package names with their installed versions, to ensure repeatable installations. That is super helpful for maintaining the same setup on development and production environments or for […]

Alice
Tags: CI, devops, Docker, gitlab

Recently the GitLab platform has become more and more popular. Not only does it provide git version control, but it also has lots of useful DevOps-related functionality built in. One of those features is the option to build CI/CD pipelines, which is extremely useful for many projects. It allows you to automate tests and deployments. There are plenty […]

Alice
Tags: Docker, keras, lstm, R, RStudio, tensorflow

Having a stable environment is super useful for development. Creating one within a Docker container is a good idea in many cases. Here I show how to create a container for development with R (and an RStudio instance), with the keras and tensorflow packages installed. This can be particularly useful if you wish to build a forecasting […]

Targets R package for managing workflows

Workflows help us keep a clear structure of the flow we are building, and allow for easier traceability of steps and simplified maintenance. They are especially useful in data science work, where heavy computations take time to run. In the R world, the first popular package for dealing with pipelines was drake. It allows not only […]

Memory leakage while plotting in a loop

Issue: memory leak while generating Python matplotlib plots in a loop on macOS. I was using Python 3.9 on macOS Catalina and trying to generate lots of plots for my analysis. The idea was to create them in a loop: render the plot, save the output, iterate further. A simple example of the case:

import matplotlib.pyplot as plt

for i in range(10000):
    fig = plt.figure(figsize=(25, 25))
    plt.plot([1,2,3])
    plt.savefig(f'temp.png')
    plt.close(fig)

[…]

Spark & R – SparkR vs sparklyr

R enthusiasts can benefit from Spark using one of two available libraries: SparkR or sparklyr. They differ in usage structure and slightly in available functionality. SparkR is an official Spark library, while sparklyr is created by the RStudio community. Due to the fact that Python is currently the favourite language for Data Scientists using […]

Alice
Tags: Spark

Apache Spark 3.1.1 was released recently. Let’s take a look at some of the new features provided in Spark version 3. Highlights. Adaptive query execution: this allows Spark to change the execution plan at runtime, as run statistics are being updated. In other words after some processing steps are already done and stats […]

Alice
Tags: Mllib, Spark

When working with the Spark MLlib library you may notice that different features are available in the Python and R APIs. In Python, in addition to models, we can benefit from Transformers, which represent feature transformations that can be done before the modelling. Transformers are also available in sparklyr, but are clearly missing in SparkR. Also […]

…

Data stories and processing

pytest and fixtures

py-test abstract classes

pytorch libs poetry installation on different operating systems

gitlab CI pipeline

Docker with R and keras

Targets R package for managing workflows

Memory leakage while plotting in a loop

Spark & R – SparkR vs sparklyr

Spark 3 highlights

SparkR MLlib