esentri at Polish View on Machine Learning 2018

Filip Stepniak

This post is a short recap for those who missed the conference Polish View on Machine Learning. Since Machine Learning (ML) is part of my daily work at esentri and Warsaw is the home of my alma mater, I was very excited to get a sense of the Polish Machine Learning scene.

Academic Scenery

Polish View on Machine Learning (PLinML) is a conference organized mostly by students at the University of Warsaw’s Faculty of Mathematics, Informatics and Mechanics (MIM), which explains two of this year’s intriguing facts: it was held over a weekend and the talks took place in lecture halls. The academic setting certainly reinforced the scientific flavor of the event, or the other way around if you like. Nevertheless, business people were well represented at the conference too.

Scientific research was the main focus of the event, so I will start with a short summary of the two most interesting research-oriented talks. Obviously, due to spatiotemporal limitations, I wasn’t able to attend all three parallel sessions at once. When choosing sessions in advance, the topic that caught my attention most was unsupervised learning in computer vision. With the massive increase in computational power and in available data without annotations, applying models to unlabeled domains seemed very interesting, so I decided to focus there.

Unsupervised Learning in Computer Vision

The first interesting talk, given by Piotr Bojanowski (AI Researcher @Facebook), presented the results published in a recent paper [1]. The authors trained convolutional neural networks (CNNs) without labels. The algorithm alternates between clustering the image descriptors (using k-means) and updating the weights of the CNN, using the cluster assignments as pseudo-labels. In the end they not only trained the CNN end-to-end but also ran the method at scale. The findings are encouraging for exploring bigger and more diverse datasets without annotations. When the model is trained on large datasets (ImageNet, YFCC100M), it outperforms the previous state of the art on standard benchmarks. As is typical for unsupervised learning, the approach makes few assumptions about the inputs and depends little on domain-specific knowledge.
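The alternation between clustering and weight updates described above can be sketched on toy data. This is a minimal illustration, not the authors' implementation: a single tanh layer stands in for the CNN, and the names `kmeans`, `deep_cluster_toy`, the hidden size and the learning rate are all assumptions made for the example.

```python
import numpy as np

def kmeans(features, k, rng, n_iter=20):
    """Plain Lloyd's k-means; returns one cluster index per row."""
    centroids = features[rng.choice(len(features), size=k, replace=False)].copy()
    for _ in range(n_iter):
        # Squared distance of every point to every centroid: shape (n, k).
        dists = ((features[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        assign = dists.argmin(axis=1)
        for j in range(k):
            members = features[assign == j]
            if len(members) > 0:  # keep the old centroid if a cluster empties
                centroids[j] = members.mean(axis=0)
    return assign

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # subtract the row max for stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def deep_cluster_toy(x, k=3, epochs=5, steps=50, lr=0.1, seed=0, hidden=8):
    """Alternate k-means pseudo-labeling with gradient steps on a toy encoder."""
    rng = np.random.default_rng(seed)
    n, d = x.shape
    w_enc = rng.normal(scale=0.5, size=(d, hidden))  # stand-in for the CNN
    for _ in range(epochs):
        # Step 1: cluster the current features to obtain pseudo-labels.
        feats = np.tanh(x @ w_enc)
        pseudo = kmeans(feats, k, rng)
        onehot = np.eye(k)[pseudo]
        # Step 2: train the encoder and a fresh classifier head on them.
        # The head is re-initialised because cluster indices are arbitrary
        # from one clustering pass to the next.
        w_head = rng.normal(scale=0.1, size=(hidden, k))
        for _ in range(steps):
            feats = np.tanh(x @ w_enc)
            probs = softmax(feats @ w_head)
            grad_logits = (probs - onehot) / n          # cross-entropy gradient
            grad_head = feats.T @ grad_logits
            grad_feats = grad_logits @ w_head.T
            grad_enc = x.T @ (grad_feats * (1.0 - feats ** 2))  # tanh derivative
            w_head -= lr * grad_head
            w_enc -= lr * grad_enc
    # pseudo holds the labels from the final clustering pass.
    return w_enc, pseudo
```

Calling `deep_cluster_toy(x, k=3)` on an `(n, d)` array returns the learned encoder weights and the last set of pseudo-labels. A real setup would replace the tanh layer with a CNN and the full-batch gradient steps with mini-batch training.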

The second talk that attracted my attention was given by Adam Kosiorek (PhD @Oxford University). The idea of the paper [2] is to encode the consistency of objects across time and space as an inductive bias, and thus to make learning more efficient through the structure of the model. Attend, Infer & Repeat (AIR) is an example of such a structured probabilistic model that relies on deep learning [3]. Trained without supervision, AIR is able to decompose a visual scene into components and infer the location and appearance of each object. Despite its power, AIR has some flaws: it struggles with overlapping and partially occluded objects. To address those limitations, the authors introduced a sequential version of AIR. They extended AIR into a spatiotemporal model and trained it on unlabeled sequences of moving objects. Consequently, they were able to track objects over time as well as extrapolate frames into the future. The model was tested on both synthetic and real data (moving MNIST and footage from static CCTV cameras) with positive results. It learned to detect and track moving digits and pedestrians even when they were very close to each other. In the case of the moving MNIST dataset, the model outperformed the previous state-of-the-art approach.

Emerging Business Potential

Beyond research, the conference also covered the emerging business potential in Poland. The presentation of ByteDance by Ming Li (Director @ByteDance Poland) served as a very good example. ByteDance is a Chinese unicorn (a startup company valued at over $1 billion) that applies artificial intelligence (AI) in its social media app, TikTok. TikTok is a pretty fascinating app that employs Machine Learning to modify the content of short videos. With over 500 million global users, the app seems to work very well on the Asian market. ByteDance has just founded an ML research unit in Warsaw and started recruiting, which is probably the main reason why they appeared at PLinML.

The last talk that I was very fond of was ‘Learning to rank at scale’, presented by Ireneusz Gawlik & Tomasz Bartczak (R&D @Allegro). Allegro is a Polish equivalent of eBay with 40 million searches daily. Ireneusz and Tomasz presented in detail how ML is deployed to provide users with relevant search results at real scale. They dove into different learning-to-rank approaches and followed with a presentation of an exhaustive system architecture. My takeaway: building intelligent software requires a balanced combination of complex mathematics and a pragmatic approach.
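The concrete models from the talk are not reproduced in this recap. Purely to illustrate what learning to rank means, here is a minimal sketch of one common family, the pairwise approach: a linear scorer is trained so that more relevant items score higher than less relevant ones. The function names, the linear model and the logistic pairwise loss are assumptions chosen for the example, not Allegro's system.

```python
import numpy as np

def train_pairwise_ranker(features, relevance, epochs=200, lr=0.1):
    """Fit a linear scorer with a pairwise logistic loss."""
    n, d = features.shape
    w = np.zeros(d)
    # All ordered pairs (i, j) where item i is more relevant than item j.
    pairs = [(i, j) for i in range(n) for j in range(n)
             if relevance[i] > relevance[j]]
    if not pairs:
        return w  # nothing to learn from if all items are equally relevant
    idx_hi = np.array([p[0] for p in pairs])
    idx_lo = np.array([p[1] for p in pairs])
    diff = features[idx_hi] - features[idx_lo]
    for _ in range(epochs):
        margin = diff @ w  # score(i) - score(j) for every pair
        # Gradient of mean log(1 + exp(-margin)) with respect to w.
        grad = -(diff * (1.0 / (1.0 + np.exp(margin)))[:, None]).mean(axis=0)
        w -= lr * grad
    return w

def rank(features, w):
    """Return item indices ordered from highest to lowest score."""
    return np.argsort(-(features @ w))
```

Given per-item feature vectors and graded relevance labels for one query, `rank(features, train_pairwise_ranker(features, relevance))` returns the items in predicted order. A production system would train across many queries and typically use a richer model than a single weight vector.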

The Rise of Unsupervised Learning

To sum up, one of the main lessons learned during the conference was the rise of unsupervised learning. This domain of Machine Learning was already being studied in the late 1990s, including in the computer vision context. However, the variety and scale of applications have never been so impressive. The speakers showed during the conference that deep learning isn’t necessarily limited to labeled and perfectly structured datasets. Instead, unsupervised or semi-supervised algorithms can be successfully trained and applied in different domains with sparse annotations.

Two-sided Community

Another finding worth underlining is how well established Machine Learning research is in Poland. It is driven both from the inside, by research concentrated at universities in Warsaw, Poznan and Wroclaw, and from the outside, by young Polish professionals doing research abroad. Conferences like PLinML definitely encourage further growth and cooperation of this two-sided ML community by promoting the adoption of successful mechanisms and solutions from abroad in domestic academia and industry.



[1] Deep Clustering for Unsupervised Learning of Visual Features by Caron et al.

[2] Sequential Attend, Infer, Repeat: Generative Modelling of Moving Objects by Kosiorek et al.

[3] Attend, Infer, Repeat: Fast Scene Understanding with Generative Models by Eslami et al.