Abstracts

Boaz Leskes – Elasticsearch: Deep dive into analytics using Aggregations

Elasticsearch 1.0 features a completely new way of doing analytics called Aggregations. Like their predecessor, facets, Aggregations provide a way to gain insight into patterns within your data. However, where facets offer one insight at a time, Aggregations are composable, which makes them a much more powerful analysis tool. Like all Elasticsearch features, they work in near real time. This talk will introduce you to Aggregations step by step and show, through some use cases, how easy it is to extract useful information from your data.
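
To make "composable" concrete, here is a minimal sketch of a search request body in which one aggregation is nested inside another: a `terms` aggregation buckets documents by a field, and an `avg` aggregation is computed within each bucket. It is not taken from the talk; the field names `category` and `price` and the aggregation names `by_category` and `avg_price` are assumptions made for illustration.

```python
import json

# Hypothetical fields for illustration: "category" and "price".
request_body = {
    "size": 0,  # ask for aggregation results only, no search hits
    "aggs": {
        # Outer aggregation: one bucket per distinct category value.
        "by_category": {
            "terms": {"field": "category"},
            # Inner aggregation: computed separately inside each bucket.
            "aggs": {
                "avg_price": {"avg": {"field": "price"}}
            },
        }
    },
}

# The body would be sent as JSON to the search endpoint of an index.
print(json.dumps(request_body, indent=2))
```

Nesting the `aggs` key under a bucket aggregation is what lets a single request answer a question such as "what is the average price per category".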

view the slides

Brian Brazil – No C-QL (Or how I learned to stop worrying, and love eventual consistency)

Traditional relational databases focus on ACID, providing strong semantics that require careful synchronisation between actors, which limits scalability. NoSQL Column Stores such as Cassandra, Riak and Dynamo offer another way: by eschewing strong consistency, you can meet your application's needs while also increasing scalability and reliability. This talk will cover how and where to use eventual consistency.

view the slides

Eric Redmond – Distributed Search on Riak 2.0

Riak excels at one type of query: key puts and key gets. But the world demands more from a database. Since Basho isn't primarily a search company, we decided to leverage the power of Solr for Riak 2.0. This is a walkthrough of the new features we added, how they work, and why you'd want to use them. Also, of course, live demos.

view the slides

Giovanni Lanzani – SQL vs NoSQL databases for data driven applications

For data to be the fuel of the 21st century, and for data science to live up to its promise as a driver of innovation, their application should not be confined to dashboards and static analyses. Instead they should be the driver of real applications that support the organisations that own or generate the data. Most of these applications are web-based and require real-time access to the data. However, many Big Data analyses and tools are inherently batch-driven and not well suited for secure, real-time and performance-critical connections with applications. Trade-offs often become inevitable, especially when mixing multiple tools and data sources. In this talk we will describe our journey to build a data-driven application at a large Dutch financial institution. We will dive into the issues we faced, our considerations and the technical choices we made in order to perform data analyses but also drive web-based, real-time applications. We considered and used Impala, HBase, and MongoDB, but also conventional SQL databases such as MySQL and PostgreSQL. Important aspects of our journey were, among others, the handling of geographical data, access to hundreds of millions of records, and the real-time analysis of millions of data points.

view the slides

Jacqui Taylor – Keynote: How 24/7 has become 9 to 5

The introduction of Cloud Computing allowed organisations to move towards the ubiquitous computing model which companies like Netflix and Facebook take for granted. Big data technologies are now available to produce the same style of results for data. Organisations that understand that they must be able to create data ecosystems which mix structured, semi-structured and unstructured data are adopting the latest NoSQL technologies. One major gain with these technologies is the ability for an organisation to move from a 24/7 operational model to a 9 to 5 style service. In this presentation, Jacqui Taylor, a leading Web Scientist and CEO of FlyingBinary, will discuss how:
● new approaches for data are required if an organisation is to profit from our connected web world
● new revenue streams are being created by companies who can turn data into insight
● adopting a cloud powered hardened Open Source stack transforms an organisation’s operational capability.

view the slides

Lucian Precup – Back to the Future: SQL 92 for Elasticsearch?

What if we tried to make Elasticsearch SQL 92 compliant (https://www.contrib.andrew.cmu.edu/~shadow/sql/sql1992.txt)? That wouldn't be of much use nowadays, you might say. Well, we actually tried the exercise and reached some very interesting conclusions that we want to share in this talk. While we take Elasticsearch as an example for this "side by side", the issues we are addressing also apply to NoSQL in general.

view the slides

Martin Schönert – Domain Driven Design and NoSQL Databases

Many software development projects fail to deliver the desired functionality. This is because users, product owners and developers fail to communicate the real requirements – some "obvious" requirements are never really talked about and there are misunderstandings about others. Domain Driven Design – introduced by Eric Evans – addresses this problem by introducing a ubiquitous language into the software development process. Unfortunately there is one group left speaking their very own language – the developers using a relational database. Luckily there is a remedy. In this talk I show how the concepts of Domain Driven Design map almost perfectly onto the objects in NoSQL databases – especially the multi-model NoSQL databases.

view the slides

Max Neunhöffer – Joins and aggregations in a distributed NoSQL DB

NoSQL databases are renowned for their good horizontal scalability, and sharding is these days an essential feature for every self-respecting DB. However, most systems choose to offer fewer features with respect to joins and aggregations in queries than traditional relational DBMS do. In this talk I report on the joys and pains of implementing a powerful query language with joins and aggregations in a distributed database with a flexible data model. I will cover (distributed) execution plans, query optimisation and data locality issues.

view the slides

Michael Hausenblas – Memento: how data storage hardware constraints impact database software architectures

When relational databases began their rise in the late 70s and early 80s, many of the design considerations revolved around outsmarting the constraints of hard disk drives (HDD). Using an index to minimise I/O is one such example. Then Hadoop-based systems turned the whole equation upside down, leveraging the throughput of many parallel spindles and avoiding costly seek times as much as possible. With SSD, Flash Storage and in-memory solutions becoming not only more popular but also more mainstream, we're now left with a rather complicated and extensive set of data storage hardware configurations. In this talk we will discuss in-memory solutions, ranging from Hazelcast and Spark's Resilient Distributed Dataset (RDD) to newly proposed HDFS extensions such as Discardable Distributed Memory (DDM), along dimensions such as availability, business continuity and TCO.

view the slides

Niall Milton – Complex Analytics with Apache Cassandra & Friends

Cassandra's log-based storage engine lends itself to storing high-resolution time series data at high velocity. Other technologies such as Spark and Storm allow real-time calculations to be run on very large data sets. This presentation will explore some of the more complex data analysis use cases you may encounter, such as logistic regression, clustering and aggregating time-based data on a rolling basis.

view the slides

Richard Shaw – Blazing Fast SQL querying on Hadoop with Apache Drill

I'd like to wow the audience with a demo of Apache Drill, the low-latency, feature-packed ANSI SQL querying tool for Hadoop, based on Google's Dremel technology.

view the slides

Sebastian Cohnen – Getting Started With Time Series Data

At StormForger we use several NoSQL systems to handle a wide range of data types. Since we are measuring the performance of HTTP-based APIs, we deal especially with large quantities of time series data in the form of performance metrics and sensor data. In this talk I'd like to present how we model and handle our time series data with InfluxDB and how easy it is to get started. Dealing with hundreds of thousands of data points per second, we use, for example, InfluxDB's continuous queries feature to filter and pre-process events on the fly as they occur. We also build our aggregation, analysis and charting system directly on top of InfluxDB.

view the slides

Simon Elliston Ball – When to NoSQL and When to Know SQL

With NoSQL, NewSQL and plain old SQL, there are so many tools around it’s not always clear which is the right one for the job. This is a look at a series of NoSQL technologies, comparing them against traditional SQL technology. I’ll compare real use cases and show how they are solved with both NoSQL options and traditional SQL servers, and then see who wins. We’ll look at some code and architecture examples that fit a variety of NoSQL techniques, and some where SQL is a better answer. We’ll see some big data problems, little data problems, and a bunch of new and old database technologies to find whatever it takes to solve the problem. By the end you’ll hopefully know more NoSQL, maybe even have a few new tricks with SQL, and, what’s more, know how to choose the right tool for the job.

view the slides

Tim Vincent – Cassandra + Spark = Awesome

This talk will discuss how Cassandra and Spark can work together. This is a technical discussion that will introduce attendees to the basic principles of Cassandra and Spark, explain why they work well together, and present example use cases.

view the slides

Uwe Friedrichsen – Extreme availability and self-healing data with CRDTs

There are several challenges in the NoSQL world. Especially if you have very high availability requirements, you have to accept temporary inconsistencies which you need to resolve explicitly. This is usually a tough job which requires implementing case-by-case business logic or even bothering the users to decide on the correct state of your data. Wouldn't it be great if we could solve this conflict resolution and data reconciliation process in a generic way at a purely technical level? That's exactly what CRDTs (Conflict-free Replicated Data Types) are about. CRDTs are data structures that are guaranteed to converge to a desired state while enabling extreme availability of the datastore. In this session you will learn what CRDTs are, how to design them, what you can do with them, what their limitations and tradeoffs are – of course garnished with lots of tips and tricks. Get ready to push the availability of your datastore to the max!
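
As a rough illustration of the convergence property described above, here is a minimal sketch of one well-known CRDT, a grow-only counter (G-Counter). It is not taken from the talk; the class name `GCounter`, its methods and the replica ids are assumptions made for illustration. Each replica increments only its own slot, and merging takes the per-replica maximum, so replicas reach the same value regardless of the order in which they exchange state.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class GCounter:
    """Grow-only counter: one monotonically increasing slot per replica."""

    replica_id: str
    counts: Dict[str, int] = field(default_factory=dict)

    def increment(self, amount: int = 1) -> None:
        # A replica only ever writes to its own slot.
        if amount < 0:
            raise ValueError("a grow-only counter cannot be decremented")
        self.counts[self.replica_id] = self.counts.get(self.replica_id, 0) + amount

    def value(self) -> int:
        # The counter's value is the sum over all replica slots.
        return sum(self.counts.values())

    def merge(self, other: "GCounter") -> None:
        # Per-slot maximum: commutative, associative and idempotent,
        # so merges can arrive in any order, any number of times.
        for rid, n in other.counts.items():
            self.counts[rid] = max(self.counts.get(rid, 0), n)


# Two replicas accept writes independently, then exchange state.
a = GCounter("a")
b = GCounter("b")
a.increment(2)
b.increment(3)
a.merge(b)
b.merge(a)
```

After the exchange both replicas hold the same per-replica counts, with no application-specific conflict resolution involved.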

view the slides