Experience of attending YottaByte X — ThoughtWorks data conference

Umair Mohammad
May 11, 2020

TL;DR

I attended YottaByte X (Edition 1, Gurgaon), and it was very insightful and fun. Find the decks here.

Longer version:

Last month I got a chance to attend YottaByte X (Edition 1, Gurgaon), and I thought I'd share my experience.

The event started with a warm welcome keynote by Kamal Kishore. He set the right tone for the day by giving a sneak peek into what would unfold throughout the event. The audience was very engaged and started asking a lot of questions right from the keynote. I would also like to thank Sumedha Verma for keeping the whole event on schedule. Find the deck here.

The first talk of the day was “Leading indicators of success through north star KPIs” by Smitha Ganesh. It was very interesting: she gave a good overview of how we're enabling one of our clients to become data-driven. In two lines: we form hypotheses, prioritise them based on value, cost, etc., and validate them against the data we have. Based on the result, we fine-tune the existing process and start the cycle again. Find the deck here.
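That hypothesis loop is easy to picture in a few lines of code. This is my own toy sketch in Python, not something from the talk: the `Hypothesis` fields, the value/cost ratio used for prioritising and the 5% uplift threshold for validating are all assumptions for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Hypothesis:
    description: str
    value: float  # hypothetical estimate of impact on the north star KPI (e.g. 1-10)
    cost: float   # hypothetical estimate of effort to test it (e.g. 1-10)

    @property
    def priority(self) -> float:
        # Assumption: a simple value-to-cost ratio; the talk did not prescribe a formula.
        return self.value / self.cost


def prioritise(backlog: list[Hypothesis]) -> list[Hypothesis]:
    """Highest value-for-cost hypotheses first."""
    return sorted(backlog, key=lambda h: h.priority, reverse=True)


def validate(kpi_before: float, kpi_after: float, min_uplift: float = 0.05) -> bool:
    """Treat a hypothesis as validated if the KPI rose by at least min_uplift (relative)."""
    return (kpi_after - kpi_before) / kpi_before >= min_uplift


backlog = [
    Hypothesis("Simplify checkout", value=8, cost=4),
    Hypothesis("Add recommendations", value=9, cost=9),
    Hypothesis("Send reminder emails", value=5, cost=1),
]
for h in prioritise(backlog):
    print(f"{h.priority:.1f}  {h.description}")

# After running the top experiment, compare the KPI and feed the result back into the backlog.
print(validate(kpi_before=0.20, kpi_after=0.22))  # a 10% relative uplift
```

Whatever the scoring formula, the point is the cycle: rank, test the top item against real data, adjust, repeat.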

Smitha finished right on time, and after her talk we took a short tea break. I met two college students. One of them was in his second year and interning at a startup that helps students find internship opportunities. They work on React and Node.js. We also discussed a few things about common coding hygiene. It was inspiring to see a second-year student working with these tech stacks, which I hadn't even heard of at his age 😛

After the break we started with “Scaling Patterns for Real-time Streaming App” by Shakti Garg & Lalit Pandey. It was a nice talk about managing the whole data flow, from the data source (such as an IoT device) to a real-time consumer (such as a reporting dashboard). Traditionally, we would have a database that stores and serves data for our application. Something like a cron job takes data from this main database, processes it, and dumps it into another database (ETL: Extract, Transform, Load). This second database serves consumers, one of which can be a reporting dashboard. The problems with this setup are high latency (batch processing instead of real-time processing), limited scalability, etc.

The talk suggested introducing a streaming platform instead. This platform is event-driven: it consumes data from different sources, processes it, and stores it in some kind of data store. It becomes the single source of truth for the application's data needs and also serves data to other consumers in real time, such as a live reporting dashboard. We also discussed performance, scaling, etc. Find the deck here.
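To make the batch vs. streaming contrast concrete for myself, here's a toy sketch in Python. It's not from the talk: `Reading`, `batch_etl` and `StreamingAverager` are hypothetical names, and a real streaming platform would sit on a message broker and a proper stream processor rather than an in-memory loop. It only shows the difference in *when* consumers see fresh numbers.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Reading:
    # Hypothetical IoT event: the device that sent it and the measured value.
    device_id: str
    value: float


def batch_etl(rows: Iterable[Reading]) -> dict[str, float]:
    """Batch ETL: a scheduled job (think cron) extracts everything collected so far,
    transforms it into per-device averages and loads the result in one go.
    Consumers only see new data after the next scheduled run."""
    totals: dict[str, float] = defaultdict(float)
    counts: dict[str, int] = defaultdict(int)
    for r in rows:  # extract
        totals[r.device_id] += r.value  # transform
        counts[r.device_id] += 1
    return {d: totals[d] / counts[d] for d in totals}  # load


class StreamingAverager:
    """Streaming: every event updates a running aggregate as soon as it arrives,
    so a live dashboard can read a fresh view at any moment."""

    def __init__(self) -> None:
        self._totals: dict[str, float] = defaultdict(float)
        self._counts: dict[str, int] = defaultdict(int)

    def on_event(self, r: Reading) -> None:
        self._totals[r.device_id] += r.value
        self._counts[r.device_id] += 1

    def current_view(self) -> dict[str, float]:
        return {d: self._totals[d] / self._counts[d] for d in self._totals}


events = [Reading("sensor-1", 20.0), Reading("sensor-1", 22.0), Reading("sensor-2", 30.0)]
stream = StreamingAverager()
for e in events:
    stream.on_event(e)
    print(stream.current_view())  # the view is already up to date after each event

print(batch_etl(events))  # same numbers, but only once the batch job runs
```

Both paths end up with the same averages; the streaming version just never makes the dashboard wait for the next scheduled run.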

Once we finished talking about building a streaming architecture, we moved on to observability with “Observability at Scale in Real Time” by Balvinder Khurana & Sarang Vinayak Shinde. It was a very good talk about maintaining a healthy data system. We discussed how to observe a large number of data sources whose count keeps growing rapidly every day, and how to make sense of metadata produced by sources with different structures. For example, an nginx log has one structure, a Prometheus event has another, a general application log has yet another, and the list goes on. How do we understand all of this data, and how do we build a scalable solution so that integrating a new data source is easy in the future? The proposed solution was to design a resilient messaging system that can handle unexpected input (such as random data), and then build a generalised reporting dashboard on top of the data this system produces: a kind of self-service tool that people across roles and departments can use. Find the deck here.
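Here's how I picture the "many formats, one view" idea, as a minimal Python sketch under my own assumptions: the `Event` schema, the parser names and the `"msg"` field for application logs are hypothetical, and the regexes are simplified versions of the nginx combined log format and the Prometheus text format, not complete parsers. Each source gets one small parser into a common shape, and anything unrecognised is kept as an `unknown` event instead of breaking the pipeline.

```python
from __future__ import annotations

import json
import re
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Event:
    # Hypothetical common schema that every source gets mapped into.
    source: str
    message: str
    attributes: dict = field(default_factory=dict)


# Simplified pattern for the start of an nginx "combined" access-log line.
NGINX_RE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" (?P<status>\d{3})'
)
# Simplified pattern for one sample in the Prometheus text format: name{labels} value
PROM_RE = re.compile(
    r"^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(?P<labels>[^}]*)\})?\s+(?P<value>\S+)$"
)


def parse_nginx(line: str) -> Optional[Event]:
    m = NGINX_RE.match(line)
    if m is None:
        return None
    attrs = {"ip": m["ip"], "time": m["time"], "status": int(m["status"])}
    return Event("nginx", m["request"], attrs)


def parse_app_json(line: str) -> Optional[Event]:
    # Assumes application logs are JSON objects with a "msg" field (hypothetical convention).
    try:
        obj = json.loads(line)
    except json.JSONDecodeError:
        return None
    if not isinstance(obj, dict) or "msg" not in obj:
        return None
    attrs = {k: v for k, v in obj.items() if k != "msg"}
    return Event("app", str(obj["msg"]), attrs)


def parse_prometheus(line: str) -> Optional[Event]:
    m = PROM_RE.match(line)
    if m is None:
        return None
    try:
        value = float(m["value"])
    except ValueError:
        return None
    return Event("prometheus", m["name"], {"labels": m["labels"] or "", "value": value})


# Integrating a new source means writing one parser and adding it to this list.
PARSERS: list[Callable[[str], Optional[Event]]] = [parse_nginx, parse_app_json, parse_prometheus]


def normalise(line: str) -> Event:
    for parse in PARSERS:
        event = parse(line)
        if event is not None:
            return event
    # Weird input (random data, truncated lines) is kept rather than dropped or crashing.
    return Event("unknown", line)


lines = [
    '127.0.0.1 - - [10/Apr/2020:13:55:36 +0530] "GET /index.html HTTP/1.1" 200 512 "-" "curl/7.68.0"',
    'http_requests_total{method="post",code="200"} 1027',
    '{"level": "error", "msg": "payment failed"}',
    "%%% random bytes %%%",
]
for line in lines:
    print(normalise(line).source)
```

Once everything is an `Event`, a generic dashboard only has to understand one shape, which is what makes the self-service part feasible.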

Then we went for lunch. The food was good, and I got to meet someone who works at HCL (I think) on a project designing data analytics tools. We talked about his long experience of working in a corporate culture and how he grew from working on a support project to leading a data team. It was fun 😊

After lunch we started with a very interesting topic (at least for me): “Achieving human level accuracy through Deep NLP” by Anirudh Gupta. The talk gave a very nice overview of how natural language processing has traditionally been done, what its drawbacks were, and how modern Deep NLP addresses them using Neural Machine Translation. We also discussed pre-processing and optimising data for deep learning, a few data models, and what's coming next in Deep NLP. Find the deck here.
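The pre-processing part is the easiest bit to show in code. This is a minimal sketch with a deliberately simple regex tokenizer and made-up helper names (`tokenize`, `build_vocab`, `encode`); real Deep NLP pipelines usually use subword tokenizers, but the shape of the work is the same: text to tokens, tokens to ids, ids to fixed-length padded sequences a model can batch.

```python
from __future__ import annotations

import re
from collections import Counter

PAD, UNK = "<pad>", "<unk>"


def tokenize(text: str) -> list[str]:
    # Lowercase, then split into word tokens and single punctuation tokens.
    return re.findall(r"\w+|[^\w\s]", text.lower())


def build_vocab(corpus: list[str], min_freq: int = 1) -> dict[str, int]:
    # Reserve ids 0 and 1 for padding and unknown words; frequent tokens get small ids.
    counts = Counter(tok for text in corpus for tok in tokenize(text))
    vocab = {PAD: 0, UNK: 1}
    for tok, freq in counts.most_common():
        if freq >= min_freq:
            vocab[tok] = len(vocab)
    return vocab


def encode(text: str, vocab: dict[str, int], max_len: int) -> list[int]:
    # Map tokens to ids (unseen words -> UNK), truncate, then pad to a fixed length
    # so sentences of different lengths can be stacked into one batch.
    ids = [vocab.get(tok, vocab[UNK]) for tok in tokenize(text)][:max_len]
    return ids + [vocab[PAD]] * (max_len - len(ids))


corpus = ["The cat sat.", "The dog sat!"]
vocab = build_vocab(corpus)
print(encode("The bird sat", vocab, max_len=5))  # "bird" is unseen, so it maps to UNK
```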

The last talk of the day was “Achieving Data Democratization” by Krishna Meduri & Bharat Akkinepalli. It covered how, once we have the data, we handle it ethically: “data democracy”. The slides for this talk were different from the others, with very little text and only the minimal relevant content. A very apt summary: “Trustworthy data need to be made discoverable and accessible to authorized users in several formats in a self-service fashion with an ability to process it with various tools & technologies”. If, like me, you are overwhelmed by the technical jargon, here's the deck.
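To get past the jargon myself, I tried mapping that summary sentence onto a toy sketch. Everything here (`Dataset`, `Catalog`, the role names, the sample data) is hypothetical and far simpler than a real data catalog or access-control system; it only puts three of the ideas side by side: discoverable (keyword search), accessible to authorised users only (role check), and available in several formats (CSV or JSON).

```python
from __future__ import annotations

import csv
import io
import json
from dataclasses import dataclass, field


@dataclass
class Dataset:
    name: str
    description: str
    allowed_roles: set[str]  # who is authorised to read it
    rows: list[dict] = field(default_factory=list)


class Catalog:
    """Hypothetical self-service catalog: discoverable, access-controlled, multi-format."""

    def __init__(self) -> None:
        self._datasets: dict[str, Dataset] = {}

    def register(self, ds: Dataset) -> None:
        self._datasets[ds.name] = ds

    def search(self, role: str, keyword: str) -> list[str]:
        # Discoverable: keyword search, but only over datasets this role may see.
        kw = keyword.lower()
        return [
            ds.name
            for ds in self._datasets.values()
            if role in ds.allowed_roles
            and (kw in ds.name.lower() or kw in ds.description.lower())
        ]

    def export(self, role: str, name: str, fmt: str) -> str:
        # Accessible to authorised users only, in whichever format their tool prefers.
        ds = self._datasets[name]
        if role not in ds.allowed_roles:
            raise PermissionError(f"role {role!r} may not read {name!r}")
        if fmt == "json":
            return json.dumps(ds.rows)
        if fmt == "csv":
            buf = io.StringIO()
            fieldnames = list(ds.rows[0]) if ds.rows else []
            writer = csv.DictWriter(buf, fieldnames=fieldnames)
            writer.writeheader()
            writer.writerows(ds.rows)
            return buf.getvalue()
        raise ValueError(f"unsupported format: {fmt}")


catalog = Catalog()
catalog.register(Dataset("sales_daily", "Daily orders by region", {"analyst", "manager"},
                         [{"region": "north", "orders": 120}, {"region": "south", "orders": 95}]))
catalog.register(Dataset("payroll", "Employee salaries", {"hr"},
                         [{"employee": "e1", "salary": 100}]))

print(catalog.search("analyst", "orders"))    # ['sales_daily']
print(catalog.search("analyst", "salaries"))  # [] because analysts can't see payroll
print(catalog.export("analyst", "sales_daily", "csv"))
```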

With this, we wrapped up the day. I met a few amazing folks, and it was an awesome experience.

Maybe this was planned beforehand, but if we put the pieces together, the order of the talks forms a nice storyline. It goes something like this:

First, we discussed enabling a company to become data-driven without getting into much technical detail. With that basic idea in mind, we discussed the streaming platform: how we ingest data and make it ready for real-time consumption. At that point the basic infrastructure is in place. The next talk was about observability: how do we make sure everything is working fine? Then we discussed a real-world application, natural language processing. After all of these discussions, we have the data and we're using it. The last talk was about maintaining data democracy.

This is based on my personal, limited understanding of the subject. I'd be happy to discuss it in person if you have any thoughts.

Feedback?

Thank you!

Umair Mohammad

Software Engineer. Enthusiastic about tech, data, IoT, software development methodologies and growing together.