myThesisProject.init();
Hej hej from white Stockholm!
Christmas holidays are over and I’m back in town!

This will be the last semester of my MSc, during which I will be working on my thesis in collaboration with the Swedish Institute of Computer Science (SICS). I am very excited about the project, and this is the first in a series of posts I intend to write describing my progress and discoveries :-)

So what is this super-interesting project I’m going to work on?

Before getting to that, I have to give a short introduction to two systems:
Apache Pig and Stratosphere.

Pig is a platform for analyzing big data sets. It consists of a high-level declarative language, Pig Latin, and an execution engine that “translates” Pig scripts into Map-Reduce jobs.

Stratosphere is a research data-processing framework being developed at TU Berlin. It provides a programming model for writing parallel data analysis applications and an execution engine, Nephele, that can execute dataflow graphs in parallel. You can think of it as an extension/generalization of Hadoop Map-Reduce, and it also shares a lot of ideas with Dryad.

Although right now it is only possible to execute Pig scripts on top of Hadoop, Pig is designed to be modular, so it should be straightforward to deploy it on top of another execution engine. And this is exactly the initial idea of the project. Additionally, Pig in its current state appears to have some limitations that make it about 1.5 times slower than native Map-Reduce. I believe that the Stratosphere architecture has several features that could be exploited to improve performance.

I am currently studying the Pig architecture and the existing Hadoop compiler implementation. (Oh the joy of endless Java code :p )
Soon I will post my first findings here, so stay tuned!

Until then, sweet coding!

V.

