/* My Journey to the Cloud */
http://vasia.posterous.com
Random thoughts, notes I keep, things I've heard, people I've met.

Basic Pig Operators
Tue, 08 May 2012 05:10:00 -0700
http://vasia.posterous.com/basic-pig-operators

In this post I will present some of the most common and useful basic Pig operators. I will explain how they operate on data and what results they produce, and also how they are internally translated into Map-Reduce jobs and executed on the Hadoop execution engine.

First, a reminder of how the compilation to Map-Reduce works. The compiler that transforms the Physical Plan into a DAG of Map-Reduce operators uses a predecessor depth-first traversal to generate the graph. When compiling an operator, the goal is to try to merge it into the existing Map-Reduce operator, i.e. into the current Map or Reduce phase. However, some operators, such as GROUP, require the data to be shuffled or sorted, so they cause the creation of a new Map-Reduce operator. The new operator is connected to the previous one with a store-load combination.
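To make the merging rule concrete, here is a toy Python sketch. It is not Pig's compiler: it only handles a linear chain of operators, and the `BLOCKING` set and the `compile_chain` name are my own simplifying assumptions.

```python
# Toy model: assign a linear chain of operators to Map-Reduce jobs.
# BLOCKING is an assumed, simplified set; the real compiler handles more cases.
BLOCKING = {"GROUP", "COGROUP", "JOIN", "ORDER", "CROSS"}

def compile_chain(operators):
    """Return a list of jobs, each a dict with 'map' and 'reduce' operator lists."""
    jobs = [{"map": [], "reduce": []}]
    phase = "map"
    for op in operators:
        if op in BLOCKING:
            if phase == "reduce":
                # Already past a shuffle: start a new job (store-load boundary).
                jobs.append({"map": [], "reduce": []})
            # The blocking operator forces a shuffle; its output lands in reduce.
            jobs[-1]["reduce"].append(op)
            phase = "reduce"
        else:
            # Non-blocking operators are merged into the current phase.
            jobs[-1][phase].append(op)
    return jobs

plan = compile_chain(["LOAD", "FILTER", "GROUP", "FOREACH", "GROUP", "FOREACH", "STORE"])
for number, job in enumerate(plan, start=1):
    print(number, job)
```

The second GROUP arrives while we are already in a Reduce phase, so it opens a second job with an empty Map phase.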

  • FOREACH

FOREACH takes a record as input and generates a new one by applying a set of expressions to it. It is essentially a projection operator: it selects fields from a record, applies some transformations to them and outputs a new record. FOREACH is a non-blocking operator, meaning it can be included inside the current Map-Reduce operator.

[Figure: Foreach]
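As a rough illustration of what FOREACH does to each record, here is a minimal Python sketch. The `foreach` helper and the `students` data are hypothetical; records are modelled as plain tuples.

```python
def foreach(records, *expressions):
    """Lazily apply every expression to each record and emit a new tuple."""
    for record in records:
        yield tuple(expr(record) for expr in expressions)

# Hypothetical input with fields (name, age, gpa).
students = [("alice", 20, 3.5), ("bob", 22, 2.9)]

# Roughly what "FOREACH students GENERATE name, age + 1;" would produce.
result = list(foreach(students, lambda r: r[0], lambda r: r[1] + 1))
print(result)
```

Because each output record depends on a single input record, the operator can stream, which is why it never forces a new phase.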

  • FILTER

FILTER selects those records from a dataset for which a predicate is true. Predicates can contain equality expressions, regular expressions, boolean operators and user-defined functions. FILTER is also non-blocking and can be merged into the current Map or Reduce plan.

[Figure: Filter]
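A similar sketch for FILTER, again with hypothetical names and data. The predicate combines a regular expression with a comparison, which are two of the predicate kinds mentioned above.

```python
import re

def filter_records(records, predicate):
    """Lazily keep only the records for which the predicate holds."""
    for record in records:
        if predicate(record):
            yield record

# Hypothetical input with fields (name, age, gpa).
students = [("alice", 20, 3.5), ("bob", 22, 2.9), ("anna", 19, 3.9)]

# Roughly what "FILTER students BY name MATCHES 'a.*' AND gpa > 3.0;" would keep.
kept = list(filter_records(
    students,
    lambda r: re.fullmatch(r"a.*", r[0]) is not None and r[2] > 3.0,
))
print(kept)
```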

  • GROUP BY

GROUP collects all records with the same key inside a bag. A bag is a Pig data structure which can be described as an unordered collection of tuples (duplicates are allowed). GROUP generates records with two fields: the corresponding key, which is assigned the alias "group", and a bag with the collected records for this key.

[Figure: Group by]

We can group on multiple keys and we can also GROUP "all". GROUP all will use the literal "all" as a key and will generate a single record with all the data in it. This can be useful if we would like to use some kind of aggregation function on all our records, e.g. COUNT.

GROUP is a blocking operator and it compiles down to three new operators in the Physical Plan: Local Rearrange, Global Rearrange and Package. It requires repartitioning and shuffling, which will force a Reduce phase to be created in the Map-Reduce plan. If we are currently inside a Map phase, then this is no big problem. However, if we are currently inside a Reduce phase, a GROUP will cause the pipeline to go through Map-Shuffle-Reduce.
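The output shape of GROUP and GROUP all can be sketched in a few lines of Python. This only models the result, not the Local Rearrange / Global Rearrange / Package pipeline; `group_by`, `group_all` and the `visits` data are hypothetical.

```python
from collections import defaultdict

def group_by(records, key):
    """Return (group, bag) pairs; every bag holds the records sharing one key."""
    bags = defaultdict(list)
    for record in records:
        bags[key(record)].append(record)
    return list(bags.items())

def group_all(records):
    """GROUP all: a single output record keyed by the literal 'all'."""
    return [("all", list(records))]

visits = [("alice", "a.com"), ("bob", "b.com"), ("alice", "c.com")]
grouped = dict(group_by(visits, key=lambda r: r[0]))
total = [(group, len(bag)) for group, bag in group_all(visits)]  # COUNT over everything
print(grouped)
print(total)
```

Unlike FOREACH and FILTER, no output record can be emitted until all input records with that key have been seen, which is what makes the operator blocking.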

  • ORDER BY

The ORDER BY operator orders records by one or more keys, in ascending or descending order. However, what is happening behind the scenes is much more interesting than you may imagine. ORDER is not implemented as a simple Sort-Shuffle-Reduce. Instead, it forces the creation of two Map-Reduce jobs. The reason is that datasets often suffer from skew, meaning that most of the values are concentrated around a few keys, while other keys have far fewer corresponding values. With a naive partitioning, only a few of the reducers would be assigned most of the workload, slowing down the overall execution. The first Map-Reduce job that Pig creates performs a fast random sampling of the keys in the dataset. The resulting estimate of the key distribution is then used to balance the load among the reducers of the second job. However, just like in the case of Skew Join, this technique breaks the Map-Reduce convention that all records with the same key will be processed by the same reducer.
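Here is a small Python sketch of the two-step idea: sample the keys to pick range boundaries, then range-partition and sort locally. It is a simplification under my own assumptions (quantile boundaries, round-robin spreading of a hot key, non-empty input) and not Pig's actual partitioner.

```python
import bisect
import random
from itertools import count

def sample_split_points(keys, reducers, sample_size, rng):
    """Job 1 (sketch): sample the keys and pick reducers-1 quantile boundaries."""
    sample = sorted(rng.sample(keys, min(sample_size, len(keys))))
    return [sample[(i * len(sample)) // reducers] for i in range(1, reducers)]

def partition(key, splits, ticket):
    """Pick a reducer for key; a hot key may own a whole range of reducers."""
    lo = bisect.bisect_left(splits, key)    # first reducer that may hold key
    hi = bisect.bisect_right(splits, key)   # last reducer that may hold key
    # A hot key equals several boundaries, so spread it round-robin over lo..hi.
    return lo + ticket % (hi - lo + 1)

def order_by(keys, reducers=4, sample_size=100, seed=0):
    """Job 2 (sketch): range-partition, sort each bucket, concatenate."""
    rng = random.Random(seed)
    splits = sample_split_points(keys, reducers, sample_size, rng)
    buckets = [[] for _ in range(reducers)]
    tickets = count()
    for key in keys:
        buckets[partition(key, splits, next(tickets))].append(key)
    ordered = [k for bucket in buckets for k in sorted(bucket)]
    return ordered, buckets

# Skewed, hypothetical data: key 5 dominates.
ordered, buckets = order_by([5] * 900 + list(range(100)))
print([len(b) for b in buckets])
```

Concatenating the reducer outputs in reducer order still yields a total order, even though records with the hot key are handled by several reducers.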

  • JOIN

JOIN has been extensively discussed in this post.

  • COGROUP

COGROUP is a generalization of the GROUP operator, as it can group more than one input based on a key. Of course, it is a blocking operator and is compiled in a way similar to that of GROUP.
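A sketch of the COGROUP output, assuming the same tuple-based model as before: for each key we get one bag per input, and a bag may be empty. The `cogroup` helper and the sample data are hypothetical.

```python
from collections import defaultdict

def cogroup(*inputs):
    """Each input is (records, key_fn). Returns {key: [bag_0, bag_1, ...]}."""
    groups = defaultdict(lambda: [[] for _ in inputs])
    for index, (records, key_fn) in enumerate(inputs):
        for record in records:
            groups[key_fn(record)][index].append(record)
    return dict(groups)

ages = [("alice", 20), ("bob", 22)]
sites = [("alice", "a.com"), ("carol", "c.com")]
result = cogroup((ages, lambda r: r[0]), (sites, lambda r: r[0]))
print(result)
```

With a single input this degenerates to GROUP, which is the sense in which COGROUP generalizes it.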

  • UNION

UNION is an operator that concatenates two or more inputs without joining them. It does not require a separate Reduce phase to be created. An interesting point about UNION in Pig is that it does not require the input records to share the same schema. If they do, then the output will also have this schema. If the schemas are different, then the output will have no schema and different records will have different fields. Also, it does not eliminate duplicates.
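The schema behaviour can be sketched like this. I model a schema as a tuple of field names (or `None`), which is a deliberate simplification of how Pig compares schemas; `union` and the sample relations are hypothetical.

```python
def union(*relations):
    """Each relation is (schema, records); schema is a tuple of names or None."""
    schemas = {schema for schema, _ in relations}
    # The schema survives only if all inputs agree on it.
    out_schema = schemas.pop() if len(schemas) == 1 else None
    records = [r for _, recs in relations for r in recs]  # duplicates are kept
    return out_schema, records

a = (("name", "age"), [("alice", 20)])
b = (("name", "age"), [("alice", 20), ("bob", 22)])
c = (("name",), [("carol",)])

same = union(a, b)
mixed = union(a, c)
print(same)
print(mixed)
```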

  • CROSS

CROSS will receive two or more inputs and will output the cartesian product of their records. This means that it will match each record from one input with every record of all other inputs. If we have an input of n records and an input of m records, CROSS will generate an output with n*m records. The output of CROSS usually results in very large datasets and it should be used with care. CROSS is implemented in quite a complicated way. A CROSS logical operator is in reality equivalent to four operators:

[Figure: Cross]

The GFCross function is an internal Pig function and its behaviour depends on the number of inputs, as well as the number of reducers available (specified by the "parallel 10" in the script). It generates artificial keys and tags the records of each input in a way that only one match of the keys is guaranteed and all records of one input will match all records of the other. If you are interested in more details, you can read the corresponding part of this book.
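The artificial-key trick can be illustrated for two inputs with a small Python sketch. This follows my understanding of the idea (a square grid of keys sized from the parallelism), not the actual GFCross code; `cross`, `side` and the sample inputs are hypothetical.

```python
import math
import random
from collections import defaultdict
from itertools import product

def cross(left, right, parallel=10, seed=0):
    """Cartesian product of two inputs via artificial (row, col) keys."""
    rng = random.Random(seed)
    side = math.ceil(math.sqrt(parallel))   # side x side grid of artificial keys
    cells = defaultdict(lambda: ([], []))   # key -> (left bag, right bag)
    for record in left:
        row = rng.randrange(side)           # one random row, copied to every column
        for col in range(side):
            cells[(row, col)][0].append(record)
    for record in right:
        col = rng.randrange(side)           # one random column, copied to every row
        for row in range(side):
            cells[(row, col)][1].append(record)
    # Each cell plays the role of one reduce group: cross its two bags.
    return [l + r for bag_l, bag_r in cells.values() for l, r in product(bag_l, bag_r)]

pairs = cross([(1,), (2,), (3,)], [("a",), ("b",)])
print(sorted(pairs))
```

A left record living in row i and a right record living in column j meet in exactly one cell, (i, j), so every pair is produced once and only once while the work is spread over the grid.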

My conclusion from the above analysis was that even the Physical Plan is very dependent on the Map-Reduce framework and is not the right level for my work to be done. (CO)GROUP is compiled down to three new operators and CROSS is compiled down to four, while they can be mapped directly to the CoGroup and Cross Input Contracts of Stratosphere. That led me to move up one level and start working on compiling the Logical Plan into a PACT plan.

It turned out that things are much simpler up there, but a lot more coding needs to be done. Just to illustrate the simplicity, I will use the script and plan generation from the Pig paper:

[Figure: Script from the Pig paper]

And this is how the Logical Plan is transformed into a Physical and then a Map-Reduce Plan:

[Figure: Logical, Physical and Map-Reduce Plans from the Pig paper]

Now, this is how the Logical Plan could be compiled to a PACT Plan:

[Figure: PACT Plan]

Much simpler and much cleaner! I'm quite optimistic =)

And now I have to sit down and code this thing!

Until next time, happy coding!

V.

http://files.posterous.com/user_profile_pics/1090350/_sagrada_familia.jpeg http://posterous.com/users/hdKYUkxjXnbgC vasia vasia
Pig's Logical Plan Optimizer
Mon, 07 May 2012 05:28:00 -0700
http://vasia.posterous.com/pigs-logical-plan-optimizer

Hello from *sunny* Stockholm!

It's been almost a month since my last thesis post and as I was hoping, Spring is finally here :D

It's been a crazy, busy and productive month though, so I will be updating you on my progress by writing two posts today!

This one is about Pig's Logical Plan Optimizer. In my previous posts (here and here) I have explained how Pig creates a data-flow graph from the Pig Latin script, the Logical Plan, and then transforms this graph into a set of Map-Reduce jobs. The Logical Plan goes through the first compiler and is transformed into a Physical Plan, and the Physical Plan is then sent to the Map-Reduce compiler, which transforms it into a DAG of Map-Reduce jobs:

[Figure: Logical Plan to Physical Plan to Map-Reduce Plan]

An intermediate and quite interesting stage, which is not visible in the above diagram, is the optimization of the Logical Plan. The initial Logical Plan is created by a one-to-one mapping of the Pig Latin statements to Logical Operators. The structure of this plan is of course totally dependent on the scripting skills of the user and can result in highly inefficient execution.

Pig performs a set of transformations on this plan before it compiles it to a Physical one. Most of them are trivial and have been long used in database systems and other high-level languages. However, I think they're still interesting to discuss in the "Pig context".

 

Rules, RuleSets, Patterns and Transformers

The base optimizer class is designed to accept a list of RuleSets, i.e. sets of rules. Each RuleSet contains rules that can be applied together without conflicting with each other. Pig applies each rule in a set repeatedly, until no rule is applicable any longer or it has reached a maximum number of iterations. It then moves to the next set and never returns to a previous set.

Each rule has a pattern and an associated transformer. A pattern is essentially a sub-plan with specific node types. The optimizer will try to find this pattern inside the Logical Plan and if it exists, we have a match. When a match is found, the optimizer will then have to look more in depth into the matched pattern and decide whether the rule fulfils some additional requirements. If it does, then the rule is applied and the transformer is responsible for making the corresponding changes to the plan.
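The control flow described above can be sketched in Python as follows. This is not Pig's optimizer class: the plan is modelled as a flat list of operator names, a rule is a hypothetical `(find_matches, check, transform)` triple, and the iteration cap of 500 is just an assumed value.

```python
def optimize(plan, rule_sets, max_iterations=500):
    """Apply each rule set until nothing changes (or the cap is hit), in order."""
    for rules in rule_sets:                      # earlier sets are never revisited
        for _ in range(max_iterations):
            changed = False
            for find_matches, check, transform in rules:
                for match in find_matches(plan):
                    if check(plan, match):       # the "additional requirements"
                        plan = transform(plan, match)
                        changed = True
                        break                    # remaining matches are stale now
            if not changed:
                break
    return plan

# Hypothetical rule: drop any operator called "NOOP".
drop_noop = (
    lambda plan: [i for i, op in enumerate(plan) if op == "NOOP"],  # pattern
    lambda plan, i: True,                                           # extra check
    lambda plan, i: plan[:i] + plan[i + 1:],                        # transformer
)

optimized = optimize(["LOAD", "NOOP", "FILTER", "NOOP", "STORE"], [[drop_noop]])
print(optimized)
```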

Some extra caution is needed in two places. The current pattern matching logic assumes that all the leaves in the pattern are siblings. You can read more on this issue here. This assumption creates no problems with the existing rules. However, when new rules are designed, it should be kept in mind that the pattern matching logic might need to be changed.

Another point that needs highlighting has to do with the actual Java implementation. When searching for a matching pattern, the match() method will return a list of all matched sub-plans. Each one of them is a subset of the original plan and the operators returned are the same objects as in the original plan.

 

Some Examples

  • ColumnMapKeyPrune

This rule prunes columns and map keys that are not needed. More specifically, it removes a column if it is mentioned in the script but never used, and a map key if it is never mentioned in the script.

  • FilterAboveForeach

Guess what? It pushes Filter operators above Foreach operators! However, it first checks whether the field that the Filter works on is present in the predecessor of the Foreach:

[Figure: FilterAboveForeach]
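A toy version of this rule in Python, under heavy assumptions: the plan is a flat list, a Foreach is described by a mapping from each output field to the input field it passes through (or `None` if the field is computed), and a Filter only lists the fields it reads. None of this mirrors Pig's classes.

```python
def filter_above_foreach(plan):
    """Single pass: swap (foreach, filter) when the filter reads only pass-through fields."""
    plan = list(plan)
    i = 0
    while i + 1 < len(plan):
        first, second = plan[i], plan[i + 1]
        if first[0] == "foreach" and second[0] == "filter":
            mapping = first[1]                        # output field -> source field or None
            sources = [mapping.get(field) for field in second[1]]
            if all(s is not None for s in sources):   # every field exists upstream
                # Rewrite the filter in terms of the upstream field names.
                plan[i], plan[i + 1] = ("filter", sources), first
        i += 1
    return plan

project = ("foreach", {"n": "name", "score": None})   # "score" is a computed field
pushed = filter_above_foreach([("load",), project, ("filter", ["n"])])
stuck = filter_above_foreach([("load",), project, ("filter", ["score"])])
print(pushed)
print(stuck)
```

The filter on `n` moves up because `n` is just `name` renamed; the filter on the computed `score` has to stay where it is.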

  • MergeFilter

As you can imagine, it merges two consecutive Filter operators, adding the condition of the second Filter to the condition of the first Filter with an AND operator:

[Figure: MergeFilter]
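In the same toy model (a flat list of operators, with each Filter carrying a Python predicate), merging consecutive filters looks roughly like this. The `merge_filters` name and the sample plan are hypothetical.

```python
def merge_filters(plan):
    """Fold runs of consecutive filters into one filter with an AND-ed condition."""
    merged = []
    for op in plan:
        if op[0] == "filter" and merged and merged[-1][0] == "filter":
            first = merged.pop()[1]
            second = op[1]
            # Bind both predicates now; evaluate the first condition first.
            merged.append(("filter", lambda r, a=first, b=second: a(r) and b(r)))
        else:
            merged.append(op)
    return merged

plan = [("load",), ("filter", lambda x: x > 2), ("filter", lambda x: x % 2 == 0), ("store",)]
merged_plan = merge_filters(plan)
predicate = merged_plan[1][1]
print(len(merged_plan), [x for x in range(8) if predicate(x)])
```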

  • MergeForeach

This rule merges Foreach operators, but it's not as simple as it sounds. There are a few additional requirements that need to be met. For example, if the first Foreach operator has a Flatten in its internal plan, the rule cannot be applied. The optimizer also checks how many times the outputs of the first Foreach are used by the second Foreach. The assumption is that if an output is referred to more than once, the overhead of calculating the expression multiple times might cancel out the benefits of applying this rule:

[Figure: MergeForeach]

There are several more optimization rules, but I hope the idea is clear from the examples I already mentioned. All the optimizations performed at this level are general-purpose transformations, decoupled from the execution engine and the Map-Reduce model. However, this is no longer true after the transformation to a Physical Plan. And this is why I now understand that the integration alternatives I had in mind in late February are not worth implementing.

 

The reason will become clear with my next post very very soon.

Until then, happy coding :)

V.

Join Types in Pig
Tue, 10 Apr 2012 08:03:00 -0700
http://vasia.posterous.com/join-types-in-pig

This blog post is on joins! That trivial but extremely useful relational operation I know you're all familiar with!

Inner join, equi-join, natural join, theta-join, outer join, left-outer join, right-outer join, full-outer join, self join, semi-join...

I bet you remember the definitions and can tell the differences as easily as you remember the multiplication tables... Right! Once upon a time, I also could... right before my undergrad databases exam... hmmm...

Honestly, I've always found it hard to remember the specific details for all different types of joins available and I always need to refresh the concepts whenever I need to use a specific type. (Oh, how much I love wikipedia, hell yeah I do! :p)

 

Joins in Map-Reduce

No matter how common and trivial, join operations have always been a headache for Map-Reduce users. A simple Google search on "map-reduce join operation" will give you several blog posts, presentations and papers as a result. The problem originates from Map-Reduce's static Map-Shuffle-Sort-Reduce pipeline and its single-input second-order functions. The challenge is finding the most effective way to "fit" the join operation into this programming model.

There are two common strategies, and each consists of a single Map-Reduce job:

  • Reducer-side join: In this strategy, the map phase serves as the preparation phase. The mapper reads records from both inputs and tags each record with a label based on the origin of the record. It then emits records setting as key the join key. Each reducer then receives all records that share the same key, checks the origin of each record and generates the cross product. Slides 21-22 from this ETH presentation provide a very clear example.

 

  • Mapper-side join: The alternative comes from the introduction of Hadoop's distributed cache. This facility can be used to broadcast one of the inputs to all mappers and perform the join in the map phase. However, it is quite obvious that this technique only makes sense in the case where one of the inputs is small enough to fit in the distributed cache!
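To make the reducer-side strategy concrete, here is a single-process Python simulation of the tag, shuffle and cross-product steps. It is only a sketch: the shuffle is a dictionary, and `map_phase`, `reduce_phase` and the sample tables are hypothetical.

```python
from collections import defaultdict
from itertools import chain, product

def map_phase(records, tag, key_index):
    """Emit (join key, (origin tag, record)) for every input record."""
    for record in records:
        yield record[key_index], (tag, record)

def reduce_phase(tagged_records):
    """Split one key's records by origin and emit their cross product."""
    left = [r for tag, r in tagged_records if tag == "L"]
    right = [r for tag, r in tagged_records if tag == "R"]
    for l, r in product(left, right):
        yield l + r

def reduce_side_join(left, right, left_key=0, right_key=0):
    shuffled = defaultdict(list)   # stands in for the shuffle/sort step
    for key, value in chain(map_phase(left, "L", left_key),
                            map_phase(right, "R", right_key)):
        shuffled[key].append(value)
    return [row for key in sorted(shuffled) for row in reduce_phase(shuffled[key])]

users = [(1, "alice"), (2, "bob")]
orders = [(1, "book"), (1, "pen"), (3, "cup")]
joined = reduce_side_join(users, orders)
print(joined)
```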

 

Joins in Pig

Fortunately, Pig users do not need to program the join operations themselves as Pig Latin offers the JOIN statement. Also, since Pig is a high-level abstraction that aims to hide low-level implementation details, they do not need to care about the join strategy... Or do they?

Pig users can use the JOIN operator together with the USING keyword in order to select the join execution strategy. Pig offers the following Advanced Join Techniques:

  • Fragment-Replicate Join: USING 'replicated'

It is advised to use this technique when a small table that fits in memory needs to be joined with a significantly larger table. The small table will be loaded into the memory of each machine using the distributed cache, while the large table will be fragmented and distributed to the mappers. No reduce phase is required, as the join can be completely implemented in the map phase. This type of join only supports inner and left-outer joins, as the right table is always the one that will be replicated. Pig implements this join by creating two map-only jobs. During the first one, the distributed cache is set up and the small input is broadcast to all machines. The second one is used to actually perform the join operation.

The user must pay attention and have in mind that the second table in their statement will be the one loaded into memory, i.e. in the statement:

joined = JOIN A BY $0, B BY $0 USING 'replicated';

B is the input that will be loaded into memory. Extra care needs to be taken for one more reason when using this type of join. Pig will not check beforehand if the specified input will fit into memory, thus resulting in a runtime error in case it doesn't!
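What each mapper does in a replicated join can be sketched as an in-memory hash join in Python. The function and parameter names are hypothetical, and the sketch ignores the distributed cache entirely.

```python
from collections import defaultdict

def replicated_join(large, small, large_key=0, small_key=0, left_outer=False):
    """Stream the large input against an in-memory hash table of the small one."""
    table = defaultdict(list)          # built once per mapper from the replicated input
    for record in small:
        table[record[small_key]].append(record)
    width = len(small[0]) if small else 0
    for record in large:               # in Pig, each mapper sees only a fragment of this
        matches = table.get(record[large_key], [])
        if matches:
            for match in matches:
                yield record + match
        elif left_outer:
            yield record + (None,) * width   # pad unmatched large-side records

A = [(1, "a"), (2, "b")]
B = [(1, "x")]
inner = list(replicated_join(A, B))
outer = list(replicated_join(A, B, left_outer=True))
print(inner)
print(outer)
```

The sketch also hints at why only inner and left-outer joins are possible: a mapper sees just a fragment of the large input, so it cannot know whether a record of the replicated input stays unmatched globally.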

  • Merge Join: USING 'merge'

You should use this type of join when the inputs are already sorted by key. This is a variation of the well-known sort-merge algorithm, where the sort is already performed :)

In order to execute this join, Pig will first run an initial Map-Reduce job that will sample the second input and build an index of the values of the join keys for each HDFS block. The second job will take the first input and utilize the index to find the key it is looking for in the correct block. For each key, all records with this particular key will be saved in memory and used to do the join. In other words, two pointers need to be maintained, one for each input. Since both inputs are sorted, only one lookup in the index is required.
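The two-pointer part of the merge join can be sketched in Python like this. The index over HDFS blocks is not modelled at all; both inputs are simply assumed to be lists already sorted by key, and `merge_join` is a hypothetical name.

```python
def merge_join(left, right, key=lambda r: r[0]):
    """Inner join of two inputs that are already sorted by key."""
    i = j = 0
    out = []
    while i < len(left) and j < len(right):
        kl, kr = key(left[i]), key(right[j])
        if kl < kr:
            i += 1
        elif kl > kr:
            j += 1
        else:
            # Buffer all right-side records that carry the current key.
            j_end = j
            while j_end < len(right) and key(right[j_end]) == kl:
                j_end += 1
            # Join every left-side record with that key against the buffer.
            while i < len(left) and key(left[i]) == kl:
                out.extend(left[i] + r for r in right[j:j_end])
                i += 1
            j = j_end
    return out

left = [(1, "a"), (2, "b"), (2, "c"), (4, "d")]
right = [(2, "x"), (2, "y"), (3, "z"), (4, "w")]
merged = merge_join(left, right)
print(merged)
```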

  • Skew Join: USING 'skewed'

The third and last type of join provided by Pig is the skew join. It is quite common that some keys are a lot more popular than others in datasets, that is, most of the values correspond to a very small set of keys. Using the default algorithm in such a case would result in significantly overloading some of the reducers in the system. 

In order to overcome this problem, one can use Pig's skew join. Pig will first sample one of the inputs, searching for the popular keys whose records would not fit in memory. The rest of the records will be handled by a default join. However, records that belong to one of the keys identified as popular will be split among a number of reducers. The records of the other input that correspond to keys that were split will be replicated to each reducer that handles that key.

Skew is supported in one input only. If both tables have skew, the algorithm will still work, but will be significantly slower.

However, extra care should be taken when using this type of join! This algorithm breaks the Map-Reduce convention that all records with the same key will be processed by the same reducer! This could be dangerous or yield unexpected results if one tries to use an operation that depends on all records with the same key being in the same part file!
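The split-and-replicate idea behind the skew join can be simulated in Python as follows. Everything here is an assumption made for illustration: an exact count stands in for the sampling job, `hot_threshold` stands in for "does not fit in memory", and hot records are spread round-robin.

```python
from collections import Counter, defaultdict
from itertools import count, product

def skew_join(skewed, other, reducers=4, hot_threshold=3):
    """Join on field 0, splitting hot keys of the skewed input across reducers."""
    counts = Counter(r[0] for r in skewed)            # stands in for the sampling job
    hot = {k for k, n in counts.items() if n > hot_threshold}
    buckets = defaultdict(lambda: ([], []))           # (reducer, key) -> (skewed bag, other bag)
    ticket = count()
    for r in skewed:
        k = r[0]
        reducer = next(ticket) % reducers if k in hot else hash(k) % reducers
        buckets[(reducer, k)][0].append(r)            # each skewed record goes to one reducer
    for r in other:
        k = r[0]
        targets = range(reducers) if k in hot else [hash(k) % reducers]
        for reducer in targets:                       # replicate records of split keys
            buckets[(reducer, k)][1].append(r)
    rows = [a + b for bag_a, bag_b in buckets.values() for a, b in product(bag_a, bag_b)]
    return rows, buckets

skewed = [(1, i) for i in range(8)] + [(2, "x")]
other = [(1, "a"), (2, "b"), (3, "c")]
rows, buckets = skew_join(skewed, other)
print(len(rows), sorted({reducer for reducer, key in buckets if key == 1}))
```

The records of key 1 end up on several reducers, which is exactly the broken convention the warning above is about.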

 

Thoughts...

Pig's philosophy states that "Pigs are domestic animals", meaning that users should be able to control and modify its behaviour. This is one of the reasons why Pig does not have an optimizer to choose among the available join strategies and leaves this choice to the user. However, this choice implies that users have a deep understanding of how the different techniques work, and that adequate information regarding the format and distribution of the data they want to join is available. If this is not the case, a wrong choice will almost surely lead to severe execution overhead.

My scepticism comes from the high-level nature that such a system is supposed to offer. What do the users of such systems know and what should they know? In my understanding, the whole point of a high-level abstraction is to hide implementation details and low-level information on how the underlying framework works. And honestly speaking, I can't see how an optimizer would conflict with Pig's philosophy of being a "domestic animal". Maybe it could be designed so that it can be disabled.

How is all this related to my thesis? The truth is that I will probably have no time at all to look into this any further. On the other hand, it is interesting to point out that Stratosphere offers an almost natural way of expressing joins and other relational operations using its Input Contracts. The Match Contract essentially maps to an inner-join, while the PACT compiler can choose the most effective execution strategy to implement it. The CoGroup Input Contract can be used to realize outer and anti-joins, while the Cross Contract can be used to implement all kinds of arbitrary theta-joins. 

I personally find these kinds of issues really intriguing and although I will probably have to "push" them into "future work", I now have something to look forward to after my thesis is done =)

 

I hope it will be Spring already by the next time I post!

Until then, happy coding!

V.

 

PS: For more info on Pig's advanced relational operations, here is da book!

Pig's Hadoop Launcher
Wed, 21 Mar 2012 12:23:18 -0700
http://vasia.posterous.com/pigs-hadoop-launcher

This is a post on the functionality of the main class that launches Pig for Hadoop Map-Reduce and also a good starting point for developers wishing to contribute to the Pig project.

The class in question is the MapReduceLauncher and is found in the package org.apache.pig.backend.hadoop.executionengine.mapReduceLayer

It extends the abstract class Launcher, which provides a simple interface to:

  • reset the state of the system after launch
  • launch Pig (in cluster or local mode)
  • explain how the generated Pig plan will be executed in the underlying infrastructure

Other methods provided are related to gathering runtime statistics and retrieving job status information.

The most important methods of MapReduceLauncher are compile() and launchPig(). It is advised that launchers for other frameworks (i.e. other than Hadoop MR) should override these methods.

The compile method gets a Physical Plan and compiles it down to a Map-Reduce Plan. It is the point where all optimizations take place. A total of eleven different optimizations are possible in this stage, including combiner optimizations, secondary sort key optimizations, join operation optimizations etc. Optimizations will be the focus of the second phase of my thesis, when I will have to dig into these classes!

The launchPig method is more interesting to me at this point of my work. It receives the Physical Plan to be compiled and executed as a parameter and returns a PigStats object, which contains statistics collected during the execution.

In short, it consists of the following *simplified* steps:

  • Calls the compile method and retrieves the optimized Map-Reduce Plan
  • Retrieves the Execution Engine
  • Creates a JobClient Object

The JobClient class provides the primary interface for the user-code to interact with Hadoop's JobTracker. It allows submitting jobs and tracking their progress, accessing logs and status information. Usually, a user creates a JobConf object with the configuration information and then uses the JobClient to submit the job and monitor its progress.

  • Creates a JobControlCompiler object. The JobControlCompiler compiles the Map-Reduce Plan into a JobControl object

The JobControl object encapsulates a set of Map-Reduce jobs and their dependencies. It tracks the state of each job and has a separate thread that submits the jobs when they become ready, monitors them and updates their states. I hope the following diagrams will make this clear:

[Figure: Inside the JobControl object]
[Figure: ControlledJob state diagram]

 

  • Repeatedly calls the JobControlCompiler's compile method until all jobs in the Map-Reduce Plan are exhausted
  • While there are still jobs in the plan, retrieves the JobTracker URL, launches the jobs and periodically checks their status, updating the progress and statistics information
  • When all jobs in the Plan have been consumed, checks for native Map-Reduce jobs and runs them
  • Finally, aggregates statistics, checks for exceptions, decides the execution outcome and logs it
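The behaviour of the JobControl object described in the steps above (jobs, dependencies and per-job states) can be simulated in a few lines of Python. This is a single-threaded sketch and not Hadoop's API; the state names are modelled on my reading of the state diagram, and `Job` and `run_all` are hypothetical.

```python
from enum import Enum

class State(Enum):
    WAITING = "waiting"
    READY = "ready"
    RUNNING = "running"
    SUCCESS = "success"
    FAILED = "failed"
    DEPENDENT_FAILED = "dependent_failed"

class Job:
    def __init__(self, name, action, depends_on=()):
        self.name = name
        self.action = action              # callable that "runs" the job
        self.depends_on = list(depends_on)
        self.state = State.WAITING

def run_all(jobs):
    """Keep sweeping over the jobs until no job can change state any more."""
    progressed = True
    while progressed:
        progressed = False
        for job in jobs:
            if job.state is not State.WAITING:
                continue
            upstream = [dep.state for dep in job.depends_on]
            if any(s in (State.FAILED, State.DEPENDENT_FAILED) for s in upstream):
                job.state = State.DEPENDENT_FAILED
                progressed = True
            elif all(s is State.SUCCESS for s in upstream):
                job.state = State.READY       # all dependencies are done
                job.state = State.RUNNING     # "submit" the job
                try:
                    job.action()
                    job.state = State.SUCCESS
                except Exception:
                    job.state = State.FAILED
                progressed = True
    return {job.name: job.state for job in jobs}

def fail():
    raise RuntimeError("boom")

load = Job("load", lambda: None)
group = Job("group", lambda: None, [load])
bad = Job("bad", fail)
after_bad = Job("after_bad", lambda: None, [bad])
states = run_all([after_bad, group, bad, load])
print(states)
```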

Next Steps

When I did the analysis above (almost two weeks ago) I made a list of my next steps including:

  • Browse through the rest of the thesis-related Pig code, i.e. org.apache.pig.backend*
  • Identify the Classes and Interfaces that need to be changed
  • Identify Hadoop dependencies in the Pig codebase
  • Find the Stratosphere "equivalents" of JobControl, JobClient, JobConf etc.
  • Find out how to run a PACT program from inside Pig

Since then, I've been browsing through the Pig code and I have also started coding (finally!). I've identified way more classes and interfaces that need to be changed, even for the simplest version of the system I'm building, and I am certainly amazed by the number of dependencies I've found and need to take care of... And it seems that finding "equivalents" is not a straightforward or easy task at all!

But the challenge has already been accepted! I'll be updating soon with my solutions :-)

Until then, happy coding!

V.

Pig and Stratosphere Integration Alternatives
Tue, 28 Feb 2012 11:35:00 -0800
http://vasia.posterous.com/pig-and-stratosphere-integration-alternatives

In this post I am going to present some alternative design choices concerning the actual implementation of the project, i.e. the integration of Pig and Stratosphere systems.

The main goal is to have a working system, such that Pig Latin scripts can be executed on top of the Nephele execution engine. However, performance is an issue, and of course, we wouldn't like to end up with a system slower than the current implementation :S The very motivation of this project is to overcome the limitations of the existing system, by exploiting Stratosphere's features.

The architectures of the two systems are shown side by side in the next diagram:

[Figure: Pig and Stratosphere architectures side by side]

The integration can be achieved in several ways and on different levels:

  • Translate MapReduce programs into PACT programs

This is the naive and straightforward way of solving the given problem. PACT already supports Map and Reduce Input Contracts, which can be used for the transformation of the Map-Reduce Plan into a one-to-one PACT Plan. The Logical and Physical Plans that are generated by Pig can be re-used without modification. It is obvious that this solution wouldn't provide any gains compared to the existing implementation. In fact, it should be slower, since it adds one more layer to the system architecture. However, it is the simplest approach and it will be my starting point, in order to better understand the frameworks' internals =)

[Figure: Integration alternative 1, MapReduce Plan to PACT Plan]

  • Translate the Physical Plan into a PACT Plan

This is a more natural solution and corresponds to the approach that would have been taken if Pig had been designed with Nephele in mind as the execution engine, instead of Hadoop. It includes completely replacing the MapReduce Plan by a PACT Plan, which will be generated directly from the Physical Plan. This way, the additional Input Contracts, such as Match, Cross and CoGroup, could be used to compile common operations, like Joins. I hope and do expect this solution to be advantageous over the existing implementation. With this design, we should be able to exploit Stratosphere's advantages and reflect them as performance gains in certain classes of applications.

[Figure: Integration alternative 2, Physical Plan to PACT Plan]

  • Translate the MapReduce Plan (or even Physical Plan) into a Nephele Job

If you just look at the two system architectures, as shown in the above figures, you might think that the more layers you take away, the faster the resulting system would be. For example, one could argue that getting rid of both the high-level programming frameworks, Map-Reduce and PACT, would speed things up. However, merging at that point would mean re-implementing a job already done, i.e. compiling down to code that can be understood by an execution engine, such as Nephele (or Hadoop). A speedup in this case is quite improbable, and it would mean that there is something wrong with the PACT compiler. Well, I have no reason to suspect so, or any spare time to check this during the 3 months I have left :p

The solutions discussed here are not the only ones possible. One could think of and propose several variations at different levels. For example, in order to take full advantage of Stratosphere's flexibility, it would be reasonable to try and modify Pig at the level of the Physical Plan. Of course, there is the danger of messing with Pig's modularity and making it execution-engine dependent. Moreover, one could exploit Stratosphere's Output Contracts and implement optimization rules, in cases such as grouping or joining pre-partitioned or already sorted data.

The thing I like about this project is that I constantly have more and more ideas about variations, optimizations and possible extensions. And every time I meet with my supervisor and his team, I fill my notebook with just as many interesting and motivating thoughts from them all! However, I don't have all the time in the world, so I focus on the first two alternatives for the purpose of my thesis.

Just a final conclusion and something that I always have in mind while working on this project:

When any kind of abstraction is made, and this applies to high-level languages as well, there is always an overhead you have to pay in exchange for simplicity. The underlying system, whose details the user no longer needs to know, will be designed to make several decisions that often differ from those an experienced low-level programmer would make.

However, the abstraction only has value provided that the frustration imposed on the users by the slow-down in accomplishing their job is lower than the satisfaction they get from being able to accomplish this job in a simpler way.

 

Hoping for a valuable abstraction!

Until next time, happy coding,

V.

The PACT programming model
Wed, 15 Feb 2012 13:56:39 -0800
http://vasia.posterous.com/the-pact-programming-model

PACT is Stratosphere's programming model. It consists of the so-called Parallelization Contracts which push the Map-Reduce idea one step further.

I was thinking of writing a post, explaining how PACT works, but the truth is that I wouldn't do any better than the already existing documentation at the Stratosphere project website.

So, I will only provide here some useful links:

  • The Pact Programming Model: A high-level view of the programming model and in-detail presentation of the second-order functions available and the guarantees provided by the framework.
  • Building a PACT Program: A guide to PACT programming, including everything you need to know before starting writing PACT programs.
  • Example Jobs: Six example PACT programs of varying difficulty, starting from simple WordCount to more complex graph analysis algorithms.
  • The PACT Compiler: A detailed overview of how the PACT Compiler is built and how it performs the transformation of PACT programs into Nephele DAGs.

If you are already familiar with MapReduce programming, you will also find this paper very helpful. It compares the two programming models and contains a series of examples of common data analysis tasks implemented in both models.

 

Keep on happy coding,

V.

The Nephele Execution Engine
Wed, 15 Feb 2012 07:38:00 -0800
http://vasia.posterous.com/the-nephele-execution-engine

This post is an overview of Nephele, Stratosphere's execution engine. The programming model provided with Stratosphere is PACT (and will be covered in a future post). However, it is possible to submit jobs directly to the Nephele engine, in the form of Directed Acyclic Graphs (DAGs), where each vertex of the graph represents a task of the job. There are three types of vertices: Input vertices, Output vertices and Task vertices. The edges of the graph correspond to the communication channels between tasks.
 
Why choose Nephele over other engines? 
 
One big advantage of Nephele is the high degree of parametrization it offers, which could lead to several optimizations. It is possible for the user to set the degree of data parallelism per task or explicitly specify the type of communication channels between nodes. More importantly, Nephele supports dynamic resource allocation. In contrast to MapReduce and Dryad, which are designed to work on static cluster environments, Nephele is capable of allocating resources from a Cloud environment depending on the workload.

Architecture Overview

Nephele consists of several communicating components that are presented next in more detail. An overview of the Nephele architecture is shown in the following diagram.

In order to submit a job to the Nephele engine, a Client has to communicate with the Job Manager. The Job Manager is unique in the system and is responsible for scheduling the jobs it receives and coordinating their execution.

The resources required for job execution are managed by the Instance Manager. The Instance Manager allocates or deallocates virtual machines, depending on the workload of the current execution phase. The jobs are executed in parallel by instances, each of which is controlled by a Task Manager. The Task Manager communicates with the Job Manager and is assigned jobs for execution. During execution, each Task Manager sends information about changes in the execution state of the job (completed, failed, etc.). Task Managers also periodically send heartbeats to the Job Manager, which are then propagated to the Instance Manager. This way, the Instance Manager keeps track of the availability of running instances. If a Task Manager has not sent a heartbeat within the given heartbeat interval, the host is assumed to be dead. The Instance Manager then removes the respective Task Manager from the set of compute resources and calls the scheduler to take appropriate actions.
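To make the heartbeat logic a bit more concrete, here is a minimal Python sketch of how such liveness tracking could look. This is not Nephele's actual code: the class name, the `on_dead` callback (standing in for the scheduler) and the explicit `now` argument are all assumptions I made for illustration.

```python
import time


class InstanceManagerSketch:
    """Illustrative only: tracks the last heartbeat seen per Task Manager."""

    def __init__(self, heartbeat_interval, on_dead):
        self.heartbeat_interval = heartbeat_interval  # seconds (hypothetical parameter)
        self.on_dead = on_dead                        # callback standing in for the scheduler
        self.last_seen = {}                           # task manager id -> timestamp

    def report_heartbeat(self, task_manager_id, now=None):
        # Record the time of the most recent heartbeat for this Task Manager.
        self.last_seen[task_manager_id] = time.time() if now is None else now

    def check_liveness(self, now=None):
        # Remove every Task Manager whose last heartbeat is older than the interval
        # and notify the scheduler stand-in about each of them.
        now = time.time() if now is None else now
        dead = [tm for tm, seen in self.last_seen.items()
                if now - seen > self.heartbeat_interval]
        for tm in dead:
            del self.last_seen[tm]
            self.on_dead(tm)
        return dead
```

Passing `now` explicitly is only there to make the sketch easy to try out without waiting for real time to pass.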

When the Job Manager receives a job graph from the Client, it decides how many and what types of instances need to be launched. Once all virtual machines have booted up, execution is triggered. Persistent storage, accessible from both the Job and Task Managers, is needed to store the jobs’ input and output data.
 

Nephele Jobs
 
Jobs in Nephele are defined as Directed Acyclic Graphs (DAGs). Each graph vertex represents one task and each edge indicates communication flow between tasks. Three types of vertices can be defined: Task vertex, Input vertex and Output vertex. The Input and Output vertices define how data is read or written to disk. The Task vertices are where the actual user code is executed.
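A job graph of this kind can be sketched in a few lines of Python. Keep in mind that this is not the Nephele API: the class and method names are hypothetical, and the rule that Input vertices have no incoming edges and Output vertices have no outgoing edges is my own assumption about what a well-formed graph looks like.

```python
from collections import deque


class JobGraphSketch:
    """Illustrative DAG with the three vertex types described above."""

    VERTEX_KINDS = {"input", "task", "output"}

    def __init__(self):
        self.kinds = {}   # vertex name -> kind
        self.edges = {}   # vertex name -> list of successor names

    def add_vertex(self, name, kind):
        if kind not in self.VERTEX_KINDS:
            raise ValueError(f"unknown vertex kind: {kind}")
        self.kinds[name] = kind
        self.edges[name] = []

    def connect(self, source, target):
        # Assumption: data only flows out of inputs and into outputs.
        if self.kinds[source] == "output":
            raise ValueError("an output vertex cannot have outgoing edges")
        if self.kinds[target] == "input":
            raise ValueError("an input vertex cannot have incoming edges")
        self.edges[source].append(target)

    def topological_order(self):
        # Kahn's algorithm; fails if the graph is not acyclic.
        indegree = {v: 0 for v in self.kinds}
        for targets in self.edges.values():
            for t in targets:
                indegree[t] += 1
        ready = deque(v for v, d in indegree.items() if d == 0)
        order = []
        while ready:
            v = ready.popleft()
            order.append(v)
            for t in self.edges[v]:
                indegree[t] -= 1
                if indegree[t] == 0:
                    ready.append(t)
        if len(order) != len(self.kinds):
            raise ValueError("job graph contains a cycle")
        return order
```

A three-vertex job (read, process, write) would be built with three `add_vertex` calls and two `connect` calls; `topological_order()` then gives one valid order in which the vertices could be set up.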
 
Nephele defines a default strategy for setting up the execution of a job. However, there is a set of parameters that the user can tune in order to make execution more efficient. These parameters include the number of parallel subtasks, the number of subtasks per instance, how instances should be shared between tasks, the types of communication channels and the instance types that fulfill the hardware requirements of a specific job.
 
Nephele offers three types of communication channels that can be defined between tasks. A Network Channel establishes a TCP connection between two vertices and allows pipelined processing. This means that records emitted from one task can be consumed by the following task immediately, without being persistently stored. Tasks connected with this type of channel are allowed to reside in different instances. Network channels are the default type of communication channel chosen by Nephele, if the user does not specify a type. Subtasks scheduled to run on the same instance can be connected by an In-Memory Channel. This is the most efficient type of communication and is performed using the instance’s main memory, also allowing data pipelining. The third type of communication is through File Channels. Tasks that are connected through this type of channel use the local file system to communicate. The output of the first task is written to an intermediate file, which then serves as the input of the second task.
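The rules for picking a channel type can be summarized in a small Python sketch. The names `ChannelType` and `choose_channel` are made up for this post and do not come from the Nephele code base; the sketch only encodes what is described above (Network as the default, In-Memory only for subtasks on the same instance).

```python
from enum import Enum


class ChannelType(Enum):
    NETWORK = "network"      # TCP connection, pipelined, may cross instances
    IN_MEMORY = "in-memory"  # main memory, pipelined, same instance only
    FILE = "file"            # intermediate file on the local file system


def choose_channel(requested, same_instance):
    """Return the channel type to use for an edge between two subtasks."""
    if requested is None:
        # No type specified by the user: fall back to the default.
        return ChannelType.NETWORK
    if requested is ChannelType.IN_MEMORY and not same_instance:
        raise ValueError("in-memory channels need both subtasks on the same instance")
    return requested
```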

And now what?

Pig's execution plans can also be represented as DAGs. The challenge now is to study how to convert Pig's plans into Nephele Job graphs. This approach would skip Stratosphere's programming model layer. It is not clear what implications such a decision could have performance-wise. On one hand, skipping one layer of execution could lead to performance gains. On the other hand, the PACT compiler is designed to perform several optimizations when translating PACT programs into Nephele DAGs. My wish is to implement and evaluate both alternatives. Let's hope I will have enough time for that!

Until next time, happy coding!

V.

Tue, 24 Jan 2012 02:24:00 -0800 First Dive into Pig Code http://vasia.posterous.com/first-dive-into-pig-code http://vasia.posterous.com/first-dive-into-pig-code
The Pig project code base is quite big and complex. In this post, I will focus on the back end side of the system, meaning the execution engine. The hierarchy of Pig’s back end looks roughly like this:

Pigbackend
Zooming in on the Hadoop execution engine, we get the following diagram:

Pigexecutionengine
 
However, the engine itself also has a front end and a back end.
 
The front end takes care of all compilation and transformation from one Plan to another. First, the parser transforms a Pig Latin script into a Logical Plan. Semantic checks (such as type checking) and some optimizations (such as determining which fields in the data need to be read to satisfy the script) are done on this Logical Plan. The Logical Plan is then transformed into a PhysicalPlan. In the above hierarchy, PhysicalPlan lies under ExecutionEngine -> PhysicalLayer -> Plans. This Physical Plan contains the operators that will be applied to the data.
This PhysicalPlan is then passed to the MRCompiler. The MRCompiler lies under ExecutionEngine -> MapReduceLayer. This is the compiler that transforms the PhysicalPlan into a DAG of MapReduce operators. It uses a predecessor depth-first traversal of the PhysicalPlan to generate the compiled graph of operators. When compiling an operator, the goal is first to try to merge it into the existing MapReduce operators, in order to keep the number of generated jobs as small as possible. A new MapReduce operator is introduced only for blocking operators and splits. The two operators are then connected using a store-load combination. The output of the MRCompiler is an MROperPlan object. This corresponds to the Map-Reduce plan to be executed.
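To get a feeling for this merging strategy, here is a heavily simplified Python sketch. It is not the real MRCompiler: it only handles a linear chain of operators (no splits or joins), the names `PhysicalOp`, `MROperator` and `compile_plan` are hypothetical, and treating only GROUP and COGROUP as blocking is an assumption made to keep the example small.

```python
from dataclasses import dataclass, field
from typing import List, Optional

BLOCKING_KINDS = {"GROUP", "COGROUP"}  # assumption: the only blocking operators here


@dataclass
class PhysicalOp:
    kind: str
    predecessor: Optional["PhysicalOp"] = None


@dataclass
class MROperator:
    map_plan: List[str] = field(default_factory=list)
    reduce_plan: List[str] = field(default_factory=list)
    in_reduce: bool = False  # True once the map phase has been closed


def compile_plan(leaf):
    """Compile a chain of physical operators into a list of MROperator objects."""
    jobs = [MROperator()]

    def visit(op):
        # Predecessors are compiled first (depth-first traversal).
        if op.predecessor is not None:
            visit(op.predecessor)
        current = jobs[-1]
        if op.kind in BLOCKING_KINDS:
            if current.in_reduce:
                # The reduce phase is already in use: close this job with a
                # store and start a new one that loads the stored data.
                current.reduce_plan.append("STORE tmp")
                current = MROperator(map_plan=["LOAD tmp"])
                jobs.append(current)
            # The blocking operator is split around the shuffle.
            current.map_plan.append(op.kind + " (local rearrange)")
            current.reduce_plan.append(op.kind + " (package)")
            current.in_reduce = True
        elif current.in_reduce:
            current.reduce_plan.append(op.kind)  # merged into the reduce phase
        else:
            current.map_plan.append(op.kind)     # merged into the map phase

    visit(leaf)
    return jobs
```

With this simplification, a LOAD, FOREACH, GROUP, FOREACH, STORE chain fits into a single job, while a second GROUP further down the chain forces a second job connected through the temporary store-load pair.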
This plan is then optimized by using the Combiner where possible, by combining jobs that scan the same input data, etc.
The final set of MapReduce jobs is generated by the JobControlCompiler. This class lies under ExecutionEngine -> MapReduceLayer. It takes an MROperPlan and converts it into a JobControl object with the relevant dependency info maintained. The JobControl object is made up of Jobs, each of which has a JobConf. The conversion is done by the method compile(), which compiles all jobs that have no dependencies, removes them from the plan and returns a JobControl object. It must be called with the same plan until the plan is exhausted.
The generated jobs are then submitted to Hadoop and monitored by the MapReduceLauncher.
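The "call compile() until the plan is exhausted" pattern can be illustrated with a small Python sketch. This is not Pig's JobControlCompiler: the function names are hypothetical, the plan is just a dictionary from job name to the set of jobs it depends on, and I assume that each returned batch finishes before the next call.

```python
def compile_ready_jobs(plan):
    """Remove and return all jobs in the plan that have no remaining dependencies."""
    ready = sorted(job for job, deps in plan.items() if not deps)
    for job in ready:
        del plan[job]
    # Assumption: the returned batch completes before the next call,
    # so the remaining jobs no longer depend on it.
    for deps in plan.values():
        deps.difference_update(ready)
    return ready


def run_until_exhausted(plan):
    """Call compile_ready_jobs with the same plan until nothing is left."""
    batches = []
    while plan:
        batch = compile_ready_jobs(plan)
        if not batch:
            raise ValueError("remaining jobs have cyclic dependencies")
        batches.append(batch)
    return batches
```

Each batch plays the role of one JobControl object: a group of jobs that can be submitted together because nothing they depend on is still pending.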
 
In the back end, PigGenericMapReduce.Map, PigCombiner.Combine, and PigGenericMapReduce.Reduce each use the pipeline of physical operators constructed in the front end to load, process, and store data.

The goal of my project is to replace the Hadoop execution engine component with a new one, corresponding to the Nephele execution engine. It might sound easy, but it is not as trivial as it looks. Even though Pig was built with modularity in mind and was meant to be independent of the execution engine, it seems that this is not exactly the case. A lot of parameters are Hadoop-specific and there are a lot of dependencies outside the Hadoop packages that need to be taken care of.

Wish me luck!
I wish you happy coding :-)
V.

 

Wed, 18 Jan 2012 01:48:00 -0800 Pig 101 http://vasia.posterous.com/pig-101 http://vasia.posterous.com/pig-101
In this post, I'll be presenting the basics of the Pig system. Note that this is not a tutorial on how to write Pig programs, but rather a system overview. I will be focusing more on the compiler and the techniques used to translate Pig Latin scripts into Map-Reduce jobs.

Why use Pig?

So, what’s the motivation behind Pig? Why use it when you can write simple Java Map-Reduce programs? Simple... hmmm... Well, yes, if what you want to do is WordCount! But when it comes to other operations, like joins or complex algorithms, it’s not that trivial. Pig is very easy to use for people familiar with SQL or scripting languages and it doesn’t require you to understand how Map-Reduce works. Pig Latin programs are also way smaller than Java programs (obviously). So, they’re also probably much faster to write! And the best part is that operations like join, sort, filter and group are already provided in Pig Latin (as expected by a self-claimed high-level language :p)
Not convinced? Then I hope the most popular Map-Reduce example will wash away all your doubts!

An example: WordCount

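-- Load each line of the input as a single-field record
wordInput = LOAD 'input' USING TextLoader();
-- Split every line into words, one record per word
words = FOREACH wordInput GENERATE FLATTEN(TOKENIZE($0)) AS word;
-- Collect identical words into one bag per word
grouped = GROUP words BY word;
-- Count the records in each bag
result = FOREACH grouped GENERATE group AS key, COUNT(words) AS count;
STORE result INTO 'wordOutput';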

Pig’s declarative nature makes it so much more intuitive to write data analysis programs :-)
You don’t have to think about keys and values and squeeze your head on how to fit your problem into a map and a reduce function!
Running this script locally on my machine giving as input the text of the first section of this post produces the following output:

I 1
a 2
In 1
be 2
is 1
of 1
on 2
to 2
Pig 3
and 1
but 1
how 1
not 1
the 4
…           


System Overview

Here’s a simple diagram to show the system architecture:

 

The simple Pig Latin program we wrote above will get parsed and a Logical Plan of operations will be created. This plan will be optimized and turned into a Physical Plan, which will feed the Map-Reduce Compiler, which in turn will generate a Map-Reduce Plan. This plan will be optimized again and sent to Hadoop for execution:


Each node of the Logical Plan represents an operation of the script and the arrows connecting the nodes show how data flows from one step to the next. Each node in this plan is then translated into one or more nodes of the Physical Plan, depending on the complexity of the operation. In the end, nodes are grouped together to form Map and Reduce operations. In our example, FOREACH and Local Rearrange can be performed inside the Mapper, while the Package and the next FOREACH can be performed inside the Reducer. The Global Rearrange and LOAD/STORE operations are taken care of by the Hadoop framework automatically.
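The split of the WordCount plan into a map side and a reduce side can be imitated in plain Python. This is only a toy simulation, not what Pig generates: the function names are made up, `str.split()` is a rough stand-in for TOKENIZE, and the `shuffle` function plays the part that Hadoop performs for us.

```python
from collections import defaultdict


def map_phase(lines):
    # FOREACH ... FLATTEN(TOKENIZE(...)): emit one record per word.
    # Local Rearrange: tag each record with its grouping key (the word itself).
    for line in lines:
        for word in line.split():
            yield word, word


def shuffle(pairs):
    # Global Rearrange: bring all records with the same key together.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    # Package: the list per key acts as the bag.
    # FOREACH ... COUNT(words): count the records in each bag.
    return {key: len(bag) for key, bag in groups.items()}
```

Chaining the three functions, `reduce_phase(shuffle(map_phase(lines)))`, gives a dictionary from each word to its count.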

More generally, there are some rules to follow in order to convert a Physical Plan into a Map-Reduce Plan:

  • Convert each (CO)GROUP into a Map-Reduce job
  • Map assigns keys based on the BY clause
  • Each FILTER and FOREACH between the LOAD and the (CO)GROUP are pushed into the map function
  • Commands between (CO)GROUP operations are pushed into the reduce function
  • Perform tagging in case of multiple input sets
  • Each ORDER command is compiled into 2 Map-Reduce jobs
    • Job 1 samples the input to determine key distribution
    • Job 2 generates roughly equal-sized partitions and sorts
These rules were designed in the initial version of the Pig system. There is a high chance they have changed. In the coming days I will be studying those rules in more detail and I’ll get back!
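The two-job ORDER rule is the least obvious one in the list, so here is a small Python sketch of the idea. It is a simulation under my own assumptions, not Pig's implementation: the function names, the uniform random sample and the way cut points are picked are all chosen just to illustrate sampling followed by range partitioning.

```python
import bisect
import random


def sample_boundaries(keys, num_partitions, sample_size, seed=0):
    """Job 1: sample the keys and derive num_partitions - 1 cut points."""
    rng = random.Random(seed)
    sample = sorted(rng.sample(keys, min(sample_size, len(keys))))
    if not sample:
        return []
    # Evenly spaced positions in the sorted sample approximate the key distribution.
    return [sample[(i * len(sample)) // num_partitions]
            for i in range(1, num_partitions)]


def partition_and_sort(keys, boundaries):
    """Job 2: route each key to its range partition, then sort every partition."""
    partitions = [[] for _ in range(len(boundaries) + 1)]
    for key in keys:
        partitions[bisect.bisect_right(boundaries, key)].append(key)
    return [sorted(p) for p in partitions]
```

Because the partitions cover consecutive key ranges, concatenating the sorted partitions in order yields a globally sorted result, and a representative sample keeps the partitions roughly equal in size.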

Until then sweet coding!
V.


References and links

- Alan F. Gates, Olga Natkovich, Shubham Chopra, Pradeep Kamath, Shravan M. Narayanamurthy, Christopher Olston, Benjamin Reed, Santhosh Srinivasan, and Utkarsh Srivastava. Building a high-level dataflow system on top of Map-Reduce: the Pig experience.
- Christopher Olston, Benjamin Reed, Utkarsh Srivastava, Ravi Kumar, and Andrew Tomkins. 2008. Pig latin: a not-so-foreign language for data processing.
- http://pig.apache.org/
- http://www.cloudera.com/videos/introduction-to-apache-pig

Mon, 16 Jan 2012 10:56:00 -0800 myThesisProject.init(); http://vasia.posterous.com/mythesisprojectinit http://vasia.posterous.com/mythesisprojectinit
Hej hej from white Stockholm!
Christmas holidays are over and I’m back in town!

This will be the last semester of my MSc, during which I will be working on my thesis in collaboration with the Swedish Institute of Computer Science (SICS). I am very excited about the project and this is the first of a series of posts I intend to do, describing my progress and discoveries :-)

So what is this super-interesting project I’m going to work on?

Before getting to that, I will have to make a small introduction on two systems:
Apache Pig and Stratosphere.

Pig is a platform for analyzing big data sets. It consists of a high-level declarative language, Pig Latin, and an execution engine that “translates” Pig scripts into Map-Reduce jobs.

Stratosphere is a data-processing framework, under research by TU Berlin. It provides a programming model for writing parallel data analysis applications and an execution engine, Nephele, able to execute dataflow graphs in parallel. You can think of it as an extension/generalization of Hadoop Map-Reduce and it also shares a lot of ideas with Dryad.

Although right now it is only possible to execute Pig scripts on top of Hadoop, Pig is designed to be modular and it should be straightforward to deploy it on top of another execution engine. And this is exactly the initial idea of the project. Additionally, Pig in its current state appears to have some limitations that make it about 1.5 times slower than native Map-Reduce at the moment. I believe that the Stratosphere architecture has several features that could be exploited in order to improve performance.

I am currently in the phase of studying the Pig architecture and the existing Hadoop compiler implementation. (Oh the joy of endless Java code :p )

Soon, I will post here my first findings, so stay tuned!

Until then, sweet coding!

V.

 

Fri, 30 Sep 2011 06:20:00 -0700 Sorry, I can't accept your Greek ID! http://vasia.posterous.com/sorry-i-cant-accept-this-id http://vasia.posterous.com/sorry-i-cant-accept-this-id

Well well!

I've already been in Sweden for two months, and I've been studying and living in Stockholm for a bit more than a month. A lot of interesting things have happened so far, and I've always been in the middle of something that kept me from posting here. But today I found myself in a situation I'd like to share.

 

It's the end of the month and I rushed this morning to pay my rent on time. I asked my neighbour where she pays hers and found out that I could pay at an exchange office near my place. So, I went there and when it was my turn I gave the invoice to the employee. I asked to pay with my credit card and she asked to see my ID *. So, I gave her my Greek ID and she started looking at it, making several weird faces while trying to understand what was written on it. Then she turned to me and said:

- I'm sorry, I cannot accept this. Don't you have a passport?

Well, I do, but I didn't have it with me at the moment. I tried to explain to her that this is an EU ID and that it is valid, but she wouldn't listen. What really struck me as odd though, was the fact that I was in a *bank* and I was trying to *pay* and the bank was *refusing my money*. I mean, it's a payment, who cares if it's me paying it or someone else? That's really the first time in my life I've come across such a situation. Anyway, I got quite pissed off and went to a tobacco place next door where I paid without any problem (apart from the fact that I had to pay in cash).

 

I don't blame the girl at the exchange office for not accepting my ID. In fact, I don't even know if she had the right to do so or not. In the beginning I got really angry but I quickly realised that I do understand her. The truth is that Greek IDs are really awful. Mostly written in Greek, with some fields also written in Latin characters. They easily confuse anyone. And to think that my ID is actually quite new, as I had to have it reissued just last year, after losing my wallet in Barcelona.

My previous ID had all the fields -- wait for it -- handwritten!

I've never had any serious problems with my Greek ID inside Europe before (UK airports aside). Usually people are just surprised by the looks of it but recognise it and accept it. They might ask where to find the date of birth or issue date, but that's about it.

 

The sad thing is that none of this would have happened if I had my Swedish Personal Number already. But I don't, even though I applied for it more than a month ago. At the tax office I had a similar experience, which I found funny at the time. Not having this number not only creates problems like the one I've just described but also prevents me from opening a bank account here and therefore receiving my salary! So, apart from not being able to pay, I can't get paid either!

I don't know why it takes so long to issue this number and to be honest I am quite disappointed by the Swedish public sector bureaucracy. When visiting the tax office I couldn't help but compare my experience to the one I had in Barcelona last year for the same purpose. Yes, I admit that the queue was much longer and that I had to wait for hours, then was sent to another office, then had to pay some fee and then come back and wait in the queue again, but at least my Spanish Personal Number was issued the same day!

 

I'm just hoping I get the Swedish Number soon and don't have any more similar adventures :)

 

Cheers,

V.

(note: my credit card is PIN protected and I am never asked for ID whenever I perform transactions with it)

Fri, 01 Jul 2011 01:00:00 -0700 Little girl on the beach http://vasia.posterous.com/little-girl-on-the-beach http://vasia.posterous.com/little-girl-on-the-beach

Isn't it scary how life seems so easy sometimes?

Don't you feel lucky and don't you feel terrified?

P6280090

 

Isn't it scary how easy it is to forget?

Doesn't it scare you?

P6280092

 

The storm that's approaching...

Mon, 30 May 2011 14:36:00 -0700 iMpRessioNs fRoM pRiMaveRa SouND 2011 http://vasia.posterous.com/impressions-from-primavera-sound-2011 http://vasia.posterous.com/impressions-from-primavera-sound-2011

... and an unexpected ending

 

5 days, 14 stages, more than 200 artists, more than 250 concerts, more than 100,000 people!

These are the numbers but nothing can describe the experience!

Note: I'm not a music maker, critic or producer, so I wrote this post as a humble festival attendee :)

Event Organization

I can't imagine how much planning and coordination is needed to organize such an event and overall I think they did pretty well. There were problems of course, but none serious enough to spoil the mood.

I'd note the following:

  • Event venues

Poble Espanyol, totally fit for concerts, reminded me of Athens' Technopolis. Great sound quality and all the beauty of Spanish architecture around made the experience incomparable!

Parc del Forum. Sea breeze, sand, trees and all the things you can relate to Springtime aka Primavera! Big enough to host all these people and different stages. The stages were far enough apart to avoid interference between concurrent shows and close enough to move from one to the other on time. Also, surprisingly clean toilets, despite the number of people.

  • Long queues

Both in Poble Espanyol and Parc del Forum, the audience suffered long queues to exchange their tickets for the bracelets. However, that was not the case for those who had chosen to buy their tickets through PayPal.

  • Portal and charge system failure

Apart from the bracelet, everybody received an access card that was supposed to be used for accessing the festival venues, as well as buying drinks during the concerts. In order to charge money on your card, you were supposed to login to the website portal and connect your credit card to this access card. Alternatively, you could charge money in the card at several kiosks around the festival area. That worked well during the first day at Poble Espanyol, but the next day the portal was down all day long and no purchases could be done with the card at the festival. That was kind of frustrating, but soon enough, special stands were established to get back the money charged in the cards. Traditional cash-only transactions from then on!

  • Food and Drinks

Variety of food choices, especially in Parc del Forum. Hot dogs, crepes, sandwiches, Chinese, hamburgers, pasta, salads, vegetarian. Beers and cocktails. Could have been cheaper though.

 

What I saw, What I liked and What I didn't

Day 1, Wednesday May 25th, Poble Espanyol

Echo & The Bunnymen: They played intensively, they played lively and the crowd loved them. They avoided slow songs and played what people knew and liked. I hadn't seen them before and they're not my style. But I can say they gave a great show and I respect them.

Caribou: Surpassed every expectation I had. Excellent performer, both him and the rest of his "crew". Loved their way of playing several songs without a pause. The crowd was excited and dancing non-stop! The first great show of the festival and surely a show to remember. The next morning, waiting for the bus to go to the university, I realised I was dancing alone at the bus stop to the rhythm of "Sun" that wouldn't leave my mind...

Day 2, Thursday May 26th, Parc del Forum

Grinderman: Huge Nick Cave, huge show, huge performance! Magic voice. Passion, anger, intensity. Cave is a man to worship and the crowd did exactly that. Getting off the stage often to touch his audience and be touched, screaming with rage, then singing gently like the man of your dreams. Nick Cave I bow to you! 

Interpol: I'm not a fan but I like a few songs and Interpol is considered a "must-see" band by many people, so I decided to ignore Nick Cave's suggestion (who kept saying during his show that we had to go see Suicide next) and move to Llevant. I think I managed to stay about 20 minutes. Too formal, too fake, too slow, too predefined, if you know what I mean. Everything was in place and in total order. There was no feeling of freedom on stage. It seemed like they had even rehearsed their facial expressions. However, there were people that loved the show. As for me, I decided to go to Suicide for the remaining time, but Caribou was playing closer and grabbed my attention once more!

The Flaming Lips: What a show! "Come on! Come on! Come on!", excellent performer Wayne calling to the crowd every now and then. Maybe it was more of a show than a concert. But it was absolutely one of the highlights of the festival.

Suuns: I guess I went because I was around at that moment. I can't even remember what they were playing. Totally boring.

 

Day 3, Friday May 27th, Parc del Forum

Explosions In The Sky: "Somos explosiones en el cielo" they said and the flight took off. The whole concert was a trip. Pure Magic. Great artists, thanked the audience of Barcelona for inviting them once again and you could feel they meant it. You could feel they were happy to be here. You could see they felt the music and that feeling was spreading in the air. My favourite of the festival.

Pulp: "Do you remember the first time?" First time I saw Pulp and I will certainly remember it! The most anticipated concert of the festival by many. They gave it all. Played what you were expecting them to play, interacted with the people. They made me dance even right after the "darkness" of Explosions and they made me admire. I know I saw a historic concert but they didn't win my heart. I'm sorry. 

Del Rey: What can I say. Astonished. Amazing double drum set, amazing rhythm, amazing audience. These guys have talent and it's obvious! The only show I attended where the crowd asked for "otra"! And they didn't let us down. They returned on stage and played for 15 more minutes! They have a place in my heart :)

 

Day 4, Saturday May 28th, Parc del Forum

Einstürzende Neubauten: I didn't know them and was not interested in any other show at that moment. They described them to me as "minimalistic". Not sure what that means, but they caught my attention. Typically German but not something you've seen before. I'd describe them as "psycho". They convinced me. I will listen to them.

PJ Harvey: I know a lot of people will hate me but the impression I have in my mind is a common woman in white dress singing without passion. Disappointed.

Mogwai: One of my favourite bands and a show that I anticipated a lot. And they didn't let me down :) Although, saying "gracias, thank you very much", that and nothing more, after every single song, was kind of annoying I must admit.

The Black Angels: I arrived 10 min before the end of the Odd Future concert, which was right before, to find around 30 people on stage dancing, jumping, singing! When the Black Angels came on stage, they realised there were sound problems. Probably because of the previous mess. The problem was solved after about half an hour and the Black Angels appeared on stage to exceed every expectation of their impatient audience! The next day everybody was speaking about their performance. I'd just like to point out that the sound quality for those standing on the right side of the stage was really inferior to the rest of the area. That was most probably a permanent problem of the Pitchfork stage, located right next to the sea without any coverage on the right side of the stage.

DJ Coco: Who's that, right? He played 80s, he played 90s and the audience loved him. Actually it was like being in a typical disco of Barcelona. Nothing impressive but exactly what you need to end the party!

 

Day 5, Sunday May 29th, Apolo

...or at least that was the plan. Go to Apolo to see The Black Angels once again. But the queue was long and we decided to have a mojito first, so we headed for the closest square, the one in Raval.

The bar was empty. We ordered mojitos and waited. I knew there was a live flamenco show performed by locals, but I couldn't imagine what was about to follow.

Two guys with their guitars appeared and started playing. People started filling the bar. Some minutes later a blonde guy (totally not Spanish) was passing outside. He heard the music and got in. He was holding a trumpet. It was obvious that he didn't know the rest of the people playing. He sat down, felt the rhythm and joined them. Then I noticed an old man with a long grey beard. He put down his drink and left for a while. He came back with a pair of bongos and joined, too. After a while the trumpets became two, a girl started dancing, everybody clapping their hands.

This is a small video I managed to record:

output.mp4 Watch on Posterous

 

As you may imagine, we never arrived at Apolo on time for The Black Angels.

But this was one of the best performances I saw these days

and one more reason to fall in love with Barcelona :)

V.

Sat, 21 May 2011 03:38:00 -0700 Not even one burnt bin... http://vasia.posterous.com/53872397 http://vasia.posterous.com/53872397

What a disgrace, what an embarrassment of a demonstration, what a shame!

Not to burn a single bin, an ATM, an immigrant, for heaven's sake!

Not to block a single road, not to occupy a building, not to beat up a policeman?

Not one tear gas canister, one Molotov, not even a charge with wooden beams!

Just sitting there peacefully, camping in the middle of the square?

This  on Twitpic

I remember 3-4 years ago, when this whole story with the private universities and Article 16 had started.

I had gone to one of the few marches where there had been no riots.

There were a great many of us, several thousand students. We had flooded the center around Syntagma.

And we were happy about such a turnout, mostly of students not affiliated with any party.

Students who, like me, had taken the initiative on our own to support the demonstration.

And the march went peacefully, without incidents, without beatings, without the "known unknowns", without riot police, without tear gas.

And nobody ever heard about it; it wasn't mentioned in any major news bulletin, it wasn't written up in any major newspaper. Only those of us who were there remember it.

And the next day, a fellow student tells me: "What did you expect? If there are no riots, if no cops get involved, if nobody gets beaten and the center doesn't burn, nobody is going to pay attention to us".

Well, it turns out he was wrong.

For 6 days now, Spaniards have gathered in the squares of all the country's big cities and are camping out in protest over tomorrow's elections, the crisis, and whatever else they see as crooked and ugly. And they protest peacefully.

Maybe, just maybe, it's time we learned a good lesson back home?

Tonight I will be there.

Not because I care about the elections in Spain, but because these people achieved what we have been trying to achieve for so many years.

And they achieved it so simply.

V.

Mon, 16 May 2011 06:16:00 -0700 For 125 years now... the faithful recommend Sacre Coeur! http://vasia.posterous.com/125-sacre-coeur http://vasia.posterous.com/125-sacre-coeur

P5150369
Sacre Coeur, Paris. No words...

Mon, 16 May 2011 06:10:00 -0700 How much does your prayer cost? http://vasia.posterous.com/how-much-does-your-prayer-cost http://vasia.posterous.com/how-much-does-your-prayer-cost

P5140314
Notre Dame, Paris

Thu, 12 May 2011 13:29:00 -0700 Athens: countdown to crash? http://vasia.posterous.com/-countdown-to-crash http://vasia.posterous.com/-countdown-to-crash

Barcelona, last night. I'm walking from a bar to the bus stop.

- "Cerveza-beer! Cerveza-beer, amigo!"

Since we suffer from a lack of kiosks and 24-hour neighbourhood corner shops, it is a particularly common sight here for immigrants to sell beer in the street. In every alley of the center, Pakistanis, Indians, Arabs and various other immigrants stand with a six-pack of "Estrella" in hand and call out to passers-by.

I walk over with my flatmate, who is trying to haggle and get two cans for the price of one. The international word "malaka" slips out of my mouth and the street vendor's eyes light up.

- "Are you Greek?", he asks in broken Greek, and smiles.

We immediately strike up a conversation in a strange Greek-Spanish-English dialect. He used to live and work in Greece, he tells me. Two years ago. He speaks Greek quite well and seems especially happy to be talking with me. I ask him why he left and whether life in Spain seems better to him.

- "Greece very nice. Spain shit.", he tells me word for word, and I look at him astonished. Where exactly did this man live? Is he sure he's talking about Greece?

After we talked a little more, I understood why he said it. He had lived and worked in a village in Laconia. But I, for some (silly?) reason, had assumed that by Greece he meant Athens. And perhaps not unreasonably, since more than half the country's population has gathered in that city.

- ~ -

And again today I read the news, watch the videos of the beatings and the riots, and freak out. A murder the day before yesterday, and then chaos.

Confusion, anger and disgust.

The vicious circle of hatred between protesters and police.

The vicious circle of hatred between the far right and immigrants.

Images that are more than familiar by now. Images that had become my everyday life when I lived in Athens, but that I had forgotten for a while.

I don't know whether I should feel ashamed, feel sad, or shout.

I try to think what can be done. How it can be done. And once again I arrive at the same (extreme?) conclusion:

(Forgive me the following analogy, but I couldn't help thinking of it this way)

 

Life in Athens is like old software written in Fortran.

 

We gave it a small "tidy-up update" in 2004 for the Olympics and have left it to its fate ever since.

Nobody fully understands how it works and nobody can control it any more.

It is full of "bugs" that we try to cover up hastily and sloppily.

With code so hard to understand that you are afraid to touch it in case it stops working.

And those brave enough to try to improve something usually broke something else.

Those who set it up have left by now, leaving behind something with no structure and no coherence.

Code that was built incrementally with the waterfall model.

A system that has reached saturation. A system that doesn't scale.

And that means it cannot serve any more requests.

And when a system is not scalable, there are two options:

Either we change platform and rebuild it with a different logic, or our dissatisfied "customers" will abandon us.

Dsc_0870
*photo from: http://athens.indymedia.org/front.php3?lang=el&article_id=942298

 

Either way, there is a price.

It is expensive, but some day, somehow, we will have to pay it.

V.

Wed, 27 Apr 2011 07:37:00 -0700 wxErlang: Gathering the pieces http://vasia.posterous.com/wxerlang-gathering-all-the-pieces http://vasia.posterous.com/wxerlang-gathering-all-the-pieces

It's been quite some time since I started rewriting some Erlang GS GUI modules to use wxErlang, as the intention is that wxErlang will replace GS soon, maybe in the next major release (R15).

Since it was not urgent, I've been working on it every now and then, making new discoveries every time. So I decided to write this post to gather all the sources of information I came across, such as tutorials and existing examples, and to share my experiences and the difficulties I ran into.

 

What is wxErlang?

wxErlang is an Erlang binding to the C++ GUI library wxWidgets, which provides support for cross-platform GUI applications. This report can give you an insight into how it can be used, as well as how the library itself has been implemented. The report is quite old (2005) and several things have changed since it was written, but it's worth reading. It explains the basics in a fairly simple way and gives an interesting example of how etop (an Erlang version of the unix command top) would be re-implemented using wxErlang. However, be sure not to copy-paste the code it provides, and always consult the documentation, because some things have changed since the report was written and won't work!

For example, wx:start() is used in the report instead of wx:new(), context instead of environment, etc.

 

Tutorials and References

I have to admit that when I first started messing around with wxErlang, I was desperate. I could find almost nothing helpful online except for the reference manual, which is actually a collection of the available modules and functions with "See external documentation" links to the wxWidgets documentation.

But then this great tutorial helped me organize things in my mind. It's fairly simple and complete and has step-by-step examples. As far as I have searched online, it is the only one available (please let me know if there are more I'm not aware of).

In a section of the same tutorial, wxFormBuilder is introduced. wxFormBuilder is a tool which lets the user design the GUI in a drag-and-drop way and generates C++ and Python code automatically. Since no IDE or other kind of framework is available for building a GUI in Erlang more easily, I thought I'd use it. The truth is it's great, but I came to find out that "translating" C++ or Python to Erlang was a bit trickier than it sounded. I guess that if someone is familiar with all three languages, it might be very helpful.

Links

 

Some things to note

Now to conclude this post, there are some hints I want to mention. Things that might not be obvious or in my opinion deserve your special attention:

 

1. Don't forget to destroy your object

Even though we're writing Erlang when using wxErlang, we're indirectly writing C++ and we have to be aware of that, especially when it comes to memory management. With wxErlang we need to take care of destroying the objects we create, and every widget has a destroy/1 function to do so.
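
As a rough sketch of what this can look like in practice, the snippet below creates a frame, shows it briefly and then destroys it explicitly. The module name, the function name and the one-second pause are only there for illustration, and it assumes a machine where wx can open a window.

```erlang
-module(wx_destroy_sketch).
-export([run/0]).

%% Hypothetical example: create a frame, show it, then clean up explicitly.
run() ->
    Wx = wx:new(),                                   % start the wx environment
    Frame = wxFrame:new(Wx, -1, "Temporary window"),
    wxFrame:show(Frame),
    timer:sleep(1000),                               % keep it visible for a moment
    wxFrame:destroy(Frame),                          % free the underlying C++ object
    wx:destroy(),                                    % tear down the wx environment
    ok.
```

Calling wx_destroy_sketch:run() from the Erlang shell should open the window and close it again about a second later.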

 

2. Always read the external documentation

I know it's annoying but since the reference manual is not very enlightening and especially if you haven't worked with wxWidgets before, you might find yourself in situations where you get errors (in the usual Erlang-non-self-explanatory way) and you have no idea what is wrong.

To give you an example, here is a function I wrote: 

 

make_window() ->
    Server = wx:new(),
    Frame = wxFrame:new(Server, -1, "My Window", [{size,{300, 200}}]),
    Panel = wxPanel:new(Frame),
    Box = wxStaticBox:new(Panel, -1, "A Static Box"),
    %% Note: the static box is passed as the parent of the static text
    MyText = wxStaticText:new(Box, -1, "This is static text"),
    wxFrame:show(Frame),
    ok.

 

When executing this function, the Erlang shell crashes with "Segmentation fault", with no additional information and no line number. The problem with this code is that wxStaticBox is used as the parent object for the wxStaticText. However, you will not find the reason in the wxErlang documentation, while it's pretty clear in the external one:

"Please note that a static box should not be used as the parent for the controls it contains, instead they should be siblings of each other. Although using a static box as a parent might work in some versions of wxWidgets, it results in a crash under, for example, wxGTK."

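Following that advice, a minimal sketch of the make_window function above with the static box and the static text as siblings could look like this. The module and function names are hypothetical, and positioning of the two controls (for example with sizers) is left out on purpose.

```erlang
-module(static_box_sketch).
-export([make_window_fixed/0]).

%% Hypothetical variant of make_window/0: both controls share the panel as parent.
make_window_fixed() ->
    Server = wx:new(),
    Frame = wxFrame:new(Server, -1, "My Window", [{size, {300, 200}}]),
    Panel = wxPanel:new(Frame),
    %% The box and the text are siblings: neither is the parent of the other.
    _Box = wxStaticBox:new(Panel, -1, "A Static Box"),
    _Text = wxStaticText:new(Panel, -1, "This is static text"),
    wxFrame:show(Frame),
    ok.
```

The only change from the crashing version is the first argument of wxStaticText:new/3, which is now Panel instead of Box.
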

3. Concurrency

Of course what we all love in Erlang is concurrency. Why even bother making a gui in Erlang if you can't have the different components acting as independent processes that communicate with each other and the failure of one of them doesn't affect the rest of your gui?

However, there is a small detail that requires our attention when dealing with wxErlang. wxErlang uses a process-specific environment, which is created by wx:new/0. A process can only use wx objects from within such an environment, so in order for other processes to work with the same GUI objects, they have to share the same environment. To do that, we can call wx:get_env/0 in the process that owns the environment to retrieve it, and wx:set_env/1 in the other process to assign it there.
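
A small sketch of this idea, assuming a display is available: the first process creates the environment and a frame, and a second process adopts the same environment before touching the frame. The module name, the window titles and the done message are made up for the example.

```erlang
-module(wx_env_sketch).
-export([start/0]).

%% Hypothetical example: share one wx environment between two processes.
start() ->
    Wx = wx:new(),
    Frame = wxFrame:new(Wx, -1, "Shared environment", [{size, {300, 200}}]),
    wxFrame:show(Frame),
    Env = wx:get_env(),              % environment of the creating process
    Parent = self(),
    spawn_link(fun() ->
        wx:set_env(Env),             % adopt the environment in the new process
        wxFrame:setTitle(Frame, "Updated from another process"),
        Parent ! done
    end),
    receive done -> ok end,
    ok.
```

Without the wx:set_env/1 call, the spawned process would not be allowed to use Frame at all, which is the pitfall described above.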

 

Looking forward to a tool that will make our life easier,

V.

Fri, 22 Apr 2011 09:53:00 -0700 Athens Metro Punctuality http://vasia.posterous.com/athens-metro-punctuality http://vasia.posterous.com/athens-metro-punctuality

I've always liked the Athens Metro. It's new (more or less 10 years old), it's clean, it's fast. People like it, use it and respect it. Another thing I like is that the archaeological artefacts discovered during the excavations are exhibited in many stations for the public to see, thus turning stations like Monastiraki and Panepistimio into small museums (more info and some pictures are available here).

I won't comment on the recent ticket price increase. I will just state the fact that when I first moved to Athens in 2004, the ticket cost 0,80€ (0,40€ for students) and today it costs 1,40€. And that between 2004 and today, 3 or 4 new stations have been added.

Anyway, last week I used the metro for the first time in several months. And there I am, Saturday night in the crowded station of Syntagma. As I was a bit late (influenced by the Spanish sense of time, or maybe this is just the excuse I used later :p), I rushed down the stairs to check the sign on the platform: 1 min for the metro to come. Just perfect!

I pressed Shuffle on my iPod and smiled as "Pluvius Aestivus" started playing. I waited, enjoying the song and watching more and more people fill the platform.

After the song finished, the metro hadn't come yet. I was sure the song lasted more than a minute. I raised my eyes to the 1 min sign. It said 30 seconds. People kept coming and coming... Long story short, the metro came somewhere in the middle of Russian Circles' "Enter". But it was so full I couldn't "enter". I finally managed to board the next train, just as "Post Blue" by Placebo started playing.

So, here it is, tonight's playlist provided by Athens Metro.

Enjoy:


 

 

I should have walked...

V.

 

 

Thu, 14 Apr 2011 10:31:00 -0700 Get out, wherever you can... http://vasia.posterous.com/49562119 http://vasia.posterous.com/49562119

8 months later

It has been 8 months since I left Greece to live in Barcelona. So I came back yesterday for Easter. I had come back at Christmas too. But this time it's different. I can already see the difference. I've been here one day and I have a knot in my stomach, a kind of depression, and a party of thoughts is going on in my head.

Someone shot a ticket inspector, students attack Watson. Crazy news stories break one after the other and I don't know how to process them.

And in the end I decided:

Enough is enough. Today I have to start writing.

On the one hand I feel lucky that I don't live here. Lucky that I found a way to get out quickly, painlessly and without a fuss. On the other hand I feel guilty that I see and realise what is happening and chose to escape. Yes, I refuse to live in this country the way it is, and I know that this is selfish. It hurts me, but I say it and will keep saying it: if you can, leave. For anywhere. For however long.

 

Final Stage: Acceptance

I think Greece has already moved on to the stage of acceptance. We have a crisis. We know it. We accept it. It is now a given. Of course, before I left we had a crisis too. But we didn't want to believe it. We were in the first stage, that of Denial:

- "Crisis? What crisis? We Greeks do nothing but moan, we bring disaster on ourselves. Everything will go just fine, we don't need help."

- "What? You're going to move your money abroad? But why? Don't be silly, there is no chance of bankruptcy."

Perhaps some still don't want to believe it. But they are fewer and fewer. And the fact that we are now at the stage of acceptance is especially evident in the attitude of the family and friends of those of us who left.

At first it was:

- "Yes, grandpa. The kid is going to study abroad. And in two years the kid will come back to get a job here."

After the first visit it was:

- "Yes, grandpa. The kid is having a good time there, it's better than here. Coming back when? Er, we'll see. Soon."

Now they know that we won't be coming back. At least not in the near future:

- "Yes, grandpa. The kid is fine. Coming back when? And what would the kid come back to Greece to do, grandpa?"

Img_0252

Epilogue

Sometimes when I talk with friends, I may seem over-enthusiastic about Spain and blind to its flaws. Perhaps I overestimate some things there or underestimate things here. I don't disagree. It may be true. After all, I can't help noticing the parallels.

The "get out, wherever you can" attitude is not a Greek phenomenon.

The Spaniards are looking for ways out too, and so are the other European students. London, Sweden, the USA... (They have a particular love for the USA and its universities. European students leave for America and Europe looks for students in Asia... But that is another discussion that needs a post of its own.)

My aim is not to convince you how much nicer it is abroad, nor to tell you yet again how shitty things are here. Those of you who live here know that better than I do. My aim is to say what I want to shout, what I want those who are asleep out there to hear.

And I will close my first post by quoting a line that a friend from Barcelona said to me a few days ago, which I remembered while reading today's news:

"Thank goodness the other Greek team (Olympiacos) didn't make it to the final 4 in Barcelona as well. We don't want you to destroy our city."

The comments are yours...

V.
