Who am I?

Hi. My name is Mikael Manukyan (you can call me Michael or just Mike). I am a student from Armenia. Right now, I am in my final year at Russian-Armenian University, doing my master’s degree in Computer Science. I also have almost 3 years of experience in web development with Node.js, and I am the CTO of a local software development company.

My fields of interest are Coding, Machine Learning, Neural Networks, Computer Vision and Natural Language Processing. I am part of a small team of highly motivated students who are passionate about Neural Networks. We are trying to find our place in the enormously fast-growing field of Neural Networks. Our team is called YerevaNN, and our latest work is an implementation of Dynamic Memory Networks.

I am happy to take part in this wonderful program, which Google organizes each year. The company I will work with during the summer is Scrapinghub, and the project is Splash.

Why Splash?

So, let me explain why I chose this particular project. First of all, what is Splash?

According to its documentation, Splash is a lightweight, scriptable headless browser with an HTTP API. It is used to:

  • properly render web pages that use JavaScript
  • interact with them
  • get detailed information about requests/responses initiated by a web page
  • apply Adblock Plus filters
  • take screenshots of the crawled websites as they are seen in a browser

But that isn’t the reason I chose it. The main reason is that Splash consists of:

  • Qt - for web page rendering
  • Lua - for interaction with web pages
  • Python - to glue everything together

The variety of programming languages and the way they work together is the main reason I thought: “This is a project I will love working on!”

Splash can interact with web pages using Lua scripts. However, scripting is an experimental feature for now and it lacks some necessary functions. Making Splash scripts more practical and useful is my main work for the summer.

What will I do for Splash?

There are three main features/modules that I am going to add to Splash:

  1. The ability to control the flow of command execution in Lua scripts (particularly splash#go) using a new API, splash#wait_for_event
  2. The ability to interact with DOM elements in Lua scripts using a new API, splash#select
  3. Support for user plugins

splash#wait_for_event

The current implementation of the splash#go method returns control to the main program only when the page that is currently loading emits the loadFinished signal. Signals are a part of Qt and are used for communication between various modules. In this particular case, signals notify that the page has, e.g., finished loading or that some error occurred during the load. The current behavior doesn’t allow a script to do anything while the page hasn’t fully loaded (e.g. when some resources on the page take a very long time to load). I am going to add a new method that can wait for various types of signals and, along with it, a new parameter for splash#go that returns control to the Lua interpreter right after the call, without waiting for the page to load.

This method will make it possible to control the flow not only for splash#go but for all other methods that depend on signals.
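
To make the idea concrete, here is a minimal sketch in plain Python rather than real Splash or Qt code. The FakePage class, the wait parameter and the way signals are queued are all assumptions made for illustration; the final API may look quite different. It only shows the difference between blocking until loadFinished and getting control back right away, then waiting for a specific signal.

```python
from collections import deque


class FakePage:
    """Toy stand-in for a page that emits named signals (hypothetical, not Splash code)."""

    def __init__(self):
        # Signals "emitted" by the page that the script has not consumed yet.
        self._pending = deque()

    def go(self, url, wait=True):
        # Pretend that loading `url` emits these two signals, in this order.
        self._pending.extend(["loadStarted", "loadFinished"])
        if wait:
            # Behavior described above: block until loadFinished arrives.
            return self.wait_for_event("loadFinished")
        # Proposed behavior: hand control back to the script immediately.
        return None

    def wait_for_event(self, name):
        # Consume queued signals until the requested one shows up.
        seen = []
        while self._pending:
            signal = self._pending.popleft()
            seen.append(signal)
            if signal == name:
                return seen
        raise TimeoutError(f"signal {name!r} never arrived")


page = FakePage()
page.go("http://example.com", wait=False)          # returns right away
first = page.wait_for_event("loadStarted")         # script reacts early
rest = page.wait_for_event("loadFinished")         # then waits for the full load
blocking = FakePage().go("http://example.com")     # old style: waits for loadFinished
```

In the non-blocking case the script gets a chance to act between loadStarted and loadFinished, which is exactly what is not possible when splash#go waits for the whole page.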

splash#select

Currently, Splash scripting has no convenient way to click on an element, fill an input, scroll to an element, etc. This method will find the specified DOM element and return an instance of the Element class. This class will be used to manipulate DOM elements in Lua.

Adding new utility functions is the main part of my summer work. I decided that adding utility functions to the splash object (like splash#click) is not as good as adding them to a dedicated class (like element = splash:select() and then element:click()).
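
The design choice can be sketched in plain Python. Everything here is hypothetical: FakeBrowser, run_js, fill and the generated JavaScript strings are illustration only and not the Splash API. The point is that select returns an object that owns the element-level operations, so the main object stays small.

```python
import json


class Element:
    """Hypothetical wrapper around one DOM element found by a CSS selector."""

    def __init__(self, browser, selector):
        self._browser = browser
        # json.dumps produces a properly quoted JavaScript string literal.
        self._target = f"document.querySelector({json.dumps(selector)})"

    def click(self):
        self._browser.run_js(f"{self._target}.click()")
        return self  # returning self allows chaining calls

    def fill(self, value):
        self._browser.run_js(f"{self._target}.value = {json.dumps(value)}")
        return self


class FakeBrowser:
    """Records the JavaScript it is asked to run instead of executing it."""

    def __init__(self):
        self.executed = []

    def run_js(self, source):
        self.executed.append(source)

    def select(self, selector):
        # All element utilities live on Element, not on the browser object.
        return Element(self, selector)


browser = FakeBrowser()
browser.select("#q").fill("splash").click()
```

With this shape, adding a new utility such as scrolling means adding one method to Element, without growing the list of methods on the main object.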

User plugins

While exploring the project, I noticed a TODO comment:

TODO: users should be able to expose their own plugins

And I thought: “Why not add user plugin support?”

It is going to work the following way. If a user wants to add her own plugins, she passes the --plugins /path/to/plugins argument when starting the Splash server. The plugins folder should contain two subfolders, lua and python, for Lua and Python files respectively. The lua folder is used to load custom Lua modules. The python folder contains Python classes. These classes should inherit from the SplashScriptingPluginObject class, which will allow the user to load Lua modules.
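
A rough sketch of how such a folder could be scanned is below. It is not the Splash implementation: PluginBase is a hypothetical stand-in for SplashScriptingPluginObject, and discover_plugins is a name I made up. It only assumes the lua and python subfolders described above.

```python
import importlib.util
import inspect
from pathlib import Path


class PluginBase:
    """Hypothetical stand-in for the SplashScriptingPluginObject base class."""


def discover_plugins(plugins_dir, base=PluginBase):
    """Return (lua_file_names, plugin_classes) found under plugins_dir."""
    root = Path(plugins_dir)

    # Lua modules: collect the file names from the lua subfolder.
    lua_files = sorted(p.name for p in (root / "lua").glob("*.lua"))

    # Python plugins: import every .py file and keep subclasses of `base`.
    classes = []
    for path in sorted((root / "python").glob("*.py")):
        spec = importlib.util.spec_from_file_location(path.stem, path)
        module = importlib.util.module_from_spec(spec)
        spec.loader.exec_module(module)
        for _, obj in inspect.getmembers(module, inspect.isclass):
            # Skip the base class itself if the plugin file imported it.
            if issubclass(obj, base) and obj is not base:
                classes.append(obj)
    return lua_files, classes
```

Calling discover_plugins("/path/to/plugins") would then give the server the list of Lua modules to make available and the Python classes to instantiate. Note that importing a plugin file executes its code, so the folder should only contain trusted files.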

What did I do during the Community Bonding Period?

As I mentioned at the beginning of the post, I am graduating this year, so during the Community Bonding Period I was preparing for my final exams and working on my final project. Hence, I had very little time for GSOC. However, I did quite a lot of work exploring the inner structure of Splash before my GSOC acceptance. During my first PR I learned how the different parts of Splash communicate with each other, how the tests are implemented and how the docs are written. Alongside that, I tried to fix some active issues. Unfortunately, I didn’t manage to, but while trying to find the cause of the bugs I dug very deep into the Splash implementation.

Also, I am really thankful to my mentor, who understood my situation and allowed me to focus on my education.

What now?

I think this summer will be a productive one and I won’t miss any of my deadlines. I wish the same to all GSOC students.


Have fun and code :wink: