Innova-tsn at ‘The R Users Conferences’
23rd – 25th November 2022, Córdoba
Last November, our colleagues Julián Rojo, Begoña Vega, María Neira and Ángela Díaz attended the 1st Congress and 12th Conferences on R Users, which took place in Córdoba. Innova-tsn started sponsoring this event in 2016, and in this edition we had the pleasure of doing so again, for the fifth year. Our Silver Sponsorship was used to support the two prizes that were awarded: the Student Awards and the Young Researcher Awards.
This annual event, which had not been held for the previous two editions, focuses on the use of the statistical software R in operational research and is aimed at professionals and students who use R day to day. These R conferences are held nationally with the goal of providing a meeting point for R users, promoting collaboration between them in a multidisciplinary environment and disseminating knowledge of the language and its possibilities.
The organisers, the R Hispano Community, an association set up in 2011 with the aim of advancing the knowledge and use of the R programming language, hold these conferences in collaboration with each of the universities in the cities where they are held.
Not only did our colleagues attend the conferences as sponsors of the event, but they also presented a practical case!
The case Innova-tsn shared at these conferences, “UP IN THE AIR”, was about predicting the number of passengers on Madrid-Barcelona air shuttle flights. The difficulty is that the occupancy of these flights is extremely variable: in addition to passengers who have booked a ticket in advance, seats are filled by passengers who switch from one flight to another, since these are flexible fares, and by those who buy a ticket shortly before the flight takes off. Several prediction horizons were also required, which involved calculating around 8,400 daily predictions corresponding to the airline’s annual flights.
To make these predictions we have a four-year history with variables related to the flight, passengers on equivalent flights, and the previous bookings of each flight. Processing this information allows us to generate a table with 175 variables. Several models available in R libraries (KNN, K-Means, linear regression and SVM) are combined as an ensemble: each one produces its own prediction, and these predictions serve as inputs to the final model, an XGBoost, which determines the final prediction. In addition, a series of flights is classified as critical: for these, the prediction must not fall below the actual final number of passengers, so an overestimate is required. For these flights a quantile regression is performed, taking the 0.9 quantile (the 90th percentile) as the value.
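To illustrate why a quantile regression at the 0.9 quantile tends to overestimate, here is a minimal sketch of the pinball (quantile) loss that such a regression minimises. It is written in plain Python rather than R, the passenger counts are invented for the example, and the function names are hypothetical; it is not the model presented at the conference, only an illustration of the underlying idea.

```python
def pinball_loss(actual, predicted, tau):
    """Pinball (quantile) loss for a single observation."""
    diff = actual - predicted
    # Under-prediction (diff > 0) costs tau per passenger,
    # over-prediction (diff < 0) costs (1 - tau) per passenger.
    return tau * diff if diff >= 0 else (tau - 1) * diff


def mean_pinball_loss(actuals, predicted, tau):
    """Average pinball loss of one constant forecast over all observations."""
    return sum(pinball_loss(a, predicted, tau) for a in actuals) / len(actuals)


def best_constant_forecast(actuals, tau):
    """Brute-force search: try each observed value as the forecast."""
    return min(sorted(set(actuals)),
               key=lambda c: mean_pinball_loss(actuals, c, tau))


# Hypothetical final passenger counts for 11 comparable flights.
passengers = [112, 118, 120, 125, 127, 130, 134, 141, 150, 163, 171]

median_forecast = best_constant_forecast(passengers, 0.5)
high_forecast = best_constant_forecast(passengers, 0.9)

# How many flights would have carried more passengers than forecast?
shortfalls_median = sum(p > median_forecast for p in passengers)
shortfalls_high = sum(p > high_forecast for p in passengers)

print(median_forecast, shortfalls_median)
print(high_forecast, shortfalls_high)
```

With tau = 0.9, falling short by one passenger costs nine times as much as overshooting by one, so the loss is minimised by a forecast that sits above most of the observed values. A full quantile regression applies the same loss with the forecast expressed as a function of the input variables instead of a single constant.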
Thanks to these processes, the prediction error was reduced by up to 60% in the 120-day horizon. The time spent on the task also fell: instead of one person dedicated exclusively to it, a process now generates and sends these predictions autonomously in 4 hours. The presentation aroused much interest among the attendees.
How did some of our colleagues experience the Conferences?
The experience at the R sessions in 2022 was very gratifying. The talks were interesting because they made us aware of other sectors, such as agriculture, where analytical techniques with R are currently being applied and further improved. They also showed the need for data and analytics in virtually all sectors, both to improve the functioning of a specific business and to make our daily lives easier. The event also brought us closer to other professionals and students, who shared their way of working with R and a great variety of real projects.
One of the talks I liked the most was Emily Robinson’s “Don’t Let the Snake Bite: Integrating Python into Your R Workflow,” which focused on the importance of combining R with other languages (in this case, Python) to get the most out of our work.
As in previous years, the R Conferences have been very enriching. The conferences allow us to connect the academic world with the professional world and get to know different sectors a little more closely.
In my case, one of the talks that had the most impact on me was the plenary session by Eli Vivas (CEO of Storydata), “There is a Hacker in the Newsroom,” in which she highlighted the importance of communicating the knowledge gained from data, and the value of combining data scientists with great communicators so that the data can have an impact.
We would like to emphasise once again how important it is for Innova-tsn to participate in this type of initiative, which combines the exchange of knowledge and the sharing of ideas from a fun, continuous-learning perspective, and which undoubtedly represents professional and personal enrichment for the participants.
Finally, we would like to thank the R Hispano Community for the good organisation of this event, which would not be possible without the selfless collaboration of the members of the Organising Committee (in charge of logistics, etc.) and the Academic Committee (which manages the presentations, workshops, etc.). Without a doubt, all of them strive to keep R a positive, successful and growing community. Thank you, everybody!