Data Fusion for Crop Yield Prediction

Milica Brkić

Junior Researcher, BioSense Institute

Thanks to the growing number of smart devices, data is easier to generate than ever. Gathering data from multiple sources, then combining and exploring it, is particularly important in crop yield prediction, because many variables (soil properties, weather conditions, land management…) affect the final outcome. The information produced by data fusion is often more useful than that provided by any individual data source.

A team of researchers from BioSense Institute is working on a crop yield prediction model based on data fusion. An algorithm called Data Fusion by Matrix Factorization (DFMF) is used to help the seed industry breed better seeds. Not all varieties are suitable for all fields, so seed selection is very important: it can lower costs and increase yield. Testing hybrids under various conditions helps us understand for which land and conditions they are suited to produce a high yield of good quality. But testing them in every scenario is impossible because of experimental costs, time, and the limited number of locations in which breeders can plant hybrids.

Figure 1: Data fusion configuration

The data analysed comes from the Syngenta Crop Challenge 2019 and is one of the largest and most comprehensive publicly available datasets for research in crop yield forecasting. The dataset contains maize yield values for different hybrids grown across various environments/fields. Soil and weather data are provided for every environment. The goal of our work was to predict the performance of every hybrid in every environment using the DFMF algorithm. Enriching the historical dataset in this way helps us see, based on yield, which is one of the best indicators in smart seed selection, which hybrid is best to plant in a particular location. The results obtained are promising, bearing in mind that predicting maize yield is a very challenging task because of its high variability. Figure 2 shows the yield variability for the year 2015 alone, where the minimum yield was 25.15 and the maximum was 198.25 quintals per hectare.

Figure 2: Histogram of the yield for the year 2015
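
The prediction task described above, estimating the yield of every hybrid in every environment from a partially filled hybrid-by-environment table, can be illustrated with a much simpler relative of DFMF. The sketch below is not the DFMF algorithm and does not use soil or weather data: it factorizes a single yield matrix by stochastic gradient descent and uses the learned factors to fill in one missing entry. All names (`factorize`, `predict`, `observed`) and the toy yield values are hypothetical and chosen for illustration only; they do not come from the Syngenta dataset.

```python
import random

def factorize(observed, n_hybrids, n_envs, rank=1, epochs=2000,
              lr=0.01, reg=0.01, seed=0):
    """Fit yield[h][e] ~ dot(H[h], E[e]) on observed (h, e, yield) triples."""
    rng = random.Random(seed)
    # Small positive random starting values for the latent factors.
    H = [[rng.uniform(0.1, 1.0) for _ in range(rank)] for _ in range(n_hybrids)]
    E = [[rng.uniform(0.1, 1.0) for _ in range(rank)] for _ in range(n_envs)]
    for _ in range(epochs):
        for h, e, y in observed:
            pred = sum(a * b for a, b in zip(H[h], E[e]))
            err = y - pred
            for k in range(rank):
                h_k, e_k = H[h][k], E[e][k]
                # Gradient step on squared error with L2 regularization.
                H[h][k] += lr * (err * e_k - reg * h_k)
                E[e][k] += lr * (err * h_k - reg * e_k)
    return H, E

def predict(H, E, h, e):
    """Predicted yield of hybrid h in environment e."""
    return sum(a * b for a, b in zip(H[h], E[e]))

# Toy data (assumption): yield = hybrid effect * environment effect.
hybrid_effect = [1.0, 1.2, 0.8]
env_effect = [5.0, 7.0, 9.0]
held_out = (1, 2)  # hybrid 1 was never planted in environment 2

observed = [
    (h, e, hybrid_effect[h] * env_effect[e])
    for h in range(3)
    for e in range(3)
    if (h, e) != held_out
]

H, E = factorize(observed, n_hybrids=3, n_envs=3)
predicted = predict(H, E, *held_out)
true_value = hybrid_effect[1] * env_effect[2]
print(f"predicted {predicted:.2f}, true {true_value:.2f}")
```

In this toy setting the held-out yield can be recovered because the table has a simple low-rank structure. Real yield data is far noisier, which is one reason to bring in additional sources such as soil and weather, as data fusion does.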

Research in this area is very important, as optimizing crop production is a necessary challenge for our future. Algorithms such as DFMF can help us meet it.