A Superior Similarity Metric
Matching Farmer Fields with Research Plots
A Carlson Analytics Lab Client Project
Winfield United, the crop sciences division of Land O’Lakes, operates an extensive research program known as Answer Plot, which spans nearly 200 test fields across the United States. By testing seed types, crop nutrients, and crop protection inputs on these fields, Winfield collects performance data that it uses to advise member farmers.
Originally, Winfield agronomists made recommendations to farmers based on data from the Answer Plot that was geographically nearest to the farmer. However, growing conditions may differ drastically between a farmer’s field and an Answer Plot even a short distance away. As a result, Winfield’s target market of farm owners began to express reservations about the credibility of the Answer Plot program.
In order to provide more accurate, more credible advice to farmers, Winfield United set out to improve the program by identifying the best representative Answer Plot in terms of growing conditions, not just geographical distance.
Harvesting Growing Conditions to Develop a Similarity Metric
Measuring similarity between Answer Plot fields and farm fields required a set of common attributes. Students in the Carlson Analytics Lab collected a range of data on growing conditions from a variety of public datasets: USDA crop layer data, SSURGO soil data from the USDA Natural Resources Conservation Service (NRCS), weather observation data from the National Oceanic and Atmospheric Administration (NOAA), and topographical data from NASA satellite imagery.
The students conducted extensive data engineering to integrate these datasets and construct attributes for various growing conditions. For example, the team transformed descriptive soil types into numeric values by considering composition and texture in different soil layers. For weather conditions, the team used triangulation with inverse-distance weighting to combine data from the three closest weather stations for each field, so that nearer stations contribute more to a field's estimate.
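The weather step described above can be illustrated with a minimal sketch of inverse-distance weighting over the three nearest stations. This is not the team's actual code: the function names, the use of great-circle (haversine) distance, and the `power` exponent are all assumptions made for illustration.

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers (assumed distance measure)."""
    r = 6371.0  # mean Earth radius in km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def idw_estimate(field, stations, k=3, power=1.0):
    """Estimate a weather value at a field from its k nearest stations.

    field:    (lat, lon) of the field
    stations: list of (lat, lon, value) tuples, one per weather station
    power:    hypothetical exponent; 1.0 gives plain inverse-distance weights
    """
    nearest = sorted(
        ((haversine_km(field[0], field[1], lat, lon), value)
         for lat, lon, value in stations),
        key=lambda pair: pair[0],
    )[:k]
    # A station located exactly at the field determines the value outright.
    for dist, value in nearest:
        if dist == 0.0:
            return value
    weights = [1.0 / dist ** power for dist, _ in nearest]
    weighted_sum = sum(w * value for w, (_, value) in zip(weights, nearest))
    return weighted_sum / sum(weights)
```

Because the weights are normalized by their sum, the estimate always falls between the smallest and largest of the three station readings.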
After constructing these attributes for all Answer Plots and for 300,000 fields in Minnesota and Wisconsin, the students used Euclidean distance to develop a similarity metric based on growing conditions. Because different conditions affect crop growth to different degrees, the students consulted with domain experts to evaluate the importance of each growing condition and assign weights. The refined model expressed the similarity between an Answer Plot and a given farm as an easy-to-understand percentage.
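One way to picture a weighted Euclidean similarity metric is the sketch below. The article does not say how distance was converted to a percentage, so the conversion here is an assumption: attributes are taken to be scaled to the range 0 to 1 and weights are normalized to sum to 1, which bounds the distance between 0 and 1 and lets similarity be reported as 100 × (1 − distance). The attribute names and weight values are hypothetical.

```python
import math

def weighted_similarity_pct(field, plot, weights):
    """Similarity (0 to 100) between a farm field and an Answer Plot.

    field, plot: dicts mapping attribute name -> value scaled to [0, 1]
    weights:     dict mapping attribute name -> expert-assigned importance
    """
    total = sum(weights.values())
    # Weighted squared differences; weights normalized so they sum to 1.
    squared = sum(
        (w / total) * (field[name] - plot[name]) ** 2
        for name, w in weights.items()
    )
    # Assumed conversion: distance in [0, 1] mapped to a percentage.
    return 100.0 * (1.0 - math.sqrt(squared))

def best_plot(field, plots, weights):
    """Return the id of the plot most similar to the field."""
    return max(
        plots,
        key=lambda plot_id: weighted_similarity_pct(field, plots[plot_id], weights),
    )

# Hypothetical attributes and weights, for illustration only.
weights = {"soil_texture": 3.0, "rainfall": 1.0}
field = {"soil_texture": 0.5, "rainfall": 0.2}
plots = {
    "A": {"soil_texture": 0.5, "rainfall": 0.2},
    "B": {"soil_texture": 0.9, "rainfall": 0.2},
}
closest = best_plot(field, plots, weights)
```

Under this scheme, a heavily weighted attribute such as soil texture pulls the score down more for the same raw difference than a lightly weighted one, which is the role the expert-assigned weights play in the description above.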
Using the growing condition-based similarity metric, the Carlson Analytics Lab team identified the best representative Answer Plot for all 300,000 farm fields in Minnesota and Wisconsin, improving results by 7% on average. Furthermore, the students identified low-performing areas underrepresented by the current Answer Plot portfolio, which will help Land O’Lakes optimize future field allocation. The work also provides a foundation for ongoing Land O’Lakes efforts to understand environmental sustainability and crop performance.