Note: If you have an interview coming up with a farm analytics company, study this whole post over and over again, and bring up what you learned in the interview.
As you know, Machine Learning is like a Zombie infestation. Once a few algorithms hit your industry, many things will never be the same. So, let's talk about the impact Machine Learning has had on the farming/agriculture sector. The Agriculture industry is being revolutionized by data scientists. Here’s a cool vid on it to get started.
We'll talk about the technology, then the data science aspect (useful for interviews). This'll be followed by some actual commentary from some of the farmers in the jungle. Thanks to BowTiedRancher and BowTiedFarmer for their help.
There are more applications than these, but if you have a data science interview with a farm analytics company coming up, it’s a good idea to brush up on these 3, which are being used in the real world today.
Table of Contents:
Precision Farming
Livestock Care
Self Driving Tractors
1 - Precision Farming
The Technology
Machine Learning can be used to analyze the health of the crop and the soil it’s on. We use this to predict and maximize the farmers’ crop yields. For soil quality, some of the data it looks at is:
Water Usage
Fertilizer Amount
Pesticide Treatment
We have remote sensing drones that fly around and collect data within the actual field. These drones focus on the nutrient quality of the soil, and the crop quality itself. Some of the things they study are:
Nitrogen & Phosphorus concentration
Disease/Pest Outbreaks
Productivity Hotspots
Here’s a few of the drones in action:
The Data Science
Since this is a data science substack, let’s focus in on the actual data science behind the technology. We’ll do a deep dive on how we can predict crop disease outbreaks.
Objective:
To start with, we can label every crop as a class:
0 - Healthy
1 - Guaranteed Disease Carrier
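The guarantee part is important, because this is effectively a classification problem and the model can only learn from labels we trust: a crop gets labeled 1 only when the disease is confirmed. The goal is to generate a model which predicts the probability of a crop carrying a disease.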
Data Collection:
We use things like satellite imagery, remote controlled drones, temperature sensors, and pretty much anything else that can collect data on plants, and the soil. Some of the factors collected are:
Farm & Population Data
Climate Conditions - Temperature, Humidity, Rainfall
Soil Type
Health of previous harvests
Previous fungal and pest growth
We want data on the previous infections because it gives us a baseline. In other words, we want to know what is considered “normal” first, and then try to predict the extremes.
Data Modelling:
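This is not a typical structured data problem, because the crops are not independent of each other: if a crop gets infected, it’ll infect its neighbors soon. For this specific problem, the Machine Learning model used is a KNN Classifier, which classifies each crop based on the labeled crops nearest to it.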
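To make the KNN idea concrete, here is a minimal sketch in plain Python. It assumes each crop is described only by its (row, column) position in the field, which is a simplification: a real model would also use the climate and soil factors listed above. The function names, the field layout, and the labels are all made up for illustration.

```python
import math

def knn_disease_probability(positions, labels, query, k=3):
    """Fraction of the k nearest labeled crops that are diseased (label 1)."""
    # Distance from the query crop to every labeled crop
    distances = [
        (math.dist(position, query), label)
        for position, label in zip(positions, labels)
    ]
    # Keep only the k closest neighbors
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    return sum(label for _, label in nearest) / k

def knn_classify(positions, labels, query, k=3, threshold=0.5):
    """Class 1 (disease carrier) if enough of the neighbors are diseased."""
    probability = knn_disease_probability(positions, labels, query, k)
    return 1 if probability >= threshold else 0

# Hypothetical field: (row, column) of crops we have already labeled
field_positions = [(0, 0), (0, 1), (1, 0), (1, 1), (8, 8), (8, 9), (9, 8), (9, 9)]
field_labels = [1, 1, 1, 0, 0, 0, 0, 0]  # 1 = confirmed disease, 0 = healthy

# A crop sitting next to the infected corner vs. one in the healthy corner
print(knn_disease_probability(field_positions, field_labels, (0.2, 0.3)))
print(knn_classify(field_positions, field_labels, (8.5, 8.5)))
```

The probability here is just the share of diseased neighbors, which lines up with the objective of predicting how likely a crop is to carry a disease.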
Commentary from Farmers
BowTiedFarmer: I’m interested in how a drone can tell plant health and soil quality. How long it would need to fly and how close to the plants it would need to fly. Some of the costs would be interesting as well.
2 - Livestock Care
ML models are used to detect issues in livestock, before they grow into serious problems. The algorithms study things like:
Activity Levels
Heat Generated
Video Surveillance Feeds
This ends up being useful because problems get detected early and tied to a specific animal. This means you can take action on that animal, and not waste time worrying about the rest.
Since this algorithm studies surveillance feeds, you can forward the farmer a clip of the animal’s “normal” behavior, followed by a clip of the “odd” behavior. That way, the farmer knows his time isn’t being wasted when he goes and checks on the animal.
The Data Science
Objective:
The goal is to predict which animal is no longer following its “normalized” behavior. The further it goes from its normal behavior, the more likely that something is wrong.
In other words, this is an Anomaly Detection problem. Here’s a vid on Anomaly Detection by Andrew Ng. He’s also involved in a Data-Centric AI campaign.
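Here is a rough sketch of the simplest version of this idea: score a new reading by how many standard deviations it sits from the animal's own history. This z-score rule is a stand-in chosen for illustration, not necessarily the model a real livestock product uses. The function names, the threshold of 3, and the temperature numbers are all assumptions.

```python
from statistics import mean, stdev

def anomaly_score(baseline_readings, new_reading):
    """Number of standard deviations between a new reading and the animal's baseline."""
    mu = mean(baseline_readings)
    sigma = stdev(baseline_readings)
    if sigma == 0:
        # A perfectly flat baseline: any change at all is maximally surprising
        return 0.0 if new_reading == mu else float("inf")
    return abs(new_reading - mu) / sigma

def is_anomalous(baseline_readings, new_reading, threshold=3.0):
    """Flag the reading when it strays further than the threshold from normal."""
    return anomaly_score(baseline_readings, new_reading) > threshold

# Hypothetical daily body temperatures (in Celsius) for one cow
cow_temperature_history = [38.5, 38.6, 38.4, 38.7, 38.5, 38.6, 38.5]

print(is_anomalous(cow_temperature_history, 38.6))  # an ordinary day
print(is_anomalous(cow_temperature_history, 40.2))  # a likely fever
```

Each animal is compared against its own baseline, so a naturally warm or naturally lazy cow does not get flagged just for being itself.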
Data Collection:
We have many different animals we can look at: Pigs, Cows, Horses, etc… We’ll focus on the cow example here though. The data will be collected in 2 ways:
1st way: Sensors
Food intake
Milk Yield
Body Temperature
Sweat Signal
2nd way: Surveillance Feed
Activity Levels
Path travelled
Data Modelling:
For all the data collected, we will want a timestamp. This timestamp will be generated by every sensor and by the surveillance feeds. Once we’ve identified the factors, we can do some modelling on them to find the most important timestamps, meaning the moments where the animal strays furthest from its normal behavior.
Then, we can sync the timestamps with the factors and the surveillance video, and pass this off to the farmer.
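The syncing step can be sketched like this: for every flagged sensor timestamp, find the video frame recorded closest to it. This assumes the sensors and the camera share one clock and that frame timestamps are sorted, which is not guaranteed on a real farm. The function names, the report layout, and the sample numbers are hypothetical.

```python
from bisect import bisect_left

def nearest_frame_index(frame_timestamps, event_time):
    """Index of the video frame closest in time to a sensor event.

    frame_timestamps must be sorted in ascending order.
    """
    position = bisect_left(frame_timestamps, event_time)
    if position == 0:
        return 0
    if position == len(frame_timestamps):
        return len(frame_timestamps) - 1
    before = frame_timestamps[position - 1]
    after = frame_timestamps[position]
    # Pick whichever neighboring frame is closer to the event
    return position if after - event_time < event_time - before else position - 1

def build_farmer_report(flagged_events, frame_timestamps):
    """Pair every flagged (timestamp, factor) event with its closest video frame."""
    return [
        {"time": time, "factor": factor,
         "frame": nearest_frame_index(frame_timestamps, time)}
        for time, factor in flagged_events
    ]

# Hypothetical data: one frame every 10 seconds, times in seconds since midnight
frame_timestamps = [0, 10, 20, 30, 40, 50]
flagged_events = [(12, "body_temperature"), (47, "activity_level")]

report = build_farmer_report(flagged_events, frame_timestamps)
for row in report:
    print(row)
```

The farmer then gets one row per problem: when it happened, which factor triggered it, and which part of the video to watch.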
Note: The kind of model used for anomaly detection here is the same kind used for detecting fraud in financial statements.
Commentary from Farmers
Rancher: I’d be more interested in looking at the details on how this AI works. For Cattle, this will be easier in a feedlot, or dairy setting. Depending on drones in a feeder/background/calf operation makes tracking individual animals much harder.
3 - Self Driving Tractors
These things are real. Here’s a vid of one in action.
This is like Tesla, except instead of driving on the road, it drives around on the farm. Imagine you are a farmer and you open up an app. You tell the tractor to navigate to a specific field, and tell it where & how to harvest.
You now have that time freed up to do something else.
Not only does the tractor drive, it also has sensors on it which record the quality of the soil. It then passes that information to the farmer in real time. The sensors are also used to collect data on the surroundings, so that the tractor doesn’t go off track and is prepared for any weird abnormalities.
The Data Science
Objective:
The goal is to have sensors collect and feed real-time data to the Neural Network. The neural network then needs to figure out what its next move should be:
Continue as normal on the required path
Stop & Change Directions
Reverse
In other words, we want to build a neural network that is told to finish the required path as soon as possible. The faster it does it, the more points it gets (in loss function terms, a lower loss).
If it fks up, then it loses all the points. One kind of neural network built to handle this sort of sequential data is a Recurrent Neural Network (RNN). Although best known for Natural Language Processing (NLP), these are being used quite a lot for games and modern AI. Here's a vid on them. What you see on the right is a simple example of what the AI on the tractor would see.
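To give a feel for the moving parts, here is a toy sketch of the points system and of a single RNN step that picks one of the three moves. Everything in it is an assumption made for illustration: the weights are random and untrained (so the chosen moves are meaningless), the two sensor inputs are invented, and the scoring rule is just one way to express "faster is better, a mistake loses everything".

```python
import math
import random

ACTIONS = ["continue", "stop_and_turn", "reverse"]

def episode_reward(seconds_taken, made_mistake, time_budget_s=3600.0):
    """More points for finishing faster; a mistake wipes out all the points."""
    if made_mistake:
        return 0.0
    return max(0.0, time_budget_s - seconds_taken)

def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]

def rnn_step(sensor_reading, hidden_state, weights):
    """One Elman-style RNN step: mix the new reading with memory, then score moves."""
    w_in, w_hidden, w_out = weights
    mixed = [a + b for a, b in zip(matvec(w_in, sensor_reading),
                                   matvec(w_hidden, hidden_state))]
    new_hidden = [math.tanh(value) for value in mixed]  # the updated memory
    scores = matvec(w_out, new_hidden)
    best = max(range(len(ACTIONS)), key=lambda i: scores[i])
    return ACTIONS[best], new_hidden

def random_matrix(rows, cols, rng):
    """Untrained placeholder weights."""
    return [[rng.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]

rng = random.Random(0)
n_sensors, n_hidden = 2, 4  # hypothetical sizes
weights = (
    random_matrix(n_hidden, n_sensors, rng),
    random_matrix(n_hidden, n_hidden, rng),
    random_matrix(len(ACTIONS), n_hidden, rng),
)

hidden = [0.0] * n_hidden
for reading in [[0.9, 0.0], [0.5, 0.1], [0.1, 0.8]]:
    action, hidden = rnn_step(reading, hidden, weights)
    print(action)
```

The part that makes it recurrent is the hidden state: each decision depends on the current reading and on a memory of the readings that came before it. Training would adjust the weights so that the chosen moves maximize the points.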
Data Collection:
The data has to be collected by the precise navigation GPS and the sensors around the tractor. Both need to be functional without errors. Luckily, that’s not the Data Scientist’s problem, that’s the engineers’ and the farmer’s problem.
Data Modelling:
In this case, the neural network needs to predict on the real-time data it encounters. We cannot collect a bunch of data into a table, and then feed that to the ML algorithm. It needs to never stop predicting.
The most common way to do ML model predictions is to do them in batch. We collect a bunch of data, send it to a model to predict, and then move on from there. In this example, we do the opposite: the ML model needs to constantly predict on new information without stopping.
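The difference between the two styles can be sketched in a few lines. The "model" below is a made-up stand-in (stop if an obstacle is closer than 2 meters), and the sensor feed is a short hypothetical list. On a real tractor the feed would never end and the model would be the trained network.

```python
def predict_batch(model, rows):
    """Batch style: wait until every row is collected, then predict them all at once."""
    return [model(row) for row in rows]

def predict_stream(model, readings):
    """Streaming style: hand back a prediction the moment each reading arrives."""
    for reading in readings:
        yield model(reading)

def toy_model(distance_to_obstacle_m):
    """Stand-in for the trained network: stop when an obstacle is under 2 m away."""
    return "stop" if distance_to_obstacle_m < 2.0 else "continue"

def sensor_feed():
    """Hypothetical live feed of obstacle distances in meters."""
    for distance in [12.0, 7.5, 1.2, 9.0]:
        yield distance

# The tractor acts on each decision immediately instead of waiting for a full table
for decision in predict_stream(toy_model, sensor_feed()):
    print(decision)
```

Because `predict_stream` is a generator, it never needs the full dataset in memory. It only ever looks at the newest reading, which is what lets it run for as long as the tractor does.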
Here’s a cool vid on them from Amazon Web Services:
Commentary from Farmers
BowTiedRancher: The real fun is with machinery automation ... Every major manufacturer has their own take on it, with variations from 'old school visual "Crop GPS" in the cab' to 'higher tech with self driving while you sit and monitor'.
Even at the lower end (my new tractor is a smaller 75hp model) you can get these automation tools if you want. Pretty crazy, really.
BowTiedFarmer: That part where you talked about the GPS able to adjust the amount of fertilizer or spray based on location in the field from previous soil tests is an amazing idea
Nice article sir!