Big Data, or Big Surface Area?

I know you’re probably tired of reading about big data. Well, here’s the thing—it’s not going away. The question is whether you’re prepared for the wave, or whether the wave submerges you.
Part of the problem people seem to have in wrestling with the term big data is figuring out exactly where it applies. The stock answer, “everywhere,” isn’t that helpful. So let’s explore a specific application of big data to a specific problem.
We’re going to step out of the supply chain world for a moment, so bear with me—I promise I’ll bring this back to your day jobs.
In early April, I met Ted Willich, chief executive officer at Jacksonville, Fla.-based NLP Logix. NLP provides predictive modeling services to a range of industries, and he was eager to talk to me about specific projects his company had done. Among other things, NLP has built a model to help the passenger airline industry predict how long flights actually are, and is currently helping the gaming industry better determine the odds of horse races.
Pretty cool stuff, and universally understood concepts. So I spoke to Willich and Matt Berseth, the company’s lead data scientist, about where this stuff is going.
Berseth said the airline project, called FlightQuest, was built for a competition on Kaggle, a predictive modeling platform that uses crowdsourcing to pit statisticians and data miners from all over the world against one another to produce the best models.
Berseth’s model was the top one in North America.
“I think the thing that is novel for this competition was the data sets we were bringing together,” he said. “Some of it was structured data, like the schedule, but on top of that, and where the complexity was, okay this airport literally has a website that has HTML that has to be scraped for weather conditions. And then you have flight plans that change 50 times while in the air. What does that all mean in terms of duration of flight?”
The goal of the model was to better predict the actual time a flight was in the air and more accurately predict the gate-to-gate time for any given flight. With 2,000 flights in the air within North America at any given time between 9 a.m. and 9 p.m., the outcomes were critical for airlines, which had been building more time into their schedules than necessary for some flights, and not enough for others.
“The database I curated and used for this model was approaching a terabyte in size,” Berseth said. “It felt like we were working on a larger scale, because a lot of modeling tools aren’t designed to work with that much data.”
The model improved the accuracy of flight duration predictions by 40 percent.
And here’s where it’s really critical to start understanding what big data means. That 40 percent increase in accuracy is the value—for lack of a more precise term, the return on investment. It’s easy to see an output like that and say, “ah yes, now I see why this is important.”
But it doesn’t necessarily help a transportation manager get his or her arms around the issue, does it? Knowing where you need to be is nice, but knowing what it takes to get there is much harder. So Berseth has some advice.
“Think of a model as having a certain surface area,” he said.
Surface area, in this sense, is the number of variables that a model considers.
“Having a big surface area is good and bad,” he said. “There’s a tradeoff in terms of the cost of collecting lots of variables and monitoring those variables, versus having a simpler model that’s easier to maintain. It’s not easy to make those decisions.”
We’ve written before about the technologies that are enabling transportation and end-to-end network modeling on a scale like never before. But the models need to be managed.
“Modeling is hard, but it’s hard because you almost have to make mistakes to learn,” he said. “The field has been around forever, but it’s new in terms of being applied with the mass that it is now.”
NLP sees a couple of things holding companies back from using predictive analytics on a broader basis. For one, companies need to see value from the exercise. For airlines, the value was in better knowing how long their assets would actually be up in the air. As previously mentioned, NLP is also modeling how to better calculate odds on a horse race. For instance, if a horse has a 20 percent chance of winning a race, but the horse is being paid out as if he has a 10 percent chance of winning, that’s an edge.
Of course, the edge might actually be only a fraction of a percent, but in the volume play that is betting, that makes a big difference over a long period of time. Doesn’t that sound like transportation and logistics? Improving the predicted transit time of a single shipment by 1 percent sounds marginal on that single shipment, but if it’s improved by 1 percent for 100,000 shipments, that starts to become compelling, and valuable.
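For readers who want to see the arithmetic behind that edge, here is a rough sketch in Python. It is not NLP's model. It assumes a simplified payout of 1 divided by the implied probability, ignores the track's take, and the function names, the 0.5 percent edge and the one-unit stake are all illustrative assumptions. It shows the expected profit on the 20 percent versus 10 percent horse, and how a much smaller edge adds up over 100,000 wagers.

```python
def expected_profit_per_unit(true_prob: float, implied_prob: float) -> float:
    """Expected profit for each 1 unit staked.

    Assumption: a winning bet returns 1 / implied_prob in total
    (stake included), with no track take or fees. Real payouts differ.
    """
    total_return_on_win = 1.0 / implied_prob
    return true_prob * total_return_on_win - 1.0


def cumulative_expected_profit(edge_per_unit: float, stake: float, count: int) -> float:
    """Expected total profit from repeating the same small edge many times."""
    return edge_per_unit * stake * count


if __name__ == "__main__":
    # The column's example: a 20 percent horse paid as if it were a 10 percent horse.
    print(expected_profit_per_unit(0.20, 0.10))

    # Hypothetical volume play: a 0.5 percent edge on 100,000 one-unit bets.
    print(cumulative_expected_profit(0.005, 1.0, 100_000))
```

The same multiplication applies to shipments: a small per-shipment gain only looks compelling once it is multiplied across the whole volume.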
The other area NLP focuses on is building models that actually get delivered into companies’ workflows, as Willich put it. Models in and of themselves are interesting academic exercises, but they become practically important when the results can be put into a production environment.
So here are the three questions to ask the next time you get pitched on a big data/analytics/business intelligence project: What’s the surface area of the model? What’s the value to your company? And can the model be implemented in your workflow?

This column was published in the July 2015 issue of American Shipper.

FreightWaves Staff
