Optimal recipe generator (using linear programming)


#1

I too read the thread on generating an optimal recipe using a genetic algorithm and noticed a few people mentioning how a linear programming algorithm is pretty well suited for this sort of problem, so I did some research on how this could be achieved and came up with a web version of this. Given a URL for a recipe on diy.soylent as well as the ID for a nutrient profile (which you can obtain by editing the profile and looking at the URL; mine is 536e74ebd8223302004d4618, for instance), it downloads the data and then chooses ingredients and their amounts to fit the profile. (Here’s the code for anyone interested, and the algorithm it’s using.)

At first I created a recipe by gathering a bunch of ingredients from various recipes as well as ones I saw fit. Here’s the result.

After going through this exercise, however, I realized something very important. You and I will never find full nutrition facts for products we buy on the store or online, because manufacturers aren’t required to collect and publish this information. Consider any ingredient in the USDA database compared with the nutrition facts section on the side of a box. There’s a lot of data that just isn’t there! This bothers me greatly, because it means if I try to create a recipe like everyone else here, I will be in the dark about how many nutrients I am really consuming – it is likely I will be getting more than I really need. For things like carbs, fat, and protein this is not an issue, but for vitamins like A, D, E, K (which are fat-soluble, so they can collect over time) or minerals like iron which can be toxic if taken in high doses, this is concerning.

So then I tried making a new recipe, this time with ingredients I’d found in the USDA database. I also added items for bulk minerals, where I can pretty well safely guess the numbers I see are representative of what’s in the powder. Here’s the result of that (be patient, takes a little while to load). The container sizes and prices are fake for this particular recipe, so ignore those in the output above. The thing I’ve noticed, though, is that you can already see that this approach will be a lot more expensive and less economic in general than using products I can find online. (I found it interesting, for instance, that the algorithm has chosen that much potassium iodide.)

What do you think? Are there any ingredients I can add to make this cheaper or more nutritious? Am I worrying too much about the lack of knowledge we have around these other products?


#2

Is there a way to add cost as a factor to optimize along with nutrition?


#3

I think so. Right now it tries to get all nutrients as close to 100% as possible, but I could probably divide the equation it is trying to maximize by cost and factor it in that way.


#4

I’m confused, so you can’t have it cross reference with the USDA database automatically? just have it fill in with those numbers and ignore the manufacturer?


#5

So it looks like it still needs some tweaking. I put in This and it gave me this with over 5 million calories per day, and STILL had many nutrient deficiencies lol.


#6

At least it’s not much more than $8000 a day.


#7

Right so that’s because you didn’t set a max on what your calories are. Actually, you haven’t set a max on anything, except for vitamins and minerals. It’s currently using a maximization algorithm (it doesn’t take cost into account) so it’ll keep going up and up until it hits some constraint.

Your nutrient profile looks like it’s based on the USDA DRI one, which doesn’t set maxes for a lot of things. So try modifying your profile (or making a new one) and then reload the page.


#8

I tried cross-referencing with the entire USDA database but 1) it took a long time to compute (I have to benchmark and optimize the library I’m using as right now it seems to scale exponentially) and 2) there are over 8000 ingredients in the database. Most of them are ingredients you would probably never put into a recipe (sweets, brand name foods, meat of any kind, baby food, snack food, etc.). There’s not a great way to filter them down, so the only other option is to build a list yourself.

(EDIT: I can certainly play around with it and see if I can whittle the number down simply by excluding food categories or excluding foods with certain words, though.)


#9

It doesn’t work at all.


#10

Okay… what are you trying to test it with?


#11

This is a great idea that I lack the skills to implement. It seems to be dropping ingredients list, though. I’ve tried a few different recipes.


#12

Not sure exactly what the issue is but when I try inputting a list of ingredients (http://diy.soylent.me/recipes/test-optimized-profile), it says “We’re sorry, but something went wrong. - If you are the application owner check the logs for more information.”

However, even though I wasn’t able to get my test case working, this looks to be an interesting approach. Paging @nickp, @2potatoes, and Alrecenk as I believe this is relevant to their interests.


#13

Another failing test case: http://diy.soylent.me/recipes/super-food

Oh, and I meant to page @Alrecenk in the last post but it didn’t let me because of the new user limitation.


#14

It didn’t work for my recipe (generic “something went wrong” error), but I looked at it running with your recipe (I haven’t looked at the code) and there are a few things I want to mention.

I think your objective function is a little screwy. I don’t understand why you’re maximizing and not considering cost. I would recommend minimizing total cost (or if costs aren’t available minimizing total weight to get the densest nutrition possible) subject to nutrition constraints. When you add constraints that no ingredient can be below 0 and minimize you’ll get a sparse recipe (lots of zeros in the final recipe) which will make for a much more practical to produce as well as cheaper result . Additionally, if you’re trying to crunch the whole FDA database with linear programming you’d probably have better luck with an interior point method rather than simplex. Simplex worst case is always exponential, average case is much better, but still, interior point methods are bounded polynomial time and if I recall correctly they do well on sparse problems which this is (or at least could be depending on how you pose it). Personally, I never tried crunching more than 50 or so ingredients at once.

I can’t remember if my article was linked in that big long thread, but here it is anyway for completeness:
http://inductivebias.com/Blog/project-auto-soylent/

In my app I used quadratic penalties rather than the logarithmic barriers seen in interior point methods ,which in my experience tends to work better numerically. It doesn’t guarantee every constraint will be satisfied perfectly, but that’s not necessarily a bad thing if it means you have more control over cost and sparseness.

P.S.
After I did my little Auto-Soylent experiment I wrote a tutorial on the basics of optimization which is what this type of problem is, and is a good place to start for anyone who wants to get into this crazy stuff. The article I’m working on now is on 3D structure from camera motion, which is also awesome, and I’m still accepting contract work hint hint cough I need money cough.


#15

Thanks for the feedback, @Alrecenk. I didn’t include cost in the equation because I don’t believe that to be as useful as maximizing that each constraint is as satisfied as possible. What I found was that if you choose ingredients that are themselves cheap (basically, buy in bulk), then that tends to take care of minimizing cost. I’ve found that my recipe tends to be around $10/day, which is pretty good for me (considering I pay more than that). Of course the algorithm could figure out a recipe that’s cheaper than that, but you’d be sacrificing nutrient completeness and I’d rather have a recipe that gives me what I need rather than one that’s as cheap as possible.

I guess it ultimately depends on you (the user) and how much weight you want to place on cost vs. completeness. It would be nice to figure out a way to specify that via a slider, and then feed that ratio into the equation.

Thanks for the pointer on interior point method. To be honest, this is a new field for me – I had to do research on how simplex worked and that took me a while but it was really satisfying once I figured it out. I actually did run across your blog posts last week, though, so that’s pretty awesome, I will definitely read up on all of this some more.


#16

Sometimes I wish I had double-majored for my B.S. by adding Mathematics. This would be a good application to use linear algebra, but unfortunately I do not know enough about it to be useful.

I still think this can be modeled as a variant1 of the multiple knapsack problem, because that problem does factor in cost as well as having multiple knapsacks.

The general algorithm I propose would be an exhaustive tree-based approach of trying every combination of ingredients that satisfies the criteria below:

  1. There can be at most n ingredients in a recipe, out of m total ingredients, where n and m are arbitrary numbers.
  2. There are exactly k knapsacks, where k is the number of nutrients (micro and macro). This k is a constant which I have not yet counted, regardless, it is good to generalize it from an algorithmic complexity perspective.
  3. Each knapsack (nutrient) has a range of values which defines “full.”
  4. One ingredient can fit into between 1 and k knapsacks simultaneously, and provides a different weight w to each knapsack.
  5. Ingredients fit into knapsacks using an increment i, measured in grams or IU.
  6. There is a maximum cost c for a recipe which may or may not be able to be used as an optimization somewhere.

It is a bit late for me to figure out the Big-Oh notation for this, but it is clearly exponential. Probably not as efficient as linear algebra, but it will also likely produce better results (although the linear algebra approach will probably produce “good enough” results much more quickly).

Likely the algorithm would first figure out all of the combinations of ingredients that can possibly fill all the knapsacks (e.g. a recipe lacking Vitamin A would result in death, so throw out that combination). Then try different combinations of i for each ingredient n. Once the combinations are locked down, the algorithm could calculate the increments for each recipe in parallel, speeding things up.

I will try to whip something up this weekened. Data will be the hard part, actually. I suppose I could download and parse the USDA database.

1 The knapsack problem is actually a class of problems, although it has one common form with which all computer scientists are familiar and which exemplifies the entire class of problems. The general idea of stuffing objects into one or more knapsacks to fulfill arbitrary criteria satisfies the definition of knapsack problem.


#17

I don’t really see a sacrifice. Given a large enough list of reasonable ingredients hitting 100% completeness isn’t all that difficult. I don’t see much benefit in trying to be “more complete” once all the minimum and maximum nutrient constraints are satisfied. I’m not even sure what that means. I guess you could make the constraints tighter if you want, try to hit everything at exactly 100% minimum, but I don’t think you’ll get a health benefit out of that. If you give yourself even a little bit of wiggle room you’re going to get a whole subspace of “complete” solutions, and then you have to pick a single solution out of that somehow. Price seems like the obvious (relatively) easily quantifiable thing.

Oh, and about the slider weighting cost vs completeness, I do actually have that in my code. It’s the " w = 0.001, // Weight cost regularization…" at the top of the file. That’s the sort of thing you can’t do if you pose the problem as a linear programming problem, but you can do if you pose it as a nonlinear convex optimization problem. It’s theoretically a harder problem to solve, but in practice the good algorithms look basically the same.