The Cutting Edge Podcast Episode 27: Hazelnut Breeding Part 1

An interview with Dr. Julie Dawson, Associate Professor in the Department of Horticulture at UW-Madison, and Dr. Scott Brainard, Research Associate at UW-Madison and Tree Crop Breeder at Savanna Institute, about recent progress in breeding American and hybrid hazelnuts.

Transcript

JASON FISCHBACH 0:00
This is a podcast about new crops. You’re gonna love it. Join us on The Cutting Edge, a podcast in search of new crops for Wisconsin. If you step back and think about what you just described, it’s unbelievable. It’s incredible. But somebody has figured out how to do this. It’s just, it’s cool, right? Yeah.

Scott Brainard 0:20
It’s totally wild that this is something that we can, honestly, that we can afford to do.

JASON FISCHBACH 0:37
Welcome back, everyone, to The Cutting Edge podcast. It’s been way too long since our last episode, but we’re back, better than ever. I’m Jason Fischbach, the emerging crops outreach specialist with the UW-Madison Division of Extension, and one of the hosts of this podcast. And what better way to reboot The Cutting Edge podcast than with an episode on hazelnuts? A lot has certainly been happening with hazelnuts, and particularly with hazelnut breeding. So I thought we’d go behind the scenes and talk to some folks carrying on this exciting work. So here we go. Let’s start with a brief overview of where we’re at here in the Upper Midwest with hazelnut breeding. Option one: plant varieties of European hazelnuts developed in Europe, by Oregon State, or most recently by Rutgers University. In general, this is not a viable option, though, as this plant material isn’t sufficiently winter hardy, nor is it sufficiently disease resistant, in our opinion. Now, as a hobby, sure, but I don’t think we can build an industry around European cultivars. So option two is endemic American hazelnuts that come from the wild in both Minnesota and Wisconsin. They are cheap and widely available from private and public sources. They are certainly winter hardy and disease resistant, but there are as of yet no proven varieties. And the seedlings, which are what is available, are highly variable and tend to have nuts that are too small. So option two isn’t really an option to build an industry around. That brings us to option three: hybrids, which are offspring from crosses between European and American hazelnuts. Private and public breeders have been making such crosses and selling plants for more than 100 years. And in the late ’90s and early 2000s, lots of growers planted these hybrids in the Upper Midwest, we think nearly 200 acres in small plantings all over the place.
So we here at UW and the University of Minnesota got involved in 2007 at the request of these early-adopter growers. On average, the plantings of these hybrids weren’t good enough. But individual plants, or even the top 10%, were really nice plants with potential to support an industry. So we made copies of these top plants, evaluated them at multiple locations, and have since selected the top 10 or so. These are what we call the UMHDI first-gen selections, and we’re doing everything we can to get them propagated and out to growers now. Here’s the thing, though: none of these first-gen selections are perfect. Plus, if you add up the work of the private breeders to generate this material, the growers to grow it, and us to evaluate it in the replicated trials, it has taken us nearly 30 years, and who knows how much money, to generate these first-gen selections. This time and expense is why we don’t have more perennial woody crops in our agricultural system, even though we desperately need them. We have to be able to generate improved plant material faster and cheaper going forward. And luckily, we have some new tools to help us create the second generation of hazelnut material for growers in the Upper Midwest. Helping us with this and leading the charge is Dr. Julie Dawson, University of Wisconsin-Madison Department of Horticulture. Julie, welcome. Thanks for taking the time this morning.

Julie Dawson 4:02
Yeah, thank you for the invitation.

JASON FISCHBACH 4:05
Maybe just a quick overview of what you’re working on right now with the hazelnut project, this summer in particular.

Julie Dawson 4:13
Sure. So we’re working with you, obviously, and with Lois to look at ways of improving the breeding program so that we can select the parents that are most likely, as you said, to make the best crosses, and then plant the progeny that are most likely to have the combination of traits that we are interested in. To do that, first we need to study the diversity in American hazelnut to see how much variation there is for traits that are really important, like kernel size and quality, different types of bush architecture, and resistance to EFB. There are some key traits that need to be there for growers. So we want to study how much variation there is in American hazelnut for those traits, and then what we can do with genetics to predict which parents would give progeny with the best combination of those traits. And then, after we work on predicting the best parents, we use genetic information to take all of the potential progeny and plant only the ones that are predicted to be the best in terms of the combinations of those traits. Now, anytime you use genetics to make predictions, there’s a lot of room for error, and it’s not a perfect prediction in any sense. But the idea is that we can understand these traits a bit better, and then make decisions that are improvements on essentially planting a random sample of the progeny, which is what you have to do if you don’t have any genetic information. We have a large planting in Spring Green this year in collaboration with the Savanna Institute. The idea with that is we’ve made a lot of different crosses between the best current hazelnut clones that Lois and Jason have been working on for many years. And we want to study the inheritance of the traits that are really important in order to build these statistical models linking the genetics of each bush to the phenotype that you actually see, so that we can make predictions going forward.
And hopefully, it won’t necessarily shorten the time so much, but it should increase the probability that we get that really good bush that has the best combination of everything.

JASON FISCHBACH 6:45
So let’s take a little deeper dive. What do you mean by genetic information? Like what specifically are you able to measure or quantify or understand when it comes to genetics?

Julie Dawson 6:57
Yeah, so the technology for genotyping has advanced to the point where it’s relatively inexpensive to get a lot of different, what we call markers, along the genome of each bush. So we might have 150,000 sites or so where bush one might differ from all the other bushes in the population. Now, it won’t differ at all of those sites, but each bush has a different combination. By using that information, we can essentially calculate how related each bush is to every other bush in the population. And because we know that relatives usually resemble each other more than unrelated individuals, and closer relatives resemble each other more than distant relatives, we can use that information on exactly how related each bush is to every other bush to make a prediction about how it would do in five years, say. You know, we genotype it when it’s a seedling, we make a prediction about what its kernel characteristics are going to be, and then we validate that and improve the model as we go. So the genotyping itself doesn’t really give us information about the genes, per se, that are involved in those traits, but it allows us to use statistics to predict what that bush will be like in four years, or what the progeny of that bush would be like. Now, we’re also working on understanding the genetics of the traits. We can use those markers and, again, a statistical model to test, at each site along the genome, whether that site is more or less associated with a certain phenotype than another site. So you test each marker to see whether it has a higher likelihood of producing a certain phenotype. And then eventually, you can get down to having a marker that’s really in the gene that controls kernel width, say, and then you can use that to select. The trick is that most of these traits, including kernel width, are not controlled by one gene. They’re controlled by many genes.
And so then you have to have many, many markers in order to actually tag each of those genes, which is a very long-term prospect. In general, in breeding, unless you have very simple traits (certain disease resistances are simple, not EFB, but certain other types of disease resistance are very simple traits), usually you’re going to be better off using the genotypes as a way of calculating these relationships with all of the other bushes in your program and making these predictions, rather than trying to track individual genes. Now, we might get lucky, and we might have a few traits that are simple and only controlled by a few genes. In that case, we can localize those genes by genetic mapping, and then develop a marker that would track that particular gene and that particular trait. That makes it relatively easy to screen things early on, where we would have, you know, hundreds or thousands of seedling plants, and we could screen them all for that marker and decide which ones are going to be resistant or which ones are going to have the best kernel characteristics. That would be, you know, essentially us getting lucky, because most of these traits are probably controlled by many genes. And that’s what we’re studying now. In that case, what we would do is genotype all the progeny, hundreds or thousands of them, and then predict which ones are going to be the best. And, you know, it’s not a perfect prediction. So we have to plant the top 30% or so, knowing that our predictions are not 100% accurate, but knowing that that 30% are more likely to be good than if we took a random 30% from the progeny population.
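
The relationship-based prediction described here can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the marker genotypes, kernel widths, and seedling names are invented, markers are coded as 0, 1, or 2 copies of an allele, and the relationship-weighted average in `predict` is a simplified stand-in for the mixed-model statistics a breeding program would typically use, not the project's actual method.

```python
import math

def allele_freqs(genotypes):
    """Frequency of the counted allele at each marker (genotypes coded 0/1/2)."""
    n = len(genotypes)
    return [sum(g[m] for g in genotypes) / (2 * n) for m in range(len(genotypes[0]))]

def relationship(g1, g2, freqs):
    """Marker-based relatedness: near zero for unrelated bushes, higher for relatives."""
    shared = sum((a - 2 * p) * (b - 2 * p) for a, b, p in zip(g1, g2, freqs))
    scale = 2 * sum(p * (1 - p) for p in freqs)
    return shared / scale

def predict(seedling, measured, freqs):
    """Relationship-weighted average of the traits of already-measured bushes.

    Simplification (an assumption of this sketch): negative relationships
    are clipped to zero so that only relatives contribute.
    """
    weights = [max(relationship(seedling, g, freqs), 0.0) for g, _ in measured]
    if sum(weights) == 0:  # no positive relatives: fall back to the plain mean
        return sum(y for _, y in measured) / len(measured)
    return sum(w * y for w, (_, y) in zip(weights, measured)) / sum(weights)

# Hypothetical data: six markers per bush and a kernel width in mm.
measured = [
    ([2, 2, 0, 0, 2, 0], 14.0),
    ([2, 1, 0, 0, 2, 0], 13.0),
    ([0, 0, 2, 2, 0, 2], 9.0),
    ([0, 0, 2, 1, 0, 2], 10.0),
]
seedlings = {
    "S1": [2, 2, 0, 1, 2, 0],
    "S2": [0, 1, 2, 2, 0, 2],
    "S3": [1, 1, 1, 1, 1, 1],
}

freqs = allele_freqs([g for g, _ in measured])
predictions = {name: predict(g, measured, freqs) for name, g in seedlings.items()}
ranked = sorted(predictions, key=predictions.get, reverse=True)
n_keep = max(1, math.ceil(0.3 * len(ranked)))  # plant roughly the top 30%
to_plant = ranked[:n_keep]
```

Here the seedling that shares the most alleles with the wide-kernel bushes ranks first. With real data the same ranking step would run over hundreds or thousands of seedlings and many thousands of markers.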

JASON FISCHBACH 11:11
So let’s use an example of how this could work. One of the things that growers think about, or obsess about, is plant height, especially those that are harvesting mechanically. If the plant gets too tall, like European hazelnuts tend to do, it doesn’t fit through the harvester. So if I’m hearing you right, in theory, we could make crosses between parent A and parent B, where maybe parent B is relatively short and parent A tends to get too tall. We could then make a cross, grow out all that seed, take a little piece of leaf from each of those offspring, each of those seedlings, and do the genotyping, basically sequencing part of the DNA. And then from that information, we could predict whether that seedling, which is still alive, say in the greenhouse, we haven’t thrown it out yet, is going to be a tall or short plant. Is that roughly how it works?

Julie Dawson 12:08
Yeah, and probably it would be a continuum, right? You would have shorter plants and taller plants, so you would predict about how tall each was going to be and decide, okay, we’ll plant the 30 percent predicted to be shortest. And that would save you the resources of growing all of those seedlings out for four years to see how tall they got, because usually you don’t have that. So you’re essentially saying, well, we got, you know, 200 or so seeds from this cross, but we can only afford to plant 50 of them. So we’re going to plant 50. Until genotyping was economically a possibility, it would essentially be a random 50, and you may or may not have the best in terms of plant height in that sample. But with a prediction, you can be a little more sure that you’ll have the best, the shortest, plants in your sample of 50, if you make a prediction first about what they’re likely to be. Now, prediction accuracy is, like I said, not anywhere near 100%, so you still have to plant more than you think you need. But it’s considerably less than if you had to plant, say, all of them to be guaranteed to get the shortest ones.
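
To see why an imperfect prediction still helps, here is a small simulation in Python. Every number in it apart from the 200 seeds and 50 planting slots is an assumption made for illustration: a mean height of 250 cm, a spread of 40 cm, and a prediction error as large as the real variation among seedlings.

```python
import random

def simulate(n_seeds=200, n_planted=50, n_target=20, noise_sd=40.0, reps=200, seed=1):
    """Average number of the truly shortest seedlings that end up planted,
    with prediction-guided selection versus a random draw."""
    rng = random.Random(seed)
    guided_total = random_total = 0
    for _ in range(reps):
        true_height = [rng.gauss(250.0, 40.0) for _ in range(n_seeds)]
        # The prediction is the truth blurred by error the breeder cannot see.
        predicted = [h + rng.gauss(0.0, noise_sd) for h in true_height]
        ids = range(n_seeds)
        shortest = set(sorted(ids, key=lambda i: true_height[i])[:n_target])
        guided = set(sorted(ids, key=lambda i: predicted[i])[:n_planted])
        at_random = set(rng.sample(ids, n_planted))
        guided_total += len(shortest & guided)
        random_total += len(shortest & at_random)
    return guided_total / reps, random_total / reps

guided_hits, random_hits = simulate()
print(f"guided: {guided_hits:.1f} of 20, random: {random_hits:.1f} of 20")
```

A random 50 out of 200 is expected to catch about a quarter of the 20 shortest seedlings. The guided 50 catches more, but not all of them, which is the reason for planting more than you think you need.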

JASON FISCHBACH 13:27
Right. And until you actually know the genetic control and have markers for specific alleles, you’d never be able to get to 100%, right?

Julie Dawson 13:36
Even when you have a marker for a specific allele, it’s not 100%, but it’s a lot closer. So if, say, height is controlled by one or two genes, we could get in there eventually and say, okay, now we know which genes control height. Then we’re going to just look for variation within that sequence of DNA and just do a, you know, a DNA marker in that gene. We’ll run that and use only that to select, rather than genotyping with 150,000 markers. We’re likely to not need that many markers in the long term, and that will lower the cost. So the goal with what we’re doing now is to understand the genetics of these traits, whether they’re controlled by many genes or a few genes, and then look at how many markers, and what genotyping platform, we actually need to be able to make good predictions, so that we can get the cost down to something that’s reasonable to use in a breeding program on a routine basis, rather than something that is essentially a research project. That is going to cost a lot more, because we’re just laying the groundwork for understanding the genetics.

JASON FISCHBACH 14:51
So with the work that you’ve been able to do to date, are you able to make these predictions for any traits yet, or how far away are you from being able to do that?

Julie Dawson 14:59
Yeah, we’re pretty close, I think, to being able to make some predictions. We have genotyped a population of American hazelnuts that was collected by the DNR and grown at one site, and that’s quite diverse. So we’ll be able to kind of understand the genetic control of some of these traits in American hazelnut. By that I mean, is it controlled by a few genes, or is it controlled by a lot of genes? On that same population, we’ve taken data on kernel quality characteristics, on height, and on some of the bush size things that are important. We have also genotyped some crosses between European and American hazelnut, so we’ll be able to look at whether things are different in that type of population than in pure American hazelnut. We just got the genotypes back, actually, yesterday, so we haven’t done the predictions yet. But the goal is to build models and to look at how accurate we can get within American hazelnut, and then within crosses between American and European hazelnut, and then use that to predict which of those bushes would make the best parents. Next spring, we will make those crosses to then plant things out. This is going to take several years, right, to validate whether those predictions are actually good. But in the meantime, we use what’s called cross-validation to see whether our predictions are good, so we can be pretty sure that we’re going to have a reasonable prediction before we wait four years. What we do is we hide, like, a quarter of the population, and we use the rest of it to predict that quarter. And then we say, okay, well, are the predictions accurate or not? So we’ll have that information before the spring. Then we’ll use that information to select the best bushes to make crosses, and then plant those progeny out. We also made crosses, like I said, between the best selections of American-by-European crosses and American selections, and those are planted out now.
And so in four years, we will be able to measure the kernel quality traits on that population. And because we know what the parents are in that population, we’ll be able to get a lot closer to identifying genes within it that are controlling certain traits, if those traits are under simple genetic control, meaning they’re only controlled by one or a few genes. If they’re more complex, we will still be able to build a better model, knowing what the parentage is. Those bushes will be a really good breeding population as well, because they’re essentially best-by-best crosses, so we’re likely to get some really good germplasm out of those crosses, in addition to being able to really advance how we make these predictions and accelerate the breeding process. Like I said, it doesn’t accelerate the bushes’ time to producing nuts and getting evaluated. We still always have to evaluate these in the field, and evaluate them in multiple sites, before we want to make a recommendation to growers. But it will increase the probability that we’re going to get that really good bush that combines all of the traits that we want.
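
The hide-a-quarter check can be sketched as four-fold cross-validation in Python. The data and the model below are assumptions for illustration only: each hypothetical bush has a single marker-derived score and a kernel weight, and a straight-line fit stands in for whatever genomic prediction model is actually used.

```python
import random

def pearson(xs, ys):
    """Pearson correlation between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

def fit_line(xs, ys):
    """Least-squares slope and intercept (stand-in for a real prediction model)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return slope, my - slope * mx

def cross_validate(records, k=4, seed=0):
    """Hide each quarter in turn, predict it from the rest, and return the
    correlation between predicted and observed values."""
    shuffled = records[:]
    random.Random(seed).shuffle(shuffled)
    folds = [shuffled[i::k] for i in range(k)]
    predicted, observed = [], []
    for i, hidden in enumerate(folds):
        training = [r for j, fold in enumerate(folds) if j != i for r in fold]
        slope, intercept = fit_line([x for x, _ in training], [y for _, y in training])
        predicted += [intercept + slope * x for x, _ in hidden]
        observed += [y for _, y in hidden]
    return pearson(predicted, observed)

# Hypothetical bushes: a marker-derived score and a measured kernel weight in grams.
rng = random.Random(42)
bushes = []
for _ in range(120):
    score = rng.randint(0, 20)
    bushes.append((score, 0.5 + 0.02 * score + rng.gauss(0.0, 0.05)))

accuracy = cross_validate(bushes)
```

The returned correlation is one common way to express prediction accuracy. A value near 1 means the hidden bushes were predicted well from the rest, and a value near 0 means the model adds little over guessing.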

JASON FISCHBACH 18:26
So, truly the last question here. All these great advances in plant breeding, these genomics tools, things like GBS, SSRs, QTL mapping: no one’s ever heard of them. They’re certainly not in the mainstream. But one thing has kind of made it to the media, and that’s CRISPR. Are you doing CRISPR? Or do you want to introduce CRISPR? Is it possible to use CRISPR for hazelnuts? What is CRISPR?

Julie Dawson 18:53
Yeah, so CRISPR is essentially targeted gene editing, where you would take a sequence of DNA that you know controls a trait, and you know what modification is needed to change that trait. So we’re talking about traits that are under very simple genetic control and that are well characterized, where we know what all of the different alleles, which are different versions of the gene, do, and how to change one of those alleles to another one in order to make the plant, say, resistant to a disease, or to change the color of the fruit or something, obviously not in hazelnut so much. But I think that in hazelnut, most of the traits that we’re looking at, including disease resistance, are what we call quantitative, which means that they’re controlled by potentially hundreds of genes. And those genes interact with the environment in a way that’s sometimes difficult to predict. I mean, we try to predict it using statistical models, and a big part of plant breeding is looking at interactions between the plant’s genetics and the environment. But if you’re looking at hundreds of genes and their interactions with the environment, it’s very difficult to make a single change that makes a big difference in the phenotype that you’re eventually looking at. So I think that, in terms of practicality, we’re not at the point in hazelnuts where we could think about using CRISPR to change a trait in a meaningful way for growers. I also think that many of the hazelnut growers may be looking at lower-input organic practices, and in organic certification right now, you’re not allowed to use CRISPR. So it would likely be better to avoid it in order to make sure that anything we released would be acceptable to certified organic farmers.
I think that eventually, we may understand the genetics well enough that CRISPR might have a use in some circumstances, but from my vantage point right now, it’s hard to see where that would produce something better than what we can do with classical plant breeding, which is essentially what we’re doing. Even though we’re using genetic markers, we’re using them to understand the genetics. We’re not using them to change specific genes in a lab.

JASON FISCHBACH 21:27
All right, Julie, thank you. Yeah, thank you. Great. Next up, we’re going to talk to the one and only Scott Brainard, who’s been on the front lines, in the trenches, use whatever metaphor you want, doing this work and diving into the genetics of these hazelnut plants. Scott Brainard joins us to give us an update on what’s happening in his world. Scott, welcome. Can you introduce yourself?

Scott Brainard 21:53
Yeah, absolutely. My name is Scott Brainard, and I’m a postdoctoral researcher in Julie Dawson’s program at the University of Wisconsin-Madison. My work is focused on, as Julie discussed, helping to develop these ways of using genomic data to improve the efficiency of selection in American hazelnut, but also in these American-by-European hybrid populations.

JASON FISCHBACH 22:23
Okay, so tell us exactly what plants you’re working with. Where are they? What are they? How many are there?

Scott Brainard 22:30
So we’re mostly focused on three populations. This is somewhat of a historical necessity: these are the populations that are mature and bearing, so they’re the ones we have to work with. We’re also planting a whole slew of new seedlings that will be kind of designed exactly for these experiments, but of course, they’re just one year old right now. So the hazelnuts that we’re studying right now were not explicitly designed for these experiments, but they work pretty well nonetheless. First off is a planting outside of Stoughton, Wisconsin, which is just 30 minutes south of Madison. Those are what we call F1s, mostly, or significantly, composed of a controlled cross that Mark Shepard made. A farmer bought several hundred and, together with you, Jason, put in that planting. So that’s a very interesting population. It also includes some just Corylus americana check varieties. So that’s about 300 or so plants. Then the next one that we’re looking at is a planting of about 600 plants in Barneveld, Wisconsin. That’s like 45 minutes west of Madison, getting into the Driftless but still southern Wisconsin, and those were planted by a landowner who just got plants from the Wisconsin DNR. We’ve been treating that as most likely wild-ish Corylus americana, so not the product of any sort of intentional breeding with other Corylus species. You know, because there has been a lot of that effort in Wisconsin, there may be some of that, you know, sort of contribution, but it’s more of a sort of representation of wild americana. Again, not deliberately gone out and sourced from the wild, just seed bought from the DNR, but that’s approximating a sampling from the wild.
And then the last population is actually three, again, full-sib populations that are in Minnesota, in Rosemount, and those were crosses that Lois Braun made a number of years ago using pollen from Oregon State University. So those are true, like, 50/50 hybrids between Upper Midwest-adapted material crossed with just Corylus avellana. So those are three somewhat distinct populations, in terms of how they were designed and grown and their location. Now we have phenotyped and genotyped them, and are starting to look more closely at what we can learn about the genetic control of key traits in those populations, depending on, you know, some of these characteristics that I just mentioned.

JASON FISCHBACH 26:02
So phenotyping, I know. I do a lot of that: go out there in the field and measure stuff to understand what that plant looks like, how much kernel it produces, what the kernel size is, blah, blah, blah. Genotyping, what do you mean? Do you mean, like, you’re actually unraveling all the DNA and reading every nucleotide? Are you just reading small portions of it? What have you been doing when you talk about genotyping?

Scott Brainard 26:27
Yeah, so genotyping can mean a lot of different things. It’s really just a term that, you know, you can think about as a corollary to phenotyping, where you are measuring the performance of some sort of trait. Genotyping is measuring or detecting something about genotypic variation. Depending on the technology used, it can mean actually very, very different things. The method that we’re using is one that’s become really common in plant breeding, because it’s cheap and it gets us everything we need to do quantitative genetics. The method basically breaks down into three steps. First, we extract the DNA and cut it up with an enzyme that slices it into small little pieces.

JASON FISCHBACH 27:23
Wait, to extract the DNA, you mean you go out in the field and collect a leaf or something?

Scott Brainard 27:28
Yeah, exactly. So we go out in springtime, when the leaves are just emerging and they’re young. There’s a couple of things that are good about that. One is that there’s a lot of cells in a small area, so the concentration of DNA per unit mass of leaf tissue is really high. And then also, because that tissue is young, it hasn’t built up a high concentration of secondary metabolites and lignin, things that would get in the way of trying to extract and purify the genomic DNA. So we take little leaf hole punches, and then we freeze-dry them, and then we crush them up. And then we use different extraction reagents that basically break down the cell walls and specifically draw out the genomic DNA that was in the nucleus.

JASON FISCHBACH 28:22
So how do you know you’re not getting something else? Because there’s all kinds of stuff living on or in that leaf, right? Bacteria, fungi, maybe you got an insect in the sample or something. How do you know the DNA you’ve extracted is actually the hazelnut DNA?

Scott Brainard 28:35
What we’re getting is overwhelmingly hazelnut DNA, even if there’s a few other things sort of in that mix. There’s a part of the process where we amplify the limited amount of DNA that we can get out of the leaves, and that’s going to preferentially amplify things that are of highest concentration in the starting material. So if you have a little bit of contamination in one or two samples, it’s just kind of statistically unlikely that that will be represented at a high enough level in the final prep to be detected on the sequencer.

JASON FISCHBACH 29:12
You’ve done this genotyping, and I hear you just got a slug of data back. So what is that like? What data did you get back? Just long sequences of A’s and G’s and T’s, and you have to make sense of that, or what happens now?

Scott Brainard 29:26
Right. So step one is, I guess, maybe generally, that kind of sampling in the field, getting into the lab, and getting the DNA out and cleaned up. Step two is putting that onto a machine, a sequencing machine. We use one that’s made by a company called Illumina. What happens there is these little short segments of DNA, which were cut up by the enzyme that we subjected the DNA to, are attached to what’s called a flow cell. It’s basically a small piece of glass. And, you know, these segments are about 150 base pairs long after all of the processing that’s done to them. We get a readout using, it’s actually kind of interesting chemistry that occurs. We wash over this flow cell, in steps, nucleotides. DNA is usually double-stranded, but what we put onto the flow cell is single-stranded, so we can put nucleotides on one by one, and if they find a complementary base, they’ll bind. And then, unlike normal bases, these will let off a little fluorescent light pulse when they actually anneal to any specific short little strand of DNA, and a camera takes a photo of that. So that’s how the chemistry works. That’s how the sequencing works. It’s taking basically 150 photos of a flow cell with maybe 6 billion little slowly growing strands of DNA. And then there’s a whole bunch of software on that sequencer that decomposes those light signals into exactly what you were just saying: long little strings of 150 A’s, T’s, C’s and G’s. We get, you know, billions of these off of the sequencer, because we’re sequencing, you know, 1,500 samples, and for each sample maybe 10 million of what we call reads, so short little segments of DNA. So a lot of data comes off this machine.
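
The step where software turns light signals into letters can be illustrated with a toy base caller in Python. This is a deliberately simplified sketch, not how the sequencer's own software works: it assumes one intensity reading per base per cycle, and the `min_ratio` threshold and the example intensities are invented.

```python
BASES = "ACGT"

def call_bases(cycles, min_ratio=2.0):
    """Turn per-cycle light intensities into a read.

    cycles: one (A, C, G, T) intensity tuple per imaging cycle for one DNA cluster.
    A base is called only if the brightest channel is at least `min_ratio`
    times the runner-up; otherwise the position is reported as N (unknown).
    """
    read = []
    for intensities in cycles:
        ranked = sorted(range(4), key=lambda i: intensities[i], reverse=True)
        best, runner_up = ranked[0], ranked[1]
        if intensities[best] > 0 and intensities[best] >= min_ratio * intensities[runner_up]:
            read.append(BASES[best])
        else:
            read.append("N")
    return "".join(read)

# Hypothetical intensities for a five-cycle read (the reads described here run about 150 cycles).
signal = [
    (0.91, 0.04, 0.03, 0.02),
    (0.05, 0.10, 0.80, 0.05),
    (0.02, 0.03, 0.05, 0.90),
    (0.45, 0.40, 0.10, 0.05),  # two channels nearly tied, so no confident call
    (0.03, 0.88, 0.05, 0.04),
]
read = call_bases(signal)  # "AGTNC"
```

Each photo contributes one letter to every growing strand at once, which is how 150 photos can yield billions of 150-letter reads.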

JASON FISCHBACH 31:33
Step back and think about what you just described. It’s unbelievable. It’s incredible that somebody has figured out how to do this. It’s just, it’s cool, right? Yeah.

Scott Brainard 31:42
I mean, it’s totally wild that this is something that we can afford, honestly, that we can afford to do just with our little, you know, hazelnut group, which is, you know, not an exceptionally well-funded or massive research enterprise. We’re still kind of an upstart here in the, like, ag world. And, yeah, I mean, it costs about 30 bucks per sample to do this. So not nothing, for sure, but that’s everything from extracting the DNA to sequencing it to the final step, which I’ll talk about, which is data processing. And, yeah, I mean, this technology didn’t exist 10 years ago. It was sort of in its nascent stages, you know, maybe 10 years ago, but this machine as I’ve described it, and its ability to do really this amount of throughput, is very, very new. And, yeah, really exciting, because it gives groups like ours, that don’t have the resources of human geneticists or corn geneticists or mouse geneticists, the ability to do all these really neat analyses.

JASON FISCHBACH 32:54
You’ve got a plant, you’ve extracted the DNA, you’ve chopped it up into these bits, and then you can use the sequencer to sequence all those little bits. So for any one plant, you’re going to know the sequence of all these bits of DNA. And now you’ve got, what, how many plants at Barneveld? 400?

Scott Brainard 33:10
Right. So there’s about 600 plants at Barneveld, roughly speaking 300 plants at Stoughton, and then up in Minnesota, another 300 or so. So altogether a little over 1,000. And, yeah, so we sequence these little bits, and those all together represent just a small fraction of the hazelnut genome. So back to your original question, we’re not sequencing every base pair in the genome, because we don’t need to for the purposes of our experiments, and that would be really expensive. So what we do instead is this sort of reduced-representation sequencing.

JASON FISCHBACH 33:46
So people are clear, you’re nowhere near the point of sequencing actual genes that control traits of interest. This is just, like, a broad-level fingerprint of a particular genome.

Scott Brainard 33:57
Exactly. This method is sometimes called genotyping by sequencing, and that’s when you’re using an enzyme to cut up the DNA. A very related approach is called shotgun sequencing, and that kind of gets at the idea, I think, with a nice visual image: you’re just blasting the genome into these little bits and sequencing, you know, just the ends of the fragments that are produced, essentially, is what you end up doing. So, once you have all these sequences: there are two different hazelnut plants, very kind of historically important accessions, one’s called Rush and one’s called Winkler. They were important in the early stages of the, sort of, early to mid 20th century initial efforts at crossing americana with avellana. One was important in the Northeast, that was Rush. Winkler was important in the Midwest programs. For those two, we actually have sequenced and assembled what’s called a reference genome. That’s where you do sequence every single base pair and get them all in the right order. That’s actually quite expensive, and a whole different kind of sequencing that I won’t get into, but that’s been sort of a sidebar project for these two plants. What we can then do, now that we have all of the chromosomes in hazelnut assembled, and every base pair, well, not every one but a lot of them, assembled, is align these little, what are called short reads, these 150-base-pair sort of snippets, to the reference genome. Using some very big computers and some fancy software, we can basically figure out where each 150-character-long character string should fit into this 350-million-long character string that is the entire hazelnut genome.
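
Placing a short character string inside a much longer one can be sketched with a toy seed-and-check aligner in Python. This is an illustration under stated assumptions, not the software the group uses: the 27-letter `reference` stands in for the 350-million-letter genome, the reads are 12 letters instead of 150, and a plain dictionary of 4-letter seeds stands in for the far more compact indexes that production aligners rely on.

```python
from collections import defaultdict

def build_index(reference, k):
    """Map every k-letter substring of the reference to the positions where it starts."""
    index = defaultdict(list)
    for pos in range(len(reference) - k + 1):
        index[reference[pos:pos + k]].append(pos)
    return index

def align(read, reference, index, k, max_mismatches=1):
    """Return (start_position, mismatches) for the best placement of a read, or None."""
    best = None
    for offset in range(0, len(read) - k + 1, k):  # try several seeds along the read
        seed = read[offset:offset + k]
        for hit in index.get(seed, []):
            start = hit - offset
            if start < 0 or start + len(read) > len(reference):
                continue
            window = reference[start:start + len(read)]
            mismatches = sum(a != b for a, b in zip(read, window))
            if mismatches <= max_mismatches and (best is None or mismatches < best[1]):
                best = (start, mismatches)
    return best

reference = "ACGTTGCAAGGCTTACGATCGGATCCA"  # hypothetical stand-in for a whole genome
index = build_index(reference, k=4)

exact = align("AGGCTTACGATC", reference, index, k=4)     # (8, 0)
one_off = align("AGGCTGACGATC", reference, index, k=4)   # (8, 1): one letter differs
unplaced = align("TTTTTTTTTTTT", reference, index, k=4)  # None: no seed matches
```

Tolerating a mismatch matters here, because the letters that differ between a plant and the reference are exactly what the later variant-finding step is looking for.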

JASON FISCHBACH 35:57
might tell you like which arm of which chromosome, kind of say it’ll give you the

Scott Brainard 36:00
exact, well, not only that, it’ll tell you exactly where on any given chromosome this read belongs. You don’t know going into it: you sequence 150 base pairs, but which 150 base pairs did you end up sequencing? This alignment process allows you to take all of these reads from all of these individuals and stack them up against each other, so they’re all in phase. That part is called alignment. The next part, sort of step three of genotyping, and this is what’s critical to the analysis that we’re doing, is that you look for variants across individuals. So step one is to align all of these short reads, and then you look at any given base pair position to see if there are some individuals that, say, at base pair position 1,000,602 on chromosome three, have an A, and others have a C. That kind of variant is called a single nucleotide polymorphism, often abbreviated SNP (“snip”), and those are also a class of what’s called a marker, so a molecular marker or a genetic marker. Those are the bread and butter of the analyses that we’re doing, because we go from the whole genome, down to the little fraction of it that we ended up sequencing, down to maybe 100,000 or 150,000 SNPs scattered evenly, but, you know, probably 40 to 50,000 base pairs in between them, across the whole genome. So it’s another kind of reduced representation step, where we’re going from the whole genome down to this very limited set of markers that are informative from the perspective of determining how two plants differ from each other. We don’t really care about all the bases that all hazelnuts share in common. We want to look for the sites that are variable.
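
The alignment and variant-finding steps described above can be sketched on a toy scale. This is only an illustration under simplifying assumptions: the 30-base reference, the 10-base reads, the plant names, and the brute-force matching are all made up for the example, and real pipelines use dedicated aligners and variant callers on 150-base reads against a genome of roughly 350 million bases.

```python
# Toy reference "genome" (hypothetical; the real one is ~350 million bases).
REFERENCE = "ACGTTGCAAGGCTTACCGATTGACCTAGGA"


def align(read, reference):
    """Return the start position where the read fits with the fewest mismatches."""
    best_pos, best_mismatches = 0, len(read) + 1
    for pos in range(len(reference) - len(read) + 1):
        window = reference[pos:pos + len(read)]
        mismatches = sum(1 for a, b in zip(read, window) if a != b)
        if mismatches < best_mismatches:
            best_pos, best_mismatches = pos, mismatches
    return best_pos


def call_variants(reads_by_plant, reference):
    """Stack aligned reads and keep only positions where plants differ (SNPs)."""
    observed = {}  # position -> {plant: base seen at that position}
    for plant, reads in reads_by_plant.items():
        for read in reads:
            start = align(read, reference)
            for offset, base in enumerate(read):
                observed.setdefault(start + offset, {})[plant] = base
    return {pos: calls for pos, calls in sorted(observed.items())
            if len(set(calls.values())) > 1}


# One short read per plant; plant_B carries a T where the others have a G.
reads_by_plant = {
    "plant_A": ["GCAAGGCTTA"],
    "plant_B": ["GCAATGCTTA"],
    "plant_C": ["GCAAGGCTTA"],
}

variants = call_variants(reads_by_plant, REFERENCE)
print(variants)
```

Only the one variable site survives; the nine positions that all three plants share are discarded, which is the reduced representation idea in miniature.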

JASON FISCHBACH 38:06
Right. So now you’ve got a plant with, you’d call it, like, a SNP profile, if you will.

Scott Brainard 38:15
Yeah, all those are great words for it: a genetic fingerprint. You can also just call it a genotype. It’s a little bit confusing, because a genotype can be a genotype at a specific position, but if you have all 150,000 of them, you could also call that the genotype of that plant. And yeah, that’s the term we end up using.

JASON FISCHBACH 38:36
So then the hope is that a particular SNP profile is correlated with a particular trait in the plant, or phenotype of the plant. So maybe all the plants in Barneveld that have nuts larger than, say, 0.5 grams all tend to have this SNP profile, and anything smaller has that profile.

Scott Brainard 39:02
Exactly. That’s exactly it. So once you have the genotype, you want to in some way, and you just suggested one way, correlate that variation in genotype with variation in phenotype. That’s the ultimate goal: we want to be able to use the genotype to do selection instead of the phenotype, because the phenotype is really hard to get, right? You have to plant the plant in the field, you have to keep it alive for years and years, and you have to go and harvest it and measure it and do all this work. The genotype, in principle, you can get at day one. You know, when it sends out its first leaf in the greenhouse, take a little leaf hole punch, and a few weeks later have the genotype. If you knew what these SNPs meant in terms of their impact on phenotype, well, that could really speed things up a lot and be way cheaper. So there’s a few ways of doing that. One method for trying to find these correlations, which doesn’t use the whole profile, is what’s called association analysis. You literally just look for variation in the genotype where all individuals that, say, have a specific allele at a specific SNP, so an A or a T or a C or a G, also have, say, really large kernels. And then you can say, okay, this isn’t necessarily the gene for kernel size, but this marker is what we call linked, probably, to a gene. So it’s in the vicinity of a gene, or maybe it is actually in a gene itself. We don’t know just from this analysis, but we do know that we could look at the specific position in the genome and say, you know, on average, plants that have this will have kernels that are 0.5 grams heavier, something like that.
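
A minimal sketch of the association idea, assuming made-up data: the plant names, SNP names, and kernel weights below are hypothetical, and each plant is given a single allele per SNP for simplicity even though hazelnut is diploid. Real association analyses use formal statistical tests and account for relatedness among plants; this only shows the core comparison of average phenotype between allele groups.

```python
from statistics import mean

# Hypothetical genotype calls (one allele per SNP per plant, for simplicity).
genotypes = {
    "plant_1": {"snp_1": "A", "snp_2": "G"},
    "plant_2": {"snp_1": "A", "snp_2": "T"},
    "plant_3": {"snp_1": "C", "snp_2": "G"},
    "plant_4": {"snp_1": "C", "snp_2": "T"},
}
# Hypothetical phenotypes: kernel weight in grams.
kernel_weight_g = {"plant_1": 1.0, "plant_2": 1.2, "plant_3": 0.5, "plant_4": 0.7}


def allele_effects(genotypes, phenotypes):
    """For each SNP, return the gap between the largest and smallest allele-group mean."""
    snps = sorted({snp for calls in genotypes.values() for snp in calls})
    effects = {}
    for snp in snps:
        groups = {}  # allele -> phenotypes of the plants carrying it
        for plant, calls in genotypes.items():
            groups.setdefault(calls[snp], []).append(phenotypes[plant])
        group_means = [mean(values) for values in groups.values()]
        effects[snp] = max(group_means) - min(group_means)
    return effects


effects = allele_effects(genotypes, kernel_weight_g)
for snp, effect in effects.items():
    print(f"{snp}: allele groups differ by {effect:.2f} g on average")
```

In this toy data, plants with an A at `snp_1` average about 0.5 grams heavier than plants with a C, so `snp_1` would be the marker flagged as possibly linked to a gene affecting kernel size.
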
Another thing we can do, which is sort of the method that you were mentioning, is use the whole SNP profile, the whole genotype. What we do there is use this entire genotype to look at how related different plants are. We use, essentially, the relatedness between plants that we have phenotypes for and plants that we don’t have phenotypes for to predict the performance of the plants we don’t have phenotypes for, with the basic idea that plants that are closely related will have similar phenotypes. That latter case is often called genomic prediction, where we’re not actually finding specific causal alleles; we’re just predicting performance on the basis of all the genotype information we have. In the other approach, association analysis, we associate specific SNPs with phenotypic variation. But the overarching goal is kind of the same: we’re trying to be able to do selection on the basis of these genotypes, because they’re cheaper, they’re faster, and just way easier to deal with.
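
The "closely related plants will have similar phenotypes" idea can be illustrated with a deliberately simple stand-in: a similarity-weighted average. This is not the mixed-model machinery normally used for genomic prediction, and the genotype coding (0, 1, or 2 copies of an allele at each of six SNPs), the plant names, and the kernel weights are all hypothetical.

```python
def similarity(genotype_1, genotype_2):
    """Fraction of markers at which two plants carry the same genotype."""
    matches = sum(1 for a, b in zip(genotype_1, genotype_2) if a == b)
    return matches / len(genotype_1)


def predict(candidate, training_genotypes, training_phenotypes):
    """Predict a phenotype as the relatedness-weighted mean of phenotyped plants."""
    weights = {plant: similarity(candidate, genotype)
               for plant, genotype in training_genotypes.items()}
    total = sum(weights.values())
    if total == 0:
        raise ValueError("candidate shares no genotypes with the training plants")
    return sum(weights[p] * training_phenotypes[p] for p in weights) / total


# Plants with both genotypes and field phenotypes (hypothetical values).
training_genotypes = {
    "big_kernel_parent": [2, 2, 1, 0, 2, 1],
    "small_kernel_parent": [0, 0, 1, 2, 0, 1],
}
training_phenotypes = {"big_kernel_parent": 1.2, "small_kernel_parent": 0.6}

# A seedling that has only been genotyped, never grown out.
seedling = [2, 2, 1, 0, 2, 0]

predicted_weight = predict(seedling, training_genotypes, training_phenotypes)
print(f"predicted kernel weight: {predicted_weight:.2f} g")
```

The seedling matches the large-kernel plant at five of six markers and the small-kernel plant at one, so its prediction lands much closer to the large-kernel plant, without any single SNP being singled out as causal.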

JASON FISCHBACH 42:13
Yeah, so a couple other applications, right? You can use this, let’s say we have a clone, and we made a bunch of copies. Oh no, we made a mistake, we lost a tag or a label. Right? So now somebody went and took that plant and made 10,000 copies of it. Now we’d better know that plant, right? So can you use this to know if it’s safe? A pin seven, six? Got it? Yep. What about, you know, one thing we’ve talked about is making crossing blocks. So we’ve got parent A and parent B, tons of both of them, in an isolated cornfield, and we let them intercross. And we suspect that all of the offspring will be, on average, good enough to be commercial. And so this could be a method to create, you know, plant material for the industry. But we also know pollen blows around miles and miles. So can you use this to know who mom and dad are of any given plant, if you have the SNP profile of mom, dad and the kid?

Scott Brainard 43:12
Sure can, yeah, and we’ve actually done both of those things. We’ve submitted multiple tissue samples from individual plants, as well as multiple copies of those plants, and been able to use the multiple plant sequencing as sort of the control and see that, yeah, indeed, a true clone, using this genotyping method, will be nearly identical, you know, on two distinct runs of this entire genotyping pipeline. We’ve also been able to, if we have the sequences of both parents in a given cross, sequence all of the progeny and figure out who the parents really are. You know, even when you’re doing a controlled cross by hand, there’s gonna be mistakes. These are outcrossing plants; they want to cross with each other. So even when you’re trying to do a whole lot of pollen control, there’ll always be some contamination, and we can pick that out real easily with these markers. So yeah, both of those quality control things are possible. And it’s kind of funny: as these methods have gotten cheaper and more widespread, in really established breeding programs, like potato breeding programs or corn breeding programs, with long pedigree records going back, you know, maybe over 100 years, people have been finding errors in them. And I think having a lot of fun, you know, just kind of correcting the historical record and finding some really interesting things, that certain clones that are just extremely important to the industries for these real staple crops are not what we thought they were. So yeah, this can be really helpful in that regard.
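
Both quality-control checks can be sketched with diploid genotype calls written as two-letter strings. The sample names and the four SNPs below are hypothetical, and a real check would use many thousands of markers and tolerate a small rate of sequencing error rather than requiring a perfect match.

```python
def concordance(sample_1, sample_2):
    """Fraction of SNPs with the same genotype call (allele order ignored)."""
    matches = sum(1 for a, b in zip(sample_1, sample_2) if sorted(a) == sorted(b))
    return matches / len(sample_1)


def mendelian_errors(child, mother, father):
    """Count SNPs where the child's genotype cannot come from this parent pair."""
    errors = 0
    for c, m, f in zip(child, mother, father):
        # Every genotype obtainable from one maternal and one paternal allele.
        possible = {tuple(sorted((x, y))) for x in m for y in f}
        if tuple(sorted(c)) not in possible:
            errors += 1
    return errors


mother = ["AA", "CC", "AG", "TT"]
father = ["AC", "CC", "GG", "TT"]

# Clone check: a second copy of the mother plant, genotyped on a separate run.
mother_copy = ["AA", "CC", "GA", "TT"]
clone_match = concordance(mother, mother_copy)
print("copy vs. mother:", clone_match)
print("father vs. mother:", concordance(mother, father))

# Parentage check: one true offspring and one seedling from stray pollen.
true_child = ["AC", "CC", "GG", "TT"]
stray_child = ["CC", "CC", "AA", "TT"]
true_errors = mendelian_errors(true_child, mother, father)
stray_errors = mendelian_errors(stray_child, mother, father)
print("true child errors:", true_errors)
print("stray child errors:", stray_errors)
```

A true clone matches at every marker, while the stray seedling shows genotypes that neither listed parent could have contributed, which is how contamination in a controlled cross gets picked out.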

JASON FISCHBACH 45:05
So, just curious: you’ve got this huge pile of data. Do you analyze this on your laptop, or do you have to schedule time on a supercomputer? And how long is it going to take you to answer all the questions you want to answer?

Scott Brainard 45:18
It depends. One of the reasons for cutting down the data set in these different reduced representation ways is to make it faster and easier to deal with. Because, yeah, I mean, the data set that we get off the sequencer is on the order of terabytes. So for those first steps, where we are aligning reads and calling SNPs, that’s done on what’s maybe not a supercomputer, but we farm that out. Actually, I should mention, the sequencing is done by the UW Biotechnology Center, which owns one of these fancy Illumina machines and knows how to use it. They’re really expensive, and so it’s become more and more common now for what are called core facilities to buy the sequencing platforms, which are super high throughput, and then pool experiments from multiple research groups on any given run of that sequencer to drive down costs. Then the initial stages of the analysis are done by the Bioinformatics Resource Center, which has a really big computer with lots of processing cores and lots of RAM, and they’re able to do those initial stages. Once I get the data in the form of the SNPs, yeah, I can do that on my laptop. I wish I had a slightly faster laptop, because it can sometimes take, you know, 20 or 30 minutes for an analysis to complete. But now that we’ve got this data, over the next few months we will do both of those, the association analyses and the genomic prediction analyses, and, you know, have results over the course of the summer here.

JASON FISCHBACH 47:06
So you’ve been waiting on this data and working on it for a long time. Yeah. And you know what, that Stoughton planting, I’ve been waiting since 2011 to know exactly what those plants are. That’s what I’m most interested in. But for you, what question are you wanting to answer first? What have you been waiting to answer first?

Scott Brainard 47:27
So yeah, it might be good to mention that it has been a long wait. Maybe the way that I represented it sounds like, oh, this is really easy once you know what you’re doing. Actually, when you’re doing this on a new species, there’s all sorts of things that you have to optimize: the choice of the enzyme to cut up the DNA, how long to run the sequencer, you know, there’s a whole bunch of things that we had to optimize. We didn’t do it the right way the first time. And so now that we have a protocol down that actually works for Corylus americana, I think the method I just described is actually pretty straightforward, and we’ll be able to repeat it pretty easily. But yeah, it has been a long time coming. So I guess for me personally, there’s a couple things that I’m really excited about. One is you can also use these SNPs to look at not just, like, are parent A and parent B truly the parents of offspring C. You can also look at population genetics questions, the demographics. So we can see, for instance, how wild are these Corylus americana plants from the DNR, you know, how much avellana is in them? And how different are they from the plants at Stoughton versus the plants at Rosemount, say. So we’re gonna look at kind of a big picture. When you have this many plants, you can really start to see interesting clustering of subpopulations within these different interspecific hybrids, which I think is really cool to see. One of the things that this gets at is how easy is it going to be to breed for these traits: how much of the variation in the trait is transmissible to the next generation. That’s the heritability, and having a high heritability means that you’re going to be able to make genetic gain.
So let’s say, just to make it specific, we’re talking about kernel percentage. If kernel percentage has a really high heritability, then when we make crosses and select for kernel percentage, we’ll be able to improve kernel percentage by a lot with every generation that we make selections. And these analyses that we’re doing will not only give us specific positions of specific SNPs that are associated with variation in the trait, they’ll also tell us how easy it’s going to be to continue to make improvement for these traits. That’s what I’m really excited about, because we don’t know right now. We know that there’s a lot of variation for these traits, but we have no idea how hard it’s going to be, you know, to improve certain aspects of these hybrids that we know need work. We know that Corylus americana needs a higher nut quality to be competitive, or we would like it to; maybe it’s not an absolute necessity. So yeah, that’s what I’m really excited about. And I’m really optimistic too, because, you know, this is pretty wild material. If you stack it up against corn or something that’s been undergoing intensive selection for so long, I think that there will be a lot of, quote unquote, low-hanging fruit, where there’ll be traits where we can make really rapid gain, or at least expect to on the basis of these results.
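
The link between heritability and gain per generation is often summarized by the textbook breeder’s equation: expected response equals heritability times the selection differential (how far the selected parents sit above the population average). The sketch below assumes that standard relation; the kernel percentages and heritability values are hypothetical numbers chosen for illustration, not results from this project.

```python
def expected_gain(heritability, selection_differential):
    """Breeder's equation: expected response = heritability * selection differential."""
    return heritability * selection_differential


# Hypothetical kernel percentages: the whole population vs. the selected parents.
population_mean = 30.0
selected_parent_mean = 40.0
selection_differential = selected_parent_mean - population_mean

# Same selection, two hypothetical heritabilities.
gain_high = expected_gain(0.8, selection_differential)
gain_low = expected_gain(0.2, selection_differential)

print(f"high heritability: offspring expected near {population_mean + gain_high:.1f}%")
print(f"low heritability:  offspring expected near {population_mean + gain_low:.1f}%")
```

With the same parents selected, the highly heritable trait is expected to move most of the way toward the parents’ average in one generation, while the weakly heritable trait barely moves, which is why knowing heritability tells a breeder how hard improvement will be.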

JASON FISCHBACH 50:53
Well, and the American hazelnut population in particular has never really been worked with, other than, you know, private breeders over the years making some collections from wild populations and growing them out. And maybe they found something interesting. That’s kind of where Rush came from, or even Winkler. And we have a little bit of data from, like, Demchik about the diversity or genetic variability across the landscape, you know, from North Dakota to Michigan. And it seems to be, at least in the analysis he did, a highly diverse population, but we really don’t know how diverse it is for particular traits, both the genetic control of those traits and also just the phenotype. And this Barneveld dataset is really, you know, pretty amazing, because we don’t have anything like it yet. Nothing like it has been done before. So this is cool.

Scott Brainard 51:43
Yeah, no, it’s definitely the first. It’s been a lot of work. And yeah, maybe I should just mention, on the phenotyping side in particular, I mean, you and your crew have helped a tremendous amount with this, Jason, but also a lot of people in Julie’s program, and Lois Braun and Mark Hammond up in Minnesota, who have done so much work in recording very precise phenotypes on all of these bushes. Because that’s the real laborious part. We hope to eventually be able to just use the genotypes, but to do the analysis first, we have to have both. And that’s, I think, one of the reasons nobody’s done this before: it’s a ton of work, and you end up, you know, spending a lot of time getting phenotypes on bushes that you kind of know aren’t the ones you want. But to do the experiment, you still have to study them. So yeah, it’s been a lot of work. But I think what’s come out of it is hopefully going to be really cool.

JASON FISCHBACH 52:45
So what do you think, September? We can have you back on, and you can tell us what you’ve learned?

Scott Brainard 52:49
Yeah, definitely. Well, I have a big deadline: September 5, there’s the 10th International Hazelnut Congress happening in Oregon, and I’m very hopeful to have something to present there. So there’ll be some preliminary results, I’d say, by September, for sure. And then, I mean, my personal goal is to have some really final results ready over the winter, so that come next spring, hopefully, these experiments can already start to inform the decisions that are made, with Lois, you know, making new crosses in Minnesota, or anybody who’s got a family of seedlings, trying to decide, hey, which ones of these am I going to put out in the field? Right? I only have so much space. And this is where the resource allocation question comes in: you can make a lot more seedlings than you have space to grow. These methods aren’t a replacement for traditional breeding, but they can help you make those resource allocation decisions in a way that hopefully maximizes the chances that you put really great plants into the field.

JASON FISCHBACH 54:01
I mean, for 30 bucks, if I had even a clue of what that plant might do or be for 30 bucks at the seedling stage, I would gladly pay that.

Scott Brainard 54:12
You’d pay it. Yeah, I mean, that’s the idea: it’s, you know, not nothing, but it’s a lot cheaper than growing a plant to maturity. It’s kind of a drop in the bucket at that point, when you start thinking about 10, 12 years of keeping these plants alive. So yeah, hopefully this fall and winter we will be able to start doing something useful for people. Good.

JASON FISCHBACH 54:39
Well, Scott, thank you. This is, yeah, great. Lot going on, exciting times. And I’m glad to check in with you today.

Brought to you by the University of Wisconsin-Madison Division of Extension