Video: From Code Abundance to Delivery Confidence | Duration: 1708s (28:28) | Chapters: Introduction and Poll (0:00), AI Adoption Surprise (3:27), Education and Adoption (5:33), Scaling Friction Points (9:07), AI Quality Challenges (12:25), Value Measurement (14:51), AI Cost Management (18:16), AI Governance Reality (20:53), Key Lessons Learned (26:05), Closing Remarks (27:51)
Transcript for "From Code Abundance to Delivery Confidence":
Thank you so much, Phil. Yes. Long live software engineering. We loved your perspective on the AI journey and the ways we can use agents in our SDLC. I think this was exactly the view we needed. What's possible is really impressive. And as you said, to truly take advantage of code abundance, we need to streamline the rest of the pipeline. Phil covered a number of the challenges that come with AI adoption, and that's what we've really been talking about today with the report as well. I want to turn this question to you with another poll, so everyone, please jump in. You'll see a Poll tab, and it's live right now. The question is: what is your biggest AI code generation challenge? Quality, cost, or governance? Please jump in and start answering. I love seeing these questions answered live by the folks in the session. While we give you a minute to answer, a few reminders. If you haven't yet downloaded the State of Code Abundance report, the link is in the chat. Do it now. It is the best context you can have for the conversations we are having and what's coming. Also, be sure to drop your questions in the Q&A panel. I have loved seeing the conversation and commentary so far, and seeing all of your experience and insight in response to what we are hearing. We want to hear your questions as well, and we will be bringing them back after the next session. Next, we have Gerard McMahon from Fidelity, and after that we are going to open it up for live Q&A with our panelists. So please drop your questions in so we can get everyone's insights and answers. Let's take a look at the poll results. Oh, wow. Look at that. This is very nicely spread across all three areas, about a third each: production readiness and quality, cost, and governance.
And it really echoes what we saw in the report: these are big pains, and they're affecting people differently. My focus here at CloudBees is release orchestration and production readiness, so that one I hear for sure, but we know token anxiety is real. We know these are the problems people are facing today, and we're here to talk about what we can do to start addressing them. In our next session, Gerard McMahon will be joining us. He is head of platform engineering at Fidelity Investments, based in Galway, Ireland. He leads the team responsible for the ALM tools and platforms that engineering runs on at one of the world's largest financial institutions. Joining him is Johani Marcoux, VP of corporate marketing here at CloudBees, and they'll be walking us through what happened at Fidelity when code generation velocity started outpacing the ability to ship safely. This is the real story: where the friction showed up, what they tried, what didn't work, and how they got control back. It's the honest version, and the kind of account every engineering leader here really needs to hear. Gerard and Johani, over to you.
Thank you so much, Yvonne. And Gerard, thank you so much for being here. For those joining us, Gerard is head of platform engineering at Fidelity Investments, based in Ireland. He leads the teams responsible for the ALM tools and platforms that engineering runs on at one of the world's largest financial institutions. We're going to spend the next twenty minutes on what actually happens inside a large enterprise when AI code generation scales faster than everything around it. That's the real story. So, Gerard, let's start at the beginning. Fidelity, as we mentioned, is one of the most complex engineering environments in the world. When AI code generation started scaling inside your organization, what was the first thing that surprised you?
Yeah. Thanks, Johani, and welcome, everyone. It was really interesting. I think Phil said he had been involved since '21, in the early days of GitHub Copilot. I jumped on board around January '23, and it took quite a few years, I think, for the products to evolve. But in the last six months or so, maybe a little longer, it's just exploded. When we rolled these tools out to everybody in the organization, the first thing that surprised me was how many people who were not traditional engineers wanted to be on board. We also found a huge amount of experimentation and a huge amount of innovation. That was a very positive thing initially, and obviously it brought challenges later. It was just a wealth of excitement. Maybe some of that came from people's personal lives, or from the tools having evolved to a level of maturity, but I think it was really the embrace everybody had for this new technology. Considering it was seen as a threat to all our jobs, that was one of the more surprising things: it was so well embraced and so highly used.
That's interesting. At what point did you realize that the pace of code generation was outrunning something else? And what was that something else?
Yeah. We were in a fortunate position. As part of the organization I run, we capture a huge amount of data behind the scenes, whether it's from our source code management, our pipelines, or any scans we run across our code, artifacts, binaries, and so on. So we had this wealth of information, and we started to work out how to measure different things.
We had a lot of activity, so we looked at activity, frequency of usage, the number of pull requests being created and merged, the size of those pull requests, and so on: different insights and metrics to help us see the story, how we were using these tools and what outcomes they might be having in the organization. What we found, surprisingly, was a lot of churn in people's usage of the tools, and we weren't seeing the uplift in outcomes we might expect if we take pull requests as a way to measure something. How was the amount of code and the amount of usage reflected in the outcomes we expected? What we found was that usage ramped up very quickly (we were measuring users, how many requests, how many days people were using it, basic things like that) and then started dropping back off again. There was an initial peak of high usage, and then it tailed off. We were tracking this across, I think at the time, in September or maybe October, twelve to fourteen thousand people, and we saw that drop-off. So what we focused on was education. People got their hands on these very powerful tools with very powerful models, and there was the initial playing and the initial satisfaction of experimenting and seeing what happened, but they were unable to translate that into their work and into the products and applications they support. So we started an education program in October, and we're just finishing it off now. We have 20,000 people educated.
We've run hackathons and 101, 201, and 301 workshops. We've run bring-your-own-work sessions where you can meet with the experts, office hours, a huge variety of educational packages, and there have been many repeat attendances. Once we got through the educational side and started to see traction earlier this year, we're now, I think, at 70% of people using the tools more than 50% of their time. So at least every second day of the week, they're using these tools with a high degree of activity. And now we're actually starting to see positive outcomes from an organizational perspective. So the one lesson I have definitely learned is education and learning. You don't just pick these tools up. You need time and investment in developing your skills with them, and you can definitely see better usage and better outcomes.
That's a good point. This is a theme we've heard all through this summit: education and culture, and how critically important they are to driving outcomes. We'll dive a little more into that. But before we do, I'd like to go back to the earlier days, when you saw the pace of code generation really outrunning everything. Can you walk us through what broke and what the core challenges were? You talk about education, but there must have been some friction points in the system. Was it testing, governance, visibility? How bad did it get before you could put a finger on what was happening?
Yeah, it's a great question. I think those points of friction changed throughout the period.
One of the learnings, actually, was that as people got better with the tools, it created more friction, which in hindsight we should have expected. What happened initially was that you get growth in everything. You get growth in lines of code, and you get impacts on downstream systems. Are those systems set up to scale to the increase? If we've got 10x, 20x, 30x more code being generated, are our pipelines able to support that 10, 20, 30x? Are your software licenses able to support the new growth and new users? Are people actually translating it into merges, into feature branches and production branches that go to production? The pull request is where we saw the biggest point of friction. If you get a productivity increase of ten, fifteen, 20% on the left side of the pull request, are you seeing roughly the same productivity increase on the right side? We were not. And people were getting burdened. Like all organizations, a team is made up of juniors, maybe grads just out of college or in their first or second year of working, then a middle layer, and then more experienced engineers. A huge amount of work was falling on the senior and principal engineers within the teams to review this code, and there was just too much of it to review. If people weren't using the tools properly, if they were just generating code, accepting it, and opening the pull request (you know the term "slop"), the reviewers had to go through that, there were countless iterations on the pull request with feedback, and it actually started to slow down.
So while code was being created at a quicker pace, everything behind it was actually slowing down. You were getting negative productivity, not positive productivity. That was one of the first things we saw: there needs to be investment in all aspects of the SDLC, not just a focus on these tools and how well they can create code.
Mhmm. Oh, so interesting. This next stat is one we've heard all through the summit, but our report found that 81% of enterprises are seeing production issues increase with AI-generated code. Does that resonate with what you were experiencing, or did you catch that before?
Yeah. I don't think we've necessarily seen production incidents. What we've been more careful about is quality, ensuring the quality processes were there. We've definitely seen an increase in failures in QA, whether it's automation, manual testing, or functional testing, and those get sent back to developers. You would hope, whether it's AI code or not, that you have enough quality assurance between the code and the production deployment. But it was definitely in the quality space that we were seeing more and more incidents. Also, I recently saw a case from one of the teams where the code fixed something, and fixed it perfectly, but broke something else. That code went all the way through, worked perfectly in production, and solved the problem it was generated for, but it had an unintended impact somewhere else in the code.
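The failure Gerard describes, a change that solves its target problem but breaks a dependent part of the system, is the case that change-impact (dependency-aware) test selection is meant to catch. The Python sketch below is illustrative only: the module graph, module names, and test mapping are hypothetical, not Fidelity's. It walks the reverse dependency graph from the changed modules and selects tests for everything downstream, not just the code that was edited.

```python
from collections import deque

# Hypothetical module graph: each key depends on the modules in its list.
DEPENDS_ON = {
    "pricing": ["market_data"],
    "orders": ["pricing", "accounts"],
    "reports": ["orders"],
    "accounts": [],
    "market_data": [],
}

# Hypothetical mapping from module to the test suites that exercise it.
TESTS_FOR = {
    "market_data": ["test_market_data"],
    "pricing": ["test_pricing"],
    "orders": ["test_orders"],
    "reports": ["test_reports"],
    "accounts": ["test_accounts"],
}


def reverse_edges(depends_on: dict) -> dict:
    """Invert 'A depends on B' into 'B is depended on by A'."""
    dependents = {module: [] for module in depends_on}
    for module, deps in depends_on.items():
        for dep in deps:
            dependents.setdefault(dep, []).append(module)
    return dependents


def impacted_modules(changed: list, depends_on: dict) -> set:
    """Breadth-first walk: the changed modules plus everything that transitively depends on them."""
    dependents = reverse_edges(depends_on)
    seen = set(changed)
    queue = deque(changed)
    while queue:
        current = queue.popleft()
        for parent in dependents.get(current, []):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen


def select_tests(changed: list, depends_on: dict, tests_for: dict) -> list:
    """Collect tests for every impacted module, not only the edited one."""
    selected = set()
    for module in impacted_modules(changed, depends_on):
        selected.update(tests_for.get(module, []))
    return sorted(selected)


if __name__ == "__main__":
    # A change to pricing also puts orders and reports in scope.
    print(select_tests(["pricing"], DEPENDS_ON, TESTS_FOR))  # prints ['test_orders', 'test_pricing', 'test_reports']
```

In practice the graph would come from a build tool or import analysis rather than a hand-written dictionary; the sketch only shows that verification scope follows the dependents of a change, not just its diff.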
That really forces the conversation around how healthy your repositories and applications are, and how well set up they are to work in an AI-native way. Do you have the automation, the scanning, the intelligent testing, so that as code gets generated, it's tested across all the boundaries and all the places that code touches? You're not just doing a narrow verification of what the code changed; you're asking whether some other part of the system had a dependency that might have broken and that your testing didn't cover. That's been a real point of friction: how do you identify that? Traditionally, people have not focused on that area.
Gotcha. Bringing it back to the value question we talked about a little earlier: how do you know you're getting value from these tools, and how is Fidelity approaching that measurement? We've talked about the CFO and CEO dialogue. Is that happening in your organization, and what are those conversations like?
Yeah, that is a brilliant question, and a really topical one, especially right now. Initially, maybe the cost of these tools... and again, I'm not sure there's a right or wrong answer on cost or on how you make the tools available. But at some point it has to come down to how you know you're getting value, as you said. With some of the recent price changes and the releases of newer models, they're becoming more and more expensive for people to use. So how do you know they're being effective?
One question is: are you in control of all of the input tokens and all of the output tokens? If these harnesses are taking more turns than you might expect, the cost rises without you, as a user of these tools, actually getting the benefit and the value. Maybe it's just wasted cycles and wasted cost. And we've seen the price increases recently. So we are looking at how we measure value. The pull request is one measure, but we want to measure across the SDLC, so we're going to use four metrics, the pull request being one. The next is how many artifacts we create through our CI builds, to see whether that's increasing or decreasing. And maybe that's related to quality, as we just discussed: are we seeing more CI builds because we're having to reject builds due to quality issues? Then, when we promote an artifact as a production candidate, we'd hope that number would be more stable, so we want to track whether its volume is increasing or decreasing relative to before. And then production deployments: how many times are we actually touching and changing production? Ultimately, that's the event where you measure value. As an organization, Fidelity wants to measure how we're improving the lives of our customers. We want to touch our customers, rather than generate things that live on computers the customer never puts their hands on. So we're really focused on that, and we want to track it the whole way through. We think these four measures will let us see whether there's churn or friction within the system, and also whether that value is being carried through.
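The four measures Gerard lists (pull requests, CI build artifacts, production-candidate artifacts, production deployments) form a funnel, and churn or friction shows up as a drop in conversion between adjacent stages. A minimal Python sketch, assuming per-period counts have already been exported from SCM and CI/CD systems; the stage names, the 10% tolerance, and the sample numbers are illustrative assumptions, not Fidelity's data.

```python
from dataclasses import dataclass

# The four stages named in the talk, in pipeline order.
STAGES = ["pull_requests", "ci_artifacts", "release_candidates", "prod_deployments"]


@dataclass
class PeriodCounts:
    """Event counts per stage for one reporting period (hypothetical export)."""
    pull_requests: int
    ci_artifacts: int
    release_candidates: int
    prod_deployments: int


def conversion_ratios(counts: PeriodCounts) -> dict:
    """Ratio of each stage to the one before it, e.g. release candidates per CI artifact."""
    values = [getattr(counts, stage) for stage in STAGES]
    ratios = {}
    for prev_stage, stage, prev, cur in zip(STAGES, STAGES[1:], values, values[1:]):
        ratios[f"{stage}/{prev_stage}"] = cur / prev if prev else 0.0
    return ratios


def flag_friction(before: PeriodCounts, after: PeriodCounts, tolerance: float = 0.10) -> list:
    """List transitions whose conversion fell by more than `tolerance`, relative to `before`."""
    old, new = conversion_ratios(before), conversion_ratios(after)
    return [
        key for key in old
        if old[key] and (old[key] - new[key]) / old[key] > tolerance
    ]


if __name__ == "__main__":
    # Made-up numbers: many more PRs and builds, but candidates and deployments barely move.
    baseline = PeriodCounts(1000, 800, 200, 150)
    current = PeriodCounts(1500, 1400, 210, 155)
    print(flag_friction(baseline, current))  # prints ['release_candidates/ci_artifacts']
```

Comparing ratios rather than raw counts is the point of the design: raw volume at the left of the funnel can grow while the share of work that reaches production shrinks, which is the "more code, no more value" pattern described above.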
And if you can measure that value through to production, then maybe the cost becomes immaterial, because one type of developer, one use case, or one persona might warrant a different set of limits or budgets than another. Tracking how we're improving revenue, or improving the operating model so we reduce costs, will help us define that FinOps model. But it's a very topical conversation right now, I think, for everybody.
Mhmm. Those four metrics are interesting. It would also be interesting to hear from the other participants in the room about what they're looking at as indicators of value as they measure their progress and performance. On that topic, we heard the comment in the first presentation about token cost, token maxing, and AI infrastructure spend. Are you and your CFO still friends? Is this a conversation you're having at Fidelity yet, or is it still living in the engineering environment?
No, it's definitely a conversation we're having. We're literally in the middle of it, trying to figure out the right balance. It's not about the haves and the have-nots. We have to make sure all of our associates are equipped with the right tools and the right technologies in order for us to deliver for our customers. So we're very focused on that and on what we make available. But one area we're focused on, and we're looking to do this over the summer months into early autumn (fall, for others), is: for your particular use case, what is your intent in using these models? Maybe there's a different set of models that can answer one question versus another.
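Matching models to intent, with different limits or budgets per persona, can be sketched as a small routing table. This is a minimal Python illustration under stated assumptions: the intents, tier names, prices (in arbitrary units), and persona budgets are all hypothetical, and a real gateway would estimate tokens and track spend itself. The router starts at the tier the intent calls for and only steps down to cheaper tiers when the persona's remaining budget can't cover the estimated cost.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelTier:
    name: str                  # hypothetical label, not a real product
    cost_per_1k_tokens: float  # illustrative price in arbitrary units


# Hypothetical tiers, ordered from most to least expensive.
TIERS = {
    "frontier": ModelTier("frontier-model", 10.0),
    "standard": ModelTier("hosted-standard-model", 1.0),
    "local": ModelTier("local-open-model", 0.0),
}
TIER_ORDER = ["frontier", "standard", "local"]

# Hypothetical mapping from a user's stated intent to the preferred tier.
INTENT_TO_TIER = {
    "security_review": "frontier",
    "trading_algorithm": "frontier",
    "write_tests": "standard",
    "explain_code": "local",
}

# Hypothetical monthly budgets per persona, in the same arbitrary units.
PERSONA_BUDGETS = {"analyst": 50.0, "engineer": 500.0, "principal": 2000.0}


def route(intent: str, persona: str, spent: float, est_tokens: int) -> ModelTier:
    """Pick the preferred tier for the intent, stepping down to cheaper tiers if over budget."""
    preferred = INTENT_TO_TIER.get(intent, "standard")  # unknown intents get the middle tier
    budget = PERSONA_BUDGETS.get(persona, 0.0)
    # Only ever move toward cheaper tiers, never above what the intent calls for.
    for key in TIER_ORDER[TIER_ORDER.index(preferred):]:
        tier = TIERS[key]
        cost = tier.cost_per_1k_tokens * est_tokens / 1000
        if spent + cost <= budget:
            return tier
    # Everyone keeps access to something: fall back to the free local tier.
    return TIERS["local"]
```

For example, under these made-up numbers a security review of an estimated 20,000 tokens requested by an engineer with nothing spent routes to the frontier tier, while the same request from an analyst falls back to the standard tier.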
So if I'm doing a real security review or working on a complex trading algorithm, maybe I want the latest and greatest frontier model, and I'm okay with paying the price associated with that. But if I just want to understand a piece of code, I don't need the most expensive, greatest, and latest model. Maybe I can use an open-source model. So we're going to look at how we provide a Gen AI dev-assist environment that includes open-source models, maybe local models, and different frontier models, and really help route whatever the user's intent is to the most appropriate model. That helps balance the FinOps approach while ensuring we provide the right tools and the right technologies to all of our associates for the right use cases.
Great. Thank you for that. That brings us to the governance reality. We've talked a lot about the complexity of the environment you need to operate in, and Fidelity operates in one of the most heavily regulated industries in the world. How does that change the AI governance conversation compared to what you'd hear in a tech company, for example? I also see a question in the chat specifically about governance requirements within your environment. What does that look like for you, and what have you had to put in place to make sure you had the right guardrails?
Yeah, it's a great question, and it's multifaceted, really. Not to simplify too much, but code is code. You have your regulatory requirements, depending on whichever regulation applies.
So whether it's AI-written or human-written code, that code is still subject to the same regulations. It's about ensuring that the strength of the controls we have in place can scale with the volume. We want to make sure there's always a human in the loop; everywhere, there has to be a human in the loop, with no exceptions. In some cases, we have two humans in the loop, and both are required to review before we go on. To make sure that happens, we're looking at review depth, not just "that had a review." We're looking at the depth and quality of the review to make sure it's of a high standard. And if anyone came back and asked questions, we can say, here's the evidence. And we do have evidence, I should say. As I mentioned earlier, we collect all this data, and we can always use it as evidence to attest to anything. We look at different dimensions, and depth is a new one we'll probably add, because with the volume of code, we want to ensure it's actually being looked at. We also look at best practices and the like to help ensure the right things happen. For production release candidate artifacts and production deployments, we will start gating them to ensure they've gone through the right steps and the right set of controls we care about, whether it's code quality or security quality, and we will verify them before allowing that change into production. So we want to codify the regulations, best practices, and standards so we can do this in real time.
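Codifying controls as a gate means turning each requirement into a check that either passes or records why it failed, so the same record doubles as attestation evidence. A minimal Python sketch follows; the evidence fields, the two-approver rule for a change flagged as regulated, and review comments per 100 changed lines as a depth proxy are all hypothetical stand-ins, not Fidelity's actual controls.

```python
from dataclasses import dataclass, field


@dataclass
class ChangeEvidence:
    """Evidence collected for one change; every field name here is hypothetical."""
    change_id: str
    approvers: list = field(default_factory=list)  # humans who approved the change
    author: str = ""
    review_comments: int = 0                       # crude proxy for review depth
    lines_changed: int = 0
    code_quality_passed: bool = False
    security_scan_passed: bool = False


def gate(evidence: ChangeEvidence, regulated: bool,
         min_comments_per_100_lines: float = 1.0) -> tuple:
    """Return (allowed, reasons); each failed check is recorded so it can be attested later."""
    reasons = []
    # Human in the loop: the author never counts as their own reviewer.
    reviewers = {name for name in evidence.approvers if name != evidence.author}
    required = 2 if regulated else 1  # assumption: "two humans" applies to regulated changes
    if len(reviewers) < required:
        reasons.append(f"needs {required} independent approvals, has {len(reviewers)}")
    # Review depth: expect more discussion on larger changes (hypothetical heuristic).
    expected = evidence.lines_changed / 100 * min_comments_per_100_lines
    if evidence.review_comments < expected:
        reasons.append("review depth below threshold")
    if not evidence.code_quality_passed:
        reasons.append("code quality checks failed")
    if not evidence.security_scan_passed:
        reasons.append("security scan failed")
    return (not reasons, reasons)
```

Returning the failure reasons rather than a bare boolean is what makes a gate like this useful for attestation: the reasons can be stored alongside the change and surfaced later in a report.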
Because we want the humans to be delivering value, not focused on this, and we want the systems and platforms to help deliver and manage that for us. And the attestation, hopefully, should be as simple as clicking into a web page or report, bringing up that attestation, and showing the evidence that we're adhering to whichever regulation is in play.
Mhmm. Yes, and that point about human in the loop being critically important: we're also seeing that accountability model become so much more important, especially in regulated environments. I'm curious, Gerard, coming back to the environment you operate in and the nature of your industry: is regulation a constraint for you, or actually a forcing function that helps the governance practice you've established?
I would say it's a forcing function. I don't think it's a constraint in any way, shape, or form. It just helps, and it's good. It forces us to ensure we're doing the right things and have the right checks and balances in place, especially with the volume of things we create. And when things are hackathons, ideation, or experimentation, they can run at a different level of acceleration with a different set of controls. So it's also important to understand the different types of applications we have in place. We still want very high quality no matter what: internal or external-facing, regulated or non-regulated, you still want that very high bar for quality. At the end of the day, we want our customers to love using Fidelity products and services. That's our number one, and everything we try to do is built around that.
Regulation is just a part of that. I'd actually say we try to set our own bar higher, for our customers. That's how we look at it: we try not to see it as a burden.
Yeah. That's great. I feel like you and I could have a conversation for hours, but I want to ask you one last question before we move to the larger Q&A. For a VP of engineering in this audience who's maybe twelve months behind where Fidelity is today, what's the one thing you'd get right from day one if you were starting this journey again? What do you know now that you wish someone had just told you?
I think it would have been the investment in learning up front. That would be my number one. It took us a long time to catch up, we had to scramble, and whether we caught everybody in the net is hard to know. The tools change so fast as well; that's the other thing that surprised us and caught us out, and it probably caught everybody out. Models are dropping every month now. We have MCP servers one month, skills the next, agents the month after. We have Gas Town, gas cities, and gas universes coming on board. It's just continuous change. I don't think there was a realization of that at the beginning. We had a simple training session, one and done, but now it's really about continuous education. I'd have loved to know that up front, so we could have put a continuous education program in place from the start. Then we could have spent more time working on capabilities and features, and on helping people be as efficient as possible, to get the best value we can out of it.
I love that. Education and culture.
They say culture eats strategy for breakfast, and that couldn't be truer based on what I'm hearing from you. So thank you, Gerard, for this conversation and for sharing your honest journey. I'm going to hand it back to Yvonne to bring today together. Thank you, everyone.