Conversation
Edited 26 days ago

I don’t like that people use security as an angle when criticizing the use of AI in KeePassXC. If a project accepts public contributions, this means there will be malicious actors trying to smuggle in code which weakens security. The project must therefore have a solid review process in place to ensure this doesn’t happen.

If you see AI as this huge security threat, then you don’t trust this review process. But then you shouldn’t have trusted the software at any time before to begin with.


@volpeon it’s treated as if AI is somehow able to insert magical code vulnerabilities that cannot be seen in a text editor or through review or testing

or that it maliciously adds maliciously good-looking code that is actively malicious instead. neither of which is informed by how it works


@volpeon *sigh*

People really love to hate any mention and any implementation of (Gen)AI without thinking about it for a few seconds.

Luckily, I switched to Vaultwarden a few months ago, so I don't have to deal with the "uhh but you're still using KeePassXC, you're pro ai!!!!!" shit. KeePassXC is great for what it is. I really liked using it. And I'll probably continue recommending it to people if it fits their needs.


@charlotte @volpeon
I mean it's pretty close to the second one tbh. Like it doesn't have malice cause it's just an approximation of a high-context markov chain (no feelings), but like; by design it creates code that is very good looking, but also kinda bad, and completely lacks thought and understanding of the problems and edge cases that code solves.


@gimmechocolate @volpeon do we ban the use of copy paste due to the lack of awareness in that tool?


@gimmechocolate @volpeon like ultimately it falls onto the developer to verify the work performed by the tools that they use, and to adjust the code accordingly


@volpeon I think there is a difference, because code submitted by an outside contributor is generally understood to be potentially risky.
But being willing to use an AI coding tool *at all* implies a level of trust in the technology. And one of the big issues with this tech is a tendency to be "magical" in the mind of its advocates, which increases risk.
Whether this concern is valid or not is debatable, but these are definitely not the same situation


@errant From what I’ve seen from the developers, they’re well aware of the risks and capabilities of AI. You aren’t wrong that careless (non-)developers are too confident in AI, but this doesn’t imply the inverse: that all users of AI are automatically careless. As long as they consistently demonstrate responsible use of it, I personally see no problem from a security standpoint.


@charlotte @volpeon
So like, first off, you're clearly responding to what you thought I'd say, and not what I actually said, because I didn't even mention lack of awareness.

Second, copy-paste is actually notoriously bad for creating errors, and it's typically considered good practice to use things like constants, macros, and functions instead. And that's again, ignoring the fact that copy-paste doesn't have any of the problems I mentioned in my post.


@SteffoSpieler @volpeon

I mean, there's the OG KeePass2, still maintained by one author who releases a zip of source code with every version instead of doing any modern github stuff.

To me, the overall UX ergonomics of KeePassXC and especially its information density per screen are lacking a lot in comparison.


@Tvorsk @volpeon I liked the design of KPXC and I'm not a fan of the old look of KP2, but that's a thing of personal taste. If you like KP2, then that's absolutely awesome! ayyyy

@charlotte @volpeon yeah the people on that github issue were bizarrely saying things like software is untestable, like you couldn't write a test to verify correctness because the nefarious ai can insert tricks that human cognition weaknesses cannot detect

@gimmechocolate @volpeon feelings imply awareness, no? but copy-paste doesn’t have feelings either.

and like yeah it does create issues. i am aware that it creates issues. that is specifically why i mention it. it is still an occasionally useful tool which can be leveraged by programmers. The person liable for (mis)use is the developer using the tool, and copy-paste errors are also something that may not be immediately obvious in a code review either


@volpeon me trying to explain to people that just because it’s open source doesn’t mean it’s automatically secure.

Just because you can read the source doesn’t mean any single individual has complete understanding of the entire source code and even if that was possible, silence can be bought.

Open source fanatics are clueless sometimes.

@sun @charlotte @volpeon and the actual issue was some basic straightforward JSON import with tests

@volpeon @errant AI is really easy to accidentally prompt in such a way that it goes off the rails, can make all sorts of mistakes and copyright issues (even subtle ones), and can do it at scale. Hypothetically it might be possible to use it in such a way to not trigger these issues, but surely all the checking and prerequisite expertise would nix most advantages of using it?

And it looks like for most AI PRs in KeePassXC, the person working with the AI ultimately approves the code (example: https://github.com/keepassxreboot/keepassxc/pull/12588)... hardly a rock solid review process. Usually, there's two humans in the loop.


@sitcom_nemesis @errant

but surely all the checking and prerequisite expertise would nix most advantages of using it?

Sure, but why would this be a concern for anyone but the user themself? I’m sure I use things which other people may not like, such as VSCode or GNOME. Is it valid for them to tell me what to use and how?

And it looks like for most AI PRs in KeePassXC, the person working with the AI ultimately approves the code… hardly a rock solid review process. Usually, there’s two humans in the loop.

The way the AI is integrated in GitHub makes it a separate entity from the reviewer with an interaction workflow akin to iterating a PR with its author until it matches the project’s standards. In both cases, the PR author — AI or human — is untrustworthy and the reviewer is trustworthy. There are also non-AI PRs where only one developer conducted the review, so there’s no difference between AI and non-AI standards.

If this strikes you as flawed, then your concerns should lie with the review process itself.


@charlotte @volpeon
Yeah, but no feelings wasn't a criticism of LLMs, I was saying it does actually behave more or less like what you said but without the malice part. To be clear it's bad because it makes code that looks good, but isn't: it just mashes up the semantic ideas from its training data without regard for how those different patterns will interact; the result will always *look* right, but won't necessarily be right, often for very subtle reasons, making it a poor tool that people simply shouldn't use. I wouldn't trust a project that put instructions on how to copy-paste from stackoverflow in their repo either.

As for the tool thing, like, I can see ways where a LLM could be a useful tool. For example, it's very annoying to me how the results from the LSP autocomplete are in some arbitrary order, and it could be nice to have an LLM step in and rank em!! It's a tool, but like, only as a part of a well-designed machine-learning pipeline, made by the people who understand the limits of the technology — not something used by just about everyone almost raw.

The whole thing is just annoying to me though. I've been interested in machine learning for over a decade now, and when I took my first course in it, I was warned about this exact scenario. There's a cycle of AI spring and winter, where the AI business leaders push the idea that these tools can do anything. People metaphorically saying "Uhm, when all you have is a hammer, actually you can solve all your problems by whacking everything with it." Now I get to see what they were talking about firsthand and it's extremely frustrating to feel like I am shouting it from the rooftops and no one listens.

@sun @charlotte @volpeon i mean in my experience the pattern matching leading to outputting stuff that is very good at looking correct during ocular inspection but isn't correct is very real

@volpeon @errant

The way the AI is integrated in GitHub makes it a separate entity from the reviewer with an interaction workflow akin to iterating a PR with its author until it matches the project's standards. In both cases, the PR author — AI or human — is untrustworthy and the reviewer is trustworthy.

I'd argue that integrating AI into GitHub this way is part of the problem. It's not an agent - it's a word guessing machine with access to an API. It fundamentally doesn't think like a human, trustworthy or otherwise. We have methods of understanding context, intention and trustworthiness with other humans - AI strips that all away while still claiming to be analogous to a human. That's, in part, what makes it so risky.

It's one thing to allow AI code with the caveat that the human needs to take full responsibility (e.g. the Fedora guidelines), but that doesn't seem to be happening with KeePassXC. Hence the concern.


@sitcom_nemesis @errant

It doesn’t matter what the AI is or isn’t. GitHub’s presentation of it leads to the AI having the same status as an external contributor, which incentivizes the thinking that it must be held up to the same standards as human contributors. I’d even say that this is the only healthy approach because implicitly trusting humans more means that in case they use AI outside of GitHub and act like it’s their own work, your own bias may prevent you from seeing flaws you’d pay closer attention to when reviewing an AI’s output.

I’m not sure what the problem with responsibility is. Isn’t this a project governance issue, i.e. “you take full responsibility” means you’ll get kicked out if you contribute garbage AI code? How is this a security problem?


I saw this on my fedi timeline. I don't think people need to flat-out lie just to criticise the use of AI generated code - UZDoom going for the copyright/GPL angle during that whole thing made a lot more sense IMO.


From what I saw, KeePassXC implements the same policy as Mesa and Fedora, so I don't see what the problem is.

That being said, the code of my password manager of choice is proprietary, they could very well be using AI generated code and I wouldn't even know (or care, since I trust the company enough).


@volpeon Yes! Exactly this!

The dev's responses that I have seen give me enough confidence to not lose trust in their use/allowance of AI gen'd code.

They plan to treat ALL AI gen'd code as equivalent to a drive-by PR which requires additional scrutiny.

That's a perfectly reasonable review policy, imho.

@volpeon@icy.wyvern.rip @ngaylinn@tech.lgbt I have been critical of the KeePassXC team's decision to accept LLM-assisted submissions for a while now, and after long consideration I opted to move away from using it, security being one of my primary concerns. My view is that you're setting up a bit of a strawman in this post, and so I thought I'd elaborate more on my rationale in case it helps anyone who's weighing this decision too. The tl;dr is that code review should be your last line of defense, not your primary one, and that LLM use threatens to erode existing lines of defense while introducing new categories of risk. This is the opposite of what you should be doing when developing security-oriented software.

Here's a lot more words:

To my way of thinking the question isn't whether any given pull request is problematic. One of the KeePassXC maintainers who I've interacted with seemed to suggest this as well, that human beings sometimes submit poor quality pull requests too so what's the difference, especially if the review process catches them? The important question, and the difference, lies with the culture of the development team and process. Experience shows that security-oriented software in particular benefits greatly from a team of people dedicated to both transparency and the relentless pursuit of excellent implementations. My belief is that the use of LLMs in coding threatens both of those aspects, degrading them over time. Transparency, because no one can know exactly what an LLM is going to produce and why, and an LLM cannot tell you anything about its output; excellent implementations because (a) come on, have you ever looked at LLM output, especially for larger chunks of code; and (b) the only way we've ever found to produce excellent implementations of anything is by developing a well-functioning team of people and setting them loose on it.

Peter Naur famously argued that programming is theory building, and theories draw their power from their existence in the heads of the people who construct them. I am convinced by his argument by the simple fact that over the course of my career I've worked with large codebases written by other people, and have experienced firsthand that the only way to really understand the code is by talking to other people who understand the code. No one can look at a large codebase and understand how it works, not even with the best documentation in the world--not in a reasonable amount of time, anyway. Anyone who believes otherwise hasn't picked up a non-trivial APL program and tried to figure out what it does. Anyone who believes otherwise is mistaken about the practice of software development and engineering, and probably also believes in the myth of the 10x engineer or that women can't code as well as men, too.

LLMs are not people. They do not understand code. They cannot describe their thought process to you. They cannot point you to the most important functions, procedures, methods, or objects. They cannot give you hints about pitfalls you might fall into while working with their code. Any understanding like this that arises about LLM-generated code arises because human beings developed that understanding of the code and then communicated it.

LLMs are trained on masses of mediocre code. Their output has been found to include significantly more bugs and security issues than the average human-written code. Their use has been observed to result simultaneously in reduced productivity and a belief that productivity was increased, suggesting they might induce other blindspots in one's self-awareness too. Their use has been observed to result in de-skilling: people become less able to do things they used to be able to do without leaning on the tool. Given all that, I do not believe for a moment that an LLM can produce an excellent implementation, nor foster a culture in which excellent implementations arise; and I believe that any excellent implementation produced by a person using an LLM is a result of the person compensating for the weaknesses and traps of LLM use, all while it potentially degrades their future ability to produce excellent implementations and fools them into believing they wrote better code faster when they did not.

A good review process does not compensate for any of the issues I raise here. More importantly, actual security is about layers of protection. The code review is one of the last layers of protection. There should be many, many others, which to me includes a culture that does not succumb to the temptation to put a stochastic black box deskilling machine into the software development process. You wouldn't build a fortress with an open road leading into the center just because you had guards you could post on the road (it lets us get in and out faster, that portcullis is so slow!). You'd have the guards, and you'd have several layers of thick walls, and you'd have a moat, and you'd have archers, and... You certainly wouldn't voluntarily pull a giant wooden horse that could contain anything into your fortress!

I suspect that a project adopting more and more LLM-assisted submissions will not obviously suffer in the near term, but over the medium to long term is likely to develop issues, originating in one or more of my above observations, that eventually lead to problems in the software. As I said to someone about KeePassXC, I am not inclined to hitch my wagon to that train. Not when it comes to a piece of software like a password manager.

And that's not even opening up the moral and ethical issues of LLMs, which are substantial. Not to mention the dangers of becoming dependent on a technology and tools that might go away or become significantly more expensive when the asset bubble currently necessary for their continued existence finally deflates.

Other people might come to a different place, but for me this is more than enough reason to switch password managers.

@abucci Thanks for your reply! You’re making good points which I overall agree with. I’ve had rather subpar experiences with LLM-generated code at work myself, so it’s not like I don’t see the downsides and how it leads to the erosion of skill. It’s true that this also has implications for security.

However, from what I’ve seen, I think the way GitHub integrates Copilot into the process makes it less likely to cause the same degradation as an AI assistant directly integrated into an editor. As I said elsewhere, GitHub presents Copilot as a PR author and your usage of it is akin to iterating a PR with a human author until it meets the project’s standards.
If regular PRs don’t pose a risk to one’s skills, then I don’t see why this would. It incentivizes the thinking that the AI must be held to the same standards as any other PR author, that it isn’t inherently above them. I think this is a good way to handle it.
I’m happy to be corrected if my understanding of Copilot or the way the devs use it is wrong. You’re clearly more involved in this topic than I am.

Apart from that, I do wonder how realistic it is to expect projects to reject LLM contributions forever. No matter what you and I want, the global trend moves towards increasing adoption of AI and this means external contributions will become more and more “tainted”, with and without their knowledge. Given this outlook, I think it’s better to be open to AI contributions. This allows the developers to become familiar with the strengths and weaknesses of AI, and it creates an environment where contributors are willing to disclose their use of it so that reviews can be conducted with appropriate care. An environment where AI is banned will only lead to people trying to deceive the developers and causing unnecessary trouble.

@ngaylinn


@abucci @volpeon I agree with everything you say. I'm not well versed in this case, and my opinions are more nuanced than the original post, so I'll clarify what I meant by boosting it.

LLMs produce low quality code that nobody has read or understood. This is a serious problem, and security is just one of many risks. Better to say "this is sensitive code, I don't trust an LLM to touch it" than to claim AI-generated code is insecure.

The risk of bad / insecure code is not new. It's a key challenge for open source. There are many development practices like code reviews to manage this risk. LLMs make the problem worse, but it's not new. Either we trust the practices, or we must update them, but we should think in terms of how we handle the bad code as it comes.

I'm against AI in software, but banning doesn't prevent it, especially in open source. People will use it. So, the question is, how do we keep software safe and reliable in the face of this? Saying "AI is bad because it's insecure" is too simplistic.


@volpeon I'm glad you said this, I saw a line of reasoning that AI PRs were worse because they're designed to give correct looking things and thus will slip things by unnoticed easier...but like you said, that is the point of rigorous code review. And, if you can't catch those, then that is very much not a good thing in general for a security project accepting PRs anyways.

And the other angle I keep thinking about around this too: this is a tool that can help author tedious work, and could be a boon for a volunteer-led FOSS project. If it is true they're using it to do the non-critical work that still needs to get done but no one wants to do or help with, that doesn't seem like the worst thing for a project.

@ngaylinn@tech.lgbt @volpeon@icy.wyvern.rip
I'm against AI in software, but banning doesn't prevent it, especially in open source. People will use it. So, the question is, how do we keep software safe and reliable in the face of this?
I differ with you here. Banning will obviously not stop people from trying to submit pull requests with AI-generated code in them. That is a strawman I have not suggested, and I don't understand why people keep bringing it up as if it's relevant. What banning will do is set a clear tone for a project, and unequivocally identify (some) people who do not respect the project's values (if "no AI" is one). It's exactly like putting things like "no bigotry" in a code of conduct. You don't say "no bigotry" because you think doing so will make people stop saying bigoted things (at least I hope folks aren't that naive).

I said in some other thread that (most) software developers take finding bugs seriously, and work hard to eliminate them. This is a cultural practice. I'd say most bugs, at least in mature-ish projects, are not software-breaking, but are subtle behavioral issues. Many of those could be left in without breaking the core functionality of the software. But developers fix them anyway. Why? Because fixing bugs is part of the culture of software development. Not using AI/LLMs could also be made part of the culture, and treated with equal seriousness. But values like that do not emerge spontaneously; people have to advocate for them and practice them. Banning AI use is one way to advocate for this value.

@volpeon@icy.wyvern.rip
how realistic
I don't mean to come off as snarky or dismissive at all, but this phrase is so often used to bludgeon good ideas to death. Everyone agrees we should do X, someone pipes up "yes but is that realistic?" and poof, X is off the table. Somewhere in one of those CIA instruction manuals for how to disrupt organizations is the advice to do exactly this (don't have the patience to dig it out).

It's unrealistic to expect people to not murder one another, yet we insist that they do not do that.

Incidentally, have you seen this? https://blobfox.coffee/@Ember/115522745321119751

For some reason that URL is not loading for me right now, but it's a person pointing to a recent KeePassXC pull request that was not reviewed by another person, something the KeePassXC blog post and maintainers insist shouldn't ever happen. The PR is "reviewed" by "Copilot". This is why you ban AI: the slide from "experimenting with AI" to "flat out lying about it" is fast, in my experience. It is absolutely not better to tolerate this kind of thing in general; it is intolerable in security and security-adjacent software.

@ngaylinn@tech.lgbt

@abucci @volpeon Agreed that "no AI" is a totally valid thing to put in your policy, with many benefits even if it doesn't prevent LLM contributions.

However, this re-framing is relevant. I don't mean to make a straw-person of your argument and then shoot it down. I mean to say: we should attend to engineering practices that mitigate the inevitable harm of thoughtless, mass-produced PRs, because we'll need it regardless of policy.

What I liked about the original post was shifting attention from "LLMs insecure" to "how were we mitigating this problem before using software development practices, and how can that inform what we do next?"

Absolutely no arguments against your emphasis on healthy development culture. I think that's the same point, actually! And it's valid to say that a no-AI policy is part of having a healthy dev team, in this moment.

@ngaylinn@tech.lgbt You emphasize a good point. At the level of managing a large inflow of pull requests I can understand the appeal of automating some of the work. For some projects it's definitely a problem to solve.
@volpeon@icy.wyvern.rip