Avoid loading certificates from system store for every connection #6418
SReicheltPTV wants to merge 4 commits into npgsql:main
Conversation
| if (verifyFull && sslPolicyErrors.HasFlag(SslPolicyErrors.RemoteCertificateNameMismatch))
|     return false;
| // Also check against the default system certificates location on Debian/Ubuntu.
| if (certRootPath == "/etc/ssl/certs/ca-certificates.crt")
FWIW we avoid hard-coding filesystem paths in the source code like this, as different Linux distributions may have their certificate store in completely different places etc.
I understand. Unfortunately, the SSL_CERT_FILE env var doesn't usually seem to be set. As I wrote in #6417, I'm a bit out of ideas here.
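To illustrate the alternative being discussed, a hedged sketch of consulting OpenSSL's `SSL_CERT_FILE` environment variable before falling back, rather than hard-coding a distribution-specific bundle path (the helper name is invented; this is not Npgsql code):

```csharp
using System;
using System.IO;

// Illustrative only: honor SSL_CERT_FILE when it is set and points at an
// existing file; return null instead of hard-coding a path like
// /etc/ssl/certs/ca-certificates.crt, which varies between distributions.
static string? TryFindCaBundleFromEnv()
{
    var path = Environment.GetEnvironmentVariable("SSL_CERT_FILE");
    return !string.IsNullOrEmpty(path) && File.Exists(path) ? path : null;
}
```

As noted in the thread, the catch is that `SSL_CERT_FILE` is usually not set, so this can only ever be a first attempt, not a complete answer.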
In the meantime, it occurred to me that a completely different alternative could be to just cache the custom certificates similarly to how the system certificates are cached. Then there would no longer be a need to figure out what the path points to. I'll try to come up with something.
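The caching idea could look roughly like this (a minimal sketch, not Npgsql's actual implementation; `CustomRootCertCache` and its shape are invented for illustration):

```csharp
using System.Collections.Concurrent;
using System.Security.Cryptography.X509Certificates;

// Illustrative sketch only: cache custom root certificates per file path so
// the file is read once per process, mirroring how the system certificates
// are already cached.
static class CustomRootCertCache
{
    static readonly ConcurrentDictionary<string, X509Certificate2Collection> Cache = new();

    public static X509Certificate2Collection Load(string certRootPath)
        => Cache.GetOrAdd(certRootPath, path =>
        {
            var certs = new X509Certificate2Collection();
            certs.ImportFromPemFile(path); // loads every PEM certificate in the file
            return certs;
        });
}
```

With a cache keyed by path, there is indeed no need to recognize what the path points to, which sidesteps the hard-coded-path objection.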
/cc @vonzshik on all this, who actually knows stuff about all the certificate stuff
So, from my understanding, in your particular case you had the certificate in the store and specified in
Just noting that this seems somewhat unlikely - a physical connection open is quite a heavy thing (with potentially multiple network roundtrips etc.), and I'd expect just loading a certificate from disk to be negligible. It may be a good idea to first prove that there's a problem via a benchmark.
One more thing. libpq also supports the special value
That's correct, though I suspect the problem may not be the double verification of this specific certificate, but the size of the system store if
I'll check whether caching the loaded certificates is sufficient, and will report back on this tomorrow.
While I don't have a good grasp on these details, note that the current Npgsql code ignores the result of the verification against the system certificates if
That sounds useful, but just to clarify: In my case, simply unsetting
Yes, but not exactly: while we do throw away the previous verification result, all we do is add the specified certificate to a custom store and ask the chain to trust whatever is there, after which we try to build the chain again. That is to say, if there were other issues unrelated to the root certificate not being trusted (for example, getting a completely different root certificate), it will still return an error.
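In .NET terms, that rebuild might be sketched as follows (simplified; the method name and parameters are invented, and revocation checking is disabled only to keep the example self-contained):

```csharp
using System.Security.Cryptography.X509Certificates;

// Simplified sketch of re-verifying a server certificate against custom
// roots only. `serverCert` and `customRoots` stand in for the values the
// real validation callback would receive.
static bool VerifyAgainstCustomRoot(X509Certificate2 serverCert, X509Certificate2Collection customRoots)
{
    using var chain = new X509Chain();
    chain.ChainPolicy.TrustMode = X509ChainTrustMode.CustomRootTrust;
    chain.ChainPolicy.CustomTrustStore.AddRange(customRoots);
    chain.ChainPolicy.RevocationMode = X509RevocationMode.NoCheck; // example only
    // Build still fails for problems unrelated to the originally untrusted
    // root, e.g. when the chain terminates in a completely different root.
    return chain.Build(serverCert);
}
```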
I can confirm that caching the certificates fixes the problem we had, and I have updated the PR accordingly.
I agree a benchmark would be useful, but I don't know exactly how to do it. In the web service where we experienced the problem, two factors appear to influence the result:
In that setting, the CPU usage had a rather serious side effect: If many requests reached the service at the same time, all database connection attempts ran into a timeout while validating the certificate chain. As a result, these connections also never made it into the connection pool, so every incoming request caused another certificate validation, making the problem even worse.
I see, thanks. I've removed this change from the PR.
BTW just to state the obvious - IIUC caching certificates would mean that users can't deploy a new certificate without restarting the application. That might not be a problem, and if OpenSSL does this as @vonzshik indicated, then we definitely can do the same, I think.
Oh, right. We could check the file timestamp if necessary, of course.
If you're doing I/O to check the file timestamp, and certificate files are generally quite small, then it really would seem weird that just checking the file timestamp is so much faster than just loading the file... Seeing an actual benchmark around all this would be very helpful.
I decided to dive deeper into OpenSSL's certificate cache, and things are not as straightforward as I thought at first. libpq uses SSL_CTX_load_verify_locations to load the root cert into memory, which in the end calls X509_load_cert_crl_file_ex, with no caching in sight. Now, there seems to be some caching involved whenever
@SReicheltPTV I see in your PR you added a reference to runtime issue dotnet/runtime#123058. Reading through it, I see that the dev there recommends instead using
Checking the timestamp proved unproblematic in terms of performance. I've added it as a separate commit now so you can decide whether to include it. I haven't done any systematic benchmarking, but for the instance in our low-resource dev environment, the service becomes completely overloaded when a certain number of database connections are opened simultaneously (triggering the certificate validation), and requests time out after 30-45s. (Not sure why it varies; I'm guessing that incoming requests are being blocked at some point.) Normal response times are around 50-300ms. I don't think the numbers can be compared meaningfully, though, because under normal circumstances, when the database connections succeed, they remain in the connection pool for a while.
Oh, I hadn't seen that actually. Yes, they point to https://learn.microsoft.com/en-us/dotnet/core/extensions/sslstream-best-practices#custom-certificate-trust, but that page doesn't seem to address your specific objection. I'm not an expert on this at all unfortunately. Anyway, I have rebased onto that commit again as you suggested.
Hi, did you have time to look at the latest changes? Thanks. |
@SReicheltPTV not yet, should have some time tomorrow. |
| certificateChainPolicy.TrustMode = X509ChainTrustMode.CustomRootTrust;
| var certs = new X509Certificate2Collection();
| certificateChainPolicy.ExtraStore.AddRange(certs);
ExtraCerts are treated as untrusted during validation, so putting root certs here does not achieve anything. Putting them into CustomTrustStore should be enough.
ExtraCerts is supposed to be used for intermediate certs (those issued/signed by one of the trusted root certs, either via the system store or CustomTrustStore). ExtraCerts are only useful in cases when the client does not send the intermediates on its own (the TLS RFC states that peers should send the entire cert chain minus the root cert; intermediates can be omitted only if peers know the other side already has their copy stored locally).
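The separation being described could be sketched like this (all names illustrative, not Npgsql's actual code): trusted roots go into `CustomTrustStore`, while `ExtraStore` only serves as an untrusted lookup source for intermediates the peer may have omitted.

```csharp
using System.Security.Cryptography.X509Certificates;

// Illustrative sketch: build a chain policy where roots are trust anchors
// and intermediates are merely available for chain building, not trusted.
static X509ChainPolicy BuildChainPolicy(
    X509Certificate2Collection trustedRoots,
    X509Certificate2Collection extraIntermediates)
{
    var policy = new X509ChainPolicy
    {
        TrustMode = X509ChainTrustMode.CustomRootTrust,
        RevocationMode = X509RevocationMode.NoCheck, // example only
    };
    policy.CustomTrustStore.AddRange(trustedRoots);   // trust anchors
    policy.ExtraStore.AddRange(extraIntermediates);   // untrusted chain-building material
    return policy;
}
```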
I did not introduce this code in this PR though, I only moved it here from SslRootValidation. It was last changed in commit c6742e5; before that only ExtraStore was used.
I can remove it of course, but that's a bit beyond the original purpose of this PR.
Looks like our documentation could be improved to be more helpful :)
NinoFloris left a comment
Thanks for taking a stab at this.
That being said, seeing static caches with security related data immediately makes me uneasy, especially if things can end up being cached forever against a null timestamp.
I think it's OK to cache this work. For that to happen, though, it must have a modified timestamp (otherwise, do not cache); additionally, the cache belongs on the data source instead. This allows any loading or invalidation issues to be worked around by creating a new data source.
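One way to read that suggestion (purely illustrative; not Npgsql's API): a small per-data-source holder that re-reads the file whenever its last-write timestamp changes, so recreating the data source, or redeploying the certificate file, naturally invalidates the cache.

```csharp
using System;
using System.IO;
using System.Security.Cryptography.X509Certificates;

// Illustrative sketch of a per-data-source root certificate cache keyed on
// the file's last-write timestamp. All names here are invented.
sealed class RootCertSource
{
    readonly string _path;
    readonly object _lock = new();
    DateTime _stampUtc;
    X509Certificate2Collection? _certs;

    public RootCertSource(string path) => _path = path;

    public X509Certificate2Collection Get()
    {
        var stamp = File.GetLastWriteTimeUtc(_path);
        lock (_lock)
        {
            if (_certs is null || stamp != _stampUtc)
            {
                var certs = new X509Certificate2Collection();
                certs.ImportFromPemFile(_path); // re-read only on change
                (_certs, _stampUtc) = (certs, stamp);
            }
            return _certs;
        }
    }
}
```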
After commit 7320260, it is no longer necessary to perform the validation twice against the two different certificate stores.
Thanks, I've implemented those two changes now.
FYI, I won't have time to look at the failing tests for the next two weeks. I haven't been able to run them locally yet. |
Cache loaded custom root certificates, in particular if the PGSSLROOTCERT environment variable is set. This fixes #6417.