Temporarily Repair Your Home Internet After a Cable is Cut

I’ve been having some landscaping work done in my yard, and despite having all of the utility lines marked, the crews have managed to cut my cable internet line on several occasions. Since I work from home, and we have hardly any cellular reception at the house, it was pretty devastating. I couldn’t even call the cable company to request a repair without driving somewhere with better reception.

So when they said it would be a few days before a technician could come out to repair the cut cable, I decided to attempt a repair on my own. I happen to have some coax crimpers, spare cable, and connector ends at home, so I first tried to re-terminate the cut ends. But the direct-burial cable used outdoors is quite a bit thicker than indoor cable, so my connectors and crimpers wouldn’t fit.

Without the correct tools, I was left with the most primitive of methods: simply twist the center conductors together with some needle-nose pliers and secure the splice with wire ties.

Here’s one of my first attempts when they cut the coax. I tied it into my own coax, which ran back to the house. On this first attempt, I tried to leave some extra shielding on each end and twist that together as well.
Black Coax Wire Tied in Grass

A subsequent cut, on newer cable, had enough slack that I could just tie the two ends together directly. After the line was properly repaired, the technician left my old splice behind, so you can see how I first twisted the ends together as much as I could with pliers before adding a wire tie. I cut the shielding off cleanly and didn’t attempt to reconnect it, and the splice still worked fine.

Orange Outdoor Coax Twisted

Orange Outdoor Coax Wire-Tied

It didn’t result in the full 200 Mbps+ speed that I should be getting, but 50+ Mbps was absolutely better than nothing for the few days until the technician could come and re-terminate the ends properly:

Speed Test - 66 Mbps down, 10 Mbps up

Installing snmpd on Ubiquiti Dream Machine Pro

I was surprised that the Ubiquiti Dream Machine Pro doesn’t have SNMP available. I recall that older versions of their software had an option to enable it, but the current 3.0.20 version doesn’t even have the option (and I don’t think it worked correctly in the versions that did).

Fortunately, it’s basically just a Debian machine, so you can enable it yourself! These are the steps that I took to enable snmpd so that I could add it to my network monitoring system.

First, update the repositories and install the snmp and snmpd packages:

apt update
apt install -y snmp snmpd

Then, edit /etc/snmp/snmpd.conf and change these two lines in the view section. This change makes snmpd provide information about all of the attached interfaces instead of only the host system itself.

view   systemonly  included   .1.3.6.1.2.1.1
view   systemonly  included   .1.3.6.1.2.1.25.1

To these two lines. The important change is dropping the trailing .1 from the first OID, which widens the view to the entire mib-2 subtree (including the interface tables):

view systemonly included .1.3.6.1.2.1
view systemonly included .1.3.6.1.2.1.25.1.1

Also, you’ll probably want to configure the snmpd daemon so that it listens on a local network interface, so change the agentaddress line to this (substituting your device’s LAN IP address if it isn’t 192.168.0.1):

agentaddress  127.0.0.1,[::1],192.168.0.1

Then restart the snmpd daemon:

service snmpd restart

You can test that it is working by running snmpwalk with a command like this:

 snmpwalk -Os -c public -v 2c 192.168.0.1

That should output hundreds of lines, starting out similar to this:

brandon@auvik:~$ snmpwalk -Os -c public -v 2c 192.168.0.1
iso.3.6.1.2.1.1.1.0 = STRING: "Linux dream-machine-pro 4.19.152-ui-alpine #4.19.152 SMP Thu Apr 6 21:41:48 CST 2023 aarch64"
iso.3.6.1.2.1.1.2.0 = OID: iso.3.6.1.4.1.8072.3.2.10
iso.3.6.1.2.1.1.3.0 = Timeticks: (377603) 1:02:56.03
iso.3.6.1.2.1.1.4.0 = STRING: "Me "
iso.3.6.1.2.1.1.5.0 = STRING: "dream-machine-pro"
iso.3.6.1.2.1.1.6.0 = STRING: "mycommunity"
iso.3.6.1.2.1.1.7.0 = INTEGER: 72
iso.3.6.1.2.1.1.8.0 = Timeticks: (0) 0:00:00.00
iso.3.6.1.2.1.1.9.1.2.1 = OID: iso.3.6.1.6.3.10.3.1.1
iso.3.6.1.2.1.1.9.1.2.2 = OID: iso.3.6.1.6.3.11.3.1.1
iso.3.6.1.2.1.1.9.1.2.3 = OID: iso.3.6.1.6.3.15.2.1.1
iso.3.6.1.2.1.1.9.1.2.4 = OID: iso.3.6.1.6.3.1
iso.3.6.1.2.1.1.9.1.2.5 = OID: iso.3.6.1.6.3.16.2.2.1
iso.3.6.1.2.1.1.9.1.2.6 = OID: iso.3.6.1.2.1.49
iso.3.6.1.2.1.1.9.1.2.7 = OID: iso.3.6.1.2.1.50
iso.3.6.1.2.1.1.9.1.2.8 = OID: iso.3.6.1.2.1.4
iso.3.6.1.2.1.1.9.1.2.9 = OID: iso.3.6.1.6.3.13.3.1.3
iso.3.6.1.2.1.1.9.1.2.10 = OID: iso.3.6.1.2.1.92
iso.3.6.1.2.1.1.9.1.3.1 = STRING: "The SNMP Management Architecture MIB."
iso.3.6.1.2.1.1.9.1.3.2 = STRING: "The MIB for Message Processing and Dispatching."

If that works, congratulations! You’ve got snmpd installed on your Ubiquiti Dream Machine Pro. Your network monitoring system may take a little time to notice that SNMP statistics are now available on the device.
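If you want a quick scripted check beyond snmpwalk, something like this should work from any machine with PHP’s snmp extension installed. This is only a rough sketch, assuming the ‘public’ community string and the 192.168.0.1 address from the examples above:

<?php
// Quick sanity check of the new snmpd using PHP's snmp extension.
// Community string and IP address are the ones from the examples above.
$sysDescr = snmp2_get('192.168.0.1', 'public', '.1.3.6.1.2.1.1.1.0');
echo "sysDescr: {$sysDescr}\n";

// Walk ifInOctets (.1.3.6.1.2.1.2.2.1.10) to see per-interface traffic counters
foreach (snmp2_real_walk('192.168.0.1', 'public', '.1.3.6.1.2.1.2.2.1.10') as $oid => $value) {
    echo "{$oid} = {$value}\n";
}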

Note that a firmware upgrade will probably wipe this configuration, and these steps would have to be re-done.

Understanding and Fixing PHP Warning: Packets out of order. Expected 1 received 0. Packet size=145

In one of my applications, I’ve been noticing this error occurring more frequently.

PHP Warning: Packets out of order. Expected 1 received 0. Packet size=145

When investigating, I ran the long-running process in the foreground and watched for a pattern. Sure enough, I found that when the program waited a long time between jobs, the first query after it resumed would produce this warning.

My application has some retry logic built in, so it recovered and went on as normal. It was just an annoyance, but I don’t like it when I don’t understand how things are working.

I was able to recreate this problem reliably with this short script:

<?php
require_once 'include.php';   // Connects to the database

// Set the session wait_timeout to a small value
$db->query("SET session wait_timeout=10;");

// Prove that the connection works
$one = $db->getOne("SELECT 1");
echo "Got one = {$one}\n";

// Sleep for longer than the wait_timeout
sleep(11);

// Retry the query
$one = $db->getOne("SELECT 1");
echo "Got one = {$one}\n";

When executed, it produced this output, confirming that the wait_timeout is the problem:

Got one = 1
PHP Warning:  Packets out of order. Expected 1 received 0. Packet size=145 in /path/to/myapp/db.class.php on line 68
PHP Stack trace:
PHP   1. {main}() /path/to/myapp/dbtest.php:0
PHP   2. db->getOne($sql = 'SELECT 1', $args = *uninitialized*, $recurse = *uninitialized*) /path/to/myapp/dbtest.php:13
PHP   3. PDOStatement->execute($params = []) /path/to/myapp/db.class.php:68

To prevent this problem, I implemented a timer that tracks the time between queries and reconnects to the server if more than wait_timeout seconds have elapsed. It isn’t exact, because it measures from the start of each query rather than its completion, but it has largely eliminated the warning.

My database connection class (db.class.php) calls the conn() method for each query, so I added the timer there. It disconnects whenever more than $sqlTimeout seconds have passed between SQL queries:

class db
{
    protected $lastActivityTs = null;
    static protected $sqlTimeout = 3600;  // Make sure you copy this value from your MySQL Server

    public function conn()
    {
        if (isset($this->dbh) && (microtime(true) - $this->lastActivityTs) >= self::$sqlTimeout) {
echo "Disconnecting after expired SQL connection\n";
            // Our connection is probably timed out by the server anyway
            $this->disconnect();
        }
        if (!isset($this->dbh)) {
            $this->_connect();
        }
        $this->lastActivityTs = microtime(true);
        return $this->dbh;
    }

Note that our library automatically retries once when a connection error occurs. This has also been important for catching temporary failures and disconnects from the MySQL server and retrying the connection.

    // Continuing in the db class
    public function getOne($sql, $args = [], $recurse = true)
    {
        try {
            $sth = $this->conn()->prepare($sql);
            $sth->execute($args);
            $sth->setFetchMode(PDO::FETCH_NUM);
            $row =  $sth->fetch();
            return $row[0] ?? null;
        } catch (PDOException $e) {
            if ($recurse && 'HY000' == $e->getCode()) {
                // SQLSTATE[HY000]: General error: 2013 Lost connection to MySQL server during query
                unset($this->dbh);
                return $this->getOne($sql, $args, false);
            }
            throw $e;
        }
    }
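Rather than hard-coding the 3600-second value, you could also ask the server for its wait_timeout when the application starts, so the two values can never drift apart. Here is a rough standalone sketch of that idea; the DSN and credentials are placeholders, not the ones from my class:

<?php
// Read the server-side wait_timeout so an application-side timer can match it.
// DSN and credentials below are placeholders.
$dbh = new PDO('mysql:host=127.0.0.1;dbname=thedatabase', 'user', 'password');
$row = $dbh->query("SHOW VARIABLES LIKE 'wait_timeout'")->fetch(PDO::FETCH_ASSOC);
$sqlTimeout = isset($row['Value']) ? (int) $row['Value'] : 3600;
echo "Server wait_timeout is {$sqlTimeout} seconds\n";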

Migrating 1.2 TB Database From Aurora to MySQL

We have one database server that is running on an old version of Aurora based on MySQL 5.6. AWS is deprecating that version soon and it needs to be upgraded, so I have been working on replacing it. Upgrading the existing 5.6 server to 5.7, then to 8.0 isn’t an option due to an impossibly huge InnoDB transaction history list that will never fix itself. Plus, I want to improve a couple of other things along the way.

I made several attempts at migrating from Aurora 5.6 to Aurora 8.0, but during that process, I grew tired of Aurora quirks and costs. Here are some of my raw notes on what was an embarrassingly long migration of a database server from Aurora to MySQL. Going from MySQL to Aurora took just a couple of clicks. Converting from Aurora back to MySQL took months and a lot of headaches.

TL;DR: Along the way, I tried using Amazon’s Database Migration Service (DMS), but eventually gave up in favor of a good old closely monitored mysqldump and some custom scripts.

I had a few goals/requirements:

  • Get rid of the soon-to-be-deprecated Aurora instance based on MySQL 5.6
  • Stop Paying for Storage IOPS (often over $100/day)
  • Convert tables from utf8mb3 to utf8mb4
  • Minimal downtime or customer disruption. Some disruption during low-usage times is okay.

A new MySQL 8 instance with a GP3 storage volume and the recently announced RDS Optimized Writes should be able to handle the workload with no problem. It also gets this server back into the MySQL realm, where all of our other servers are and where we are more comfortable.

Attempts at using AWS Database Migration Service (DMS)

This service looked promising, but has a learning curve. I eventually gave up using it because of repeated problems that would have taken too much effort to try and resolve.

First attempts:
On the surface, it seems like you configure a source, configure a destination, and then tell DMS to sync one to the other and keep them in sync. It does this in two phases: the Full Dump and the Change Data Capture (CDC). I learned the hard way that the Full Dump doesn’t include any indexes on the tables! This is done to make it as fast as possible. The second phase, CDC, just replays statements from the binary log, so without indexes on a 400+ GB table, those statements take forever and the process will never catch up.

I also concluded that one of our 300+ GB tables could actually be loaded in a separate process after the rest of the data. It contains historical information that will make some things in the application look incomplete until it is loaded, but the application will work with it empty.

Second attempts:
I used DMS for the full dump, then configured it to stop after the full dump, before starting the CDC process. While it was stopped, I added the database indexes and foreign keys. I tried this several times with varying degrees of success, trying to minimize the amount of time it took to add the indexes. Some tables were done instantly, some took a couple of hours, and some took 12+ hours. At one point I figured it would take about 62 hours to add the indexes. I got that down to about 39 hours by increasing the IOPS, running some ALTER TABLE statements in parallel, etc.

After the indexes were added, I started the second phase of DMS. The Change Data Capture is supposed to pick up at the point in time where the Full Dump was taken and then apply all of the changes from the binary logs to the new server. That process didn’t go smoothly. Again, the first attempts looked promising, but then the binary logs on the server were deleted, so it couldn’t continue. I increased the number of days that binary logs were kept and made more attempts, but they ran into problems with foreign key and unique constraints on tables.

The biggest problem with these attempts was that the data migration took about 24 hours and adding indexes took about 48 hours, so each attempt was several days of effort.

Third and last attempts at using DMS:
After getting pretty familiar with DMS, I ended up creating the schema via `mysqldump --no-data`, then manually editing the file to exclude indexes on some of the biggest tables that would slow the import down. I also excluded the one large, historical table. My overall process looked like this:

  • mysqldump --defaults-group-suffix=dumpschema --no-data thedatabase | sed "s/utf8 /utf8mb4 /" | sed "s/utf8_/utf8mb4_/" > /tmp/schema-limited-indexes.sql
  • Edit /tmp/schema-limited-indexes.sql and remove foreign keys and indexes on large tables
  • cat /tmp/schema-limited-indexes.sql | mysql --defaults-group-suffix=newserver thedatabase
  • On the new server, run ALTER TABLE the_historic_table ENGINE=blackhole;
  • Start DMS process, make sure to have it stop between Full Load and CDC.
  • Wait ~24+ hours for Full load to complete
  • Add back the indexes that were removed from the schema. I had a list of ALTER TABLE statements to run, with an estimated time for each; in total this was estimated at 39 hours
  • Start second Phase (CDC) of the DMS Task
  • Wait for CDC to complete (time estimate unknown. The faster the above steps worked, the less it had to replay)

Unfortunately, a couple of attempts at this still had the CDC phase fail with foreign key constraint errors. I tried several times and don’t know why this happened. Finding the offending rows took many hours, since the queries didn’t have indexes and had to do full table scans. In some cases there were just a few to a few dozen rows that existed in one table without the corresponding row in the foreign table. It’s as if the binary log position recorded when the snapshot started was off by a few seconds, and the dumps of the different tables began at slightly different positions.

After several attempts (taking a couple weeks), I finally gave up on the DMS approach.

Using MySQL Dump

Using mysqldump to move data from one database server to another is a process I have done thousands of times and written many scripts around. It is pretty well understood and predictable. I did a few trial runs to put together this process:

Temporarily Stop all processes on the master server

  • Stop all background processes that write to the server
  • Change the password so that no processes can write to the master
  • Execute SHOW BINARY LOGS on the master and note the last binary log file and position. Do this a few times to make sure that it does not change; a quick scripted version of that check is sketched below. (Note that this would be easier if RDS allowed FLUSH TABLES WITH READ LOCK, but since it doesn’t, this process should work.)
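That repeated check is easy to script. Here is a rough sketch of the idea using PHP and PDO (hostname and credentials are placeholders); it just polls SHOW BINARY LOGS a few times and warns if the last file or position is still moving:

<?php
// Poll SHOW BINARY LOGS a few times and confirm the last file/position is
// stable before recording it. Hostname and credentials are placeholders.
$dbh = new PDO('mysql:host=master.example.com;dbname=thedatabase', 'user', 'password');
$previous = null;
for ($i = 0; $i < 3; $i++) {
    $logs = $dbh->query('SHOW BINARY LOGS')->fetchAll(PDO::FETCH_ASSOC);
    $last = end($logs);   // last binlog file and its size (position)
    printf("%s  %s\n", $last['Log_name'], $last['File_size']);
    if ($previous !== null && $previous !== $last) {
        echo "Binary log position is still moving; writes are not fully stopped\n";
    }
    $previous = $last;
    sleep(5);
}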

Dump the schema to the new server

This command has the sed commands in the middle to convert the old “utf8” collations to the desired “utf8mb4” versions. When dumping 1 TB+ of data, I found it helped performance a bit to do the schema conversion with the sed commands first, so that the bulk of the data doesn’t have to pass through those two commands.

  • mysqldump --defaults-group-suffix=dumpschema --no-data thedatabase |sed "s/utf8 /utf8mb4 /" | sed "s/utf8_/utf8mb4_/" | mysql thedatabase
  • .my.cnf contains this section with the relevant parameters for the dump
    [clientdumpschema]
    host=thehostname.cluster-czizrrfoedlm.us-east-1.rds.amazonaws.com
    port=3306
    user=dumper
    password=thepassword
    ssl-cipher=AES256-SHA:DHE-DSS-AES256-SHA:DHE-RSA-AES256-SHA
    quick
    compress
    set-gtid-purged=OFF
    max_allowed_packet=1024M
    single-transaction=TRUE
    column_statistics=0
    net_buffer_length=256k
    

Move the data

To move the data, I ran this command. Note that it starts with time so that I could see how long it takes. Also, it includes pv in the middle of the pipeline to monitor progress, which is explained below.

time mysqldump --defaults-group-suffix=dumpdata --no-create-info thedatabase | pv |mysql thedatabase

My .my.cnf contains this section for the data dump (the group name matches the --defaults-group-suffix=dumpdata option above):

[clientdumpdata]
host=thehostname.cluster-czizrrfoedlm.us-east-1.rds.amazonaws.com
port=3306
user=dumper
password=thepassword
ssl-cipher=AES256-SHA:DHE-DSS-AES256-SHA:DHE-RSA-AES256-SHA
quick
ignore-table=thedatabase.the_big_table
compress
set-gtid-purged=OFF
max_allowed_packet=1024M
single-transaction=TRUE
column_statistics=0
net_buffer_length=256k

Note that the above command includes the Linux pv utility in the middle of the pipeline, which is a nice way to monitor the progress. It prints a simple line to stderr showing the total transfer size, elapsed time, and current speed.

266.5GiB 57:16:47 [ 100KiB/s] [             <=>         ]

I experimented with several values for the net_buffer_length parameter by dumping the same multi-GB table over and over with different values. This setting determines how many rows are packed into each INSERT INTO statement that mysqldump generates. I was hoping that a larger value would improve performance, but larger values actually slowed things down; 256k turned out to be the best value. A rough sketch of this kind of benchmark loop follows the table below.

NET_BUFFER_LENGTH value    Elapsed Time
64k                        13m 44s
256k                        8m 27s
256k                        7m 20s
1M                         10m 23s
16M                        11m 32s
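A benchmark like that can be scripted with a simple loop. This is only a rough sketch, not my exact script; the table name is a placeholder, and it reuses the defaults-group suffixes from the commands above:

<?php
// Time the same single-table dump-and-load with different net_buffer_length values.
// The table name and defaults-group suffixes are placeholders.
foreach (['64k', '256k', '1M', '16M'] as $size) {
    $start = microtime(true);
    shell_exec("mysqldump --defaults-group-suffix=dumpdata --net_buffer_length={$size}" .
               ' thedatabase the_multi_gb_table | mysql --defaults-group-suffix=newserver thedatabase');
    printf("%-5s %.0f seconds\n", $size, microtime(true) - $start);
}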

After Migration is Started

After the mysqldump has been started, I re-enabled traffic back to the master server by setting the password back to the original. I kept all background jobs disabled to minimize the amount of data that had to be copied over afterwards.

Final attempt to use DMS

After the mysqldump was finished, I attempted to use the DMS Change Data Capture process to copy over the data that had changed on the master. You can start a Database Migration Task that begins at a specific binary log position. Maybe. I tried it, but it failed pretty quickly with a duplicate key constraint error. I gave up on DMS for good and figured I would just move over any remaining data manually via custom scripts.

Other findings

In attempting to maximize the speed of the transfer, I increased the IOPS on the GP3 volume from its base level of 12,000 to 32,000. Initially that helped, but for some reason I still don’t understand, the throughput was then limited very strictly to 6,000 IOPS. As seen in the chart below, it burst above that for some short periods, but it was pretty strictly constrained most of the time. I think this has to do with how RDS uses multiple volumes to store the data. I suspect that each volume has 6,000 IOPS of capacity, and all of my data was going to a single volume.

RDS IOPS
RDS IOPS Maxed at 6,000

That concludes the notes that I wanted to take. Hopefully somebody else finds these learnings or settings useful. If this has been helpful, or if you have any comments on some of the problems that I experienced, please let me know in the comments below.

Should I Migrate From AWS Aurora back to MySQL?

5+ years ago, one of my companies launched a product that is effectively a search engine monitoring tool. It saves a lot of information about search engine results and the destination pages, then allows users to see which search phrases each page ranks for.

The workload is heavily write-intensive. No matter the number of users, we have to perform a large amount of data collection and save it into our database. A large increase in the number of users would increase the amount of reads, but the base workload of collecting all of the results remains the dominant load on the database server.

We built this originally using MySQL 5.6, which we had used and managed extensively. We began having concerns about write capacity around the time that AWS was starting to push Aurora as an alternative to MySQL, with cost and performance benefits. It seemed like an easy win, so we clicked a couple of buttons, and within minutes our database server was converted from MySQL to Aurora.

Things worked well for a long time. The product worked well and customers liked it. We tweaked the application here and there, but most of the base functionality just continued to do its thing. We moved on to developing other products and maintaining this one.

Fast forward a few years, and minor complaints had started to pile up. We added some indexes and made some code and queries more efficient. Adding indexes to or altering a 500 GB table has its challenges, but tools like pt-online-schema-change make table changes a little easier without downtime.

As time went on, we got better about allocating costs to each product that we run, and I started to notice that the cost to run the Aurora instance was quite high. The instance cost itself was predictable, but the pricing of Aurora database storage includes a seemingly small cost of $0.20 per million I/O requests that was adding up to sometimes $200+ per day! It was at this point that I started to call Aurora a “pay for performance” product. Because it can scale I/O capacity quite high, inefficient queries execute fast enough not to notice; you just get charged more for them. It can be difficult to track down inefficient queries when everything is running fast.

Performance Insights was very helpful for tracking down queries that could be optimized. By adding some indexes, we reduced our database I/O and got our I/O costs down to under $100/day. On a traditional MySQL instance, with more limited I/O capacity, these queries would have been more obvious: they would have executed more slowly, and our usual troubleshooting would have brought them to our attention for the same optimizations. The “pay for performance” aspect of Aurora kept us from fixing the inefficient queries because their cost was hidden in a larger bill rather than showing up as slow queries.

Comparing Aurora Billed IO’s to MySQL IOPS

In November 2022, AWS announced that GP3 volumes are now available for RDS instances. The public documentation mentions a 3,000 IOPS base capacity, but doesn’t mention that for 400 GB+ volumes, AWS actually spreads your storage over four volumes, for a theoretical base of 12,000 IOPS. For an additional $0.02 per provisioned IOPS per month, you can increase your capacity up to 64,000 IOPS. So on the high end, the extra 52,000 IOPS at $0.02 comes to $1,040/month, or about $35/day. There may be additional throughput needed as well, but for our workload, I found that IOPS was more of a bottleneck than throughput.

Since we were still paying $60-$100 most days for Aurora storage I/O, it made sense cost-wise to switch back from Aurora to MySQL. I also favor MySQL because it’s what we’re already used to. I’ve always thought that the monitoring and metrics available on Aurora instances weren’t up to par with MySQL instances, and there is just enough of a “black box” in Aurora that it makes things difficult.

In trying to estimate how many IOPS we would need if we switched back to MySQL, it took a bit of work to translate Aurora’s usage into the terms I’m used to seeing for MySQL. The only I/O metrics available are “[Billed] Volume Read IOPS” and “[Billed] Volume Write IOPS”. These are under the “Cluster” in CloudWatch Metrics and appear to be billed at one-hour granularity. Make sure to use the “Sum” statistic instead of “Average” or you will be off by a lot! My server had values of around 4,000,000 to 13,000,000 for reads and 5,000,000 to 15,000,000 for writes. These values lined up pretty well with the costs per day that I could see in Cost Explorer: when CloudWatch Metrics showed a combined 500M I/Os for a day, I was charged $100. To convert the “Billed IOs” that Aurora reports into an IOPS figure, divide by the number of seconds in the period. Looking at a one-hour period, 9,000,000 I/Os averages out to 2,500 IOPS (divide by 3,600 seconds), and 30,000,000 I/Os in an hour equates to an average of about 8,333 IOPS.
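The conversion is simple enough to put in a tiny helper; a sketch using the example numbers from above:

<?php
// CloudWatch reports the *sum* of billed I/Os per period, so dividing by the
// period length (in seconds) gives an average IOPS figure.
function billedIosToAverageIops(float $billedIos, int $periodSeconds = 3600): float
{
    return $billedIos / $periodSeconds;
}

printf("%.0f IOPS\n", billedIosToAverageIops(9000000));   // 2500
printf("%.0f IOPS\n", billedIosToAverageIops(30000000));  // 8333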

AWS Aurora Billed Read and Write volume

Note that these are averages over an entire hour, so peaks within that hour could be dramatically higher! Still, this gave me confidence that the 12,000 baseline IOPS, plus the ability to pay for up to 64,000 IOPS with GP3 volumes, should be able to handle the same workload that Aurora was handling.

The effect of Double-Writes
Also announced in the past month was support for RDS Optimized Writes on newly launched instances of certain instance types. It’s unclear whether Aurora already has this kind of feature enabled, so I’m not certain whether the billed Aurora I/O writes mentioned above map directly to what MySQL would need, or to potentially half of that. Please let me know in the comments below if you know, and I’ll update here once I’ve experimented and been able to tell.

How we saved over $700/month by switching from Carta to Google Drive

Carta is the gold standard for startups to keep their cap table, but it comes at a price.

One of my companies hasn’t really raised any money, but we have 50+ stakeholders due to a merger and employee options. We execute maybe 2-3 documents per year related to capital, so the $8,400 annual price of Carta cost us roughly $4,000 per transaction. Obviously, that is absurd.

We ended up downloading all of the reports and PDFs of all existing options, and added some instructions for what we need to do when new options are granted, exercised, etc. We now keep the cap table and related documents in Google Drive (which we already pay for), saving $8,400+ per year!

I understand that there are a few other things, such as 409A valuations and the peace of mind that comes with having professional software like Carta manage your cap table, but for us the savings are an easy trade-off.

How Do Clients Securely Connect to SSL & HTTPS Servers?

This question arose from Steven Chu on my previous post about MySQL SSL Connections without Client Certificates. How is the client able to securely connect to a server using SSL if it doesn’t already know or trust the Server Certificate?

It is important to understand that there are a few different, interrelated topics here. All of these involve SSL and certificates, but in differing ways, so they are often conflated. Secure communication over SSH shares the same concepts, but has different mechanisms.

  1. Encryption of the traffic between client and server.
  2. Verification that the server is who the client believes it to be.
  3. Authentication of the client to the server.

For SSL and HTTPS communication, the first two concepts are accomplished together because there is no point in communicating securely with a remote party if you can’t verify that the remote party is who they claim to be and that there isn’t a “Man in the Middle” able to intercept the secure traffic.

You actually communicate securely with unknown servers all of the time. When you loaded this web page, your browser didn’t know anything about www.brandonchecketts.com beforehand. Same thing when you load your bank’s website. You never configured your browser specifically to trust their website. So how is it able to verify that it is actually your bank, and not an attacker who is impersonating your bank?

Certificate Authorities

Anybody can create an SSL certificate with any name on it. In my SSL Certificate Notes post, you can find instructions for creating an SSL certificate; note that you simply type in the name for the certificate, so you could attempt to create a certificate for any host you care to try. However, an essential part of the process is the signing of the certificate. You can self-sign a certificate with any name on it, but if you want your certificate to be recognized and trusted by anybody else, you need a recognized Certificate Authority (CA) to sign it. If you were to create a certificate for www.google.com, no Certificate Authority would sign it, since you can’t prove that you are authorized to create certificates for that domain, and without a CA’s signature nobody else in the world is going to trust it.

When a Certificate Authority signs a certificate, it is their job to verify that the certificate owner is who they claim to be in some way. On the public internet, that is largely done through DNS or email validation. For example, on this site, I use a certificate issued by Amazon Web Services. In order to obtain that certificate, I had to verify that I own the domain. Since the domain is also hosted at AWS, it was quite easy for me to create the DNS records for verification, and AWS could validate them within seconds. I couldn’t, for example, validate a certificate for ‘www.google.com’. I’d be unable to validate it with any Certificate Authority, since I can’t create the required DNS entries or receive email at the required addresses for google.com.

Extended Validation certificates, offered by some Certificate Authorities, and recognized by some web browsers with a different color banner, often have additional verification steps other than just DNS or Email.

Intermediate Certificates and Multiple Layers of Certificate Authorities

When a certificate is signed by a recognized Certificate Authority, your client can trust it, because it trusts the Certificate Authority. On the public Internet, most of the time there are multiple layers of Certificate Authorities.

On OSX, you can find the list of root certificates it trusts in the “Keychain Access” system app, in the “System Roots” section. On an Ubuntu or Debian Linux system, the trusted certificates are files that exist or are symlinked in `/etc/ssl/certs`. These systems have dozens to hundreds of certificates that they trust. Look closely and you’ll see that most of them expire 10+ years into the future. These “root” certificates are highly protected and usually don’t sign end-entity certificates directly. Instead, the Certificate Authority delegates signing to intermediate authorities with their own keys, which can in turn sign certificates.

In Chrome, you can click the lock icon next to the URL, and find details about the certificate, including the intermediate certificates. As of the writing of this post, you can see that the SSL Certificate issued to brandonchecketts.com is issued by “Amazon”, which is trusted by the root certificate named “Amazon Root CA 1”. I can find that root certificate in the list of certificates trusted by my OSX system.
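You can also pull the same chain programmatically. As a rough example (the hostname is just an illustration), a few lines of PHP will print each certificate’s subject and issuer, which makes the intermediate layers easy to see:

<?php
// Fetch the certificate chain a server presents and print subject/issuer CNs.
// The hostname is just an example.
$host = 'brandonchecketts.com';
$context = stream_context_create(['ssl' => ['capture_peer_cert_chain' => true]]);
$client = stream_socket_client("ssl://{$host}:443", $errno, $errstr, 30,
                               STREAM_CLIENT_CONNECT, $context);
$params = stream_context_get_params($client);
foreach ($params['options']['ssl']['peer_certificate_chain'] as $cert) {
    $info = openssl_x509_parse($cert);
    printf("subject: %-30s issued by: %s\n",
           $info['subject']['CN'] ?? '(none)', $info['issuer']['CN'] ?? '(none)');
}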

Client Authentication

I mentioned the third item above: the client authenticating to the server. In many cases, like this website, there is no need for the client to authenticate to the server, since the content is public and intended to be viewed anonymously. If authentication is required, for instance to create a new post, then I simply log in with a username and password entered over the HTTPS connection, the same as you do every day.

Many well-meaning articles about generating SSL Certificates for services other than HTTPS often mention creating an SSL client certificate. The Client Certificate is then provided to the server so that the server can validate the client is who they claim to be. The Client Certificate is simply an alternate (often thought of as “more secure”) method of authenticating than a username and password, or sometimes even in addition to a username and password. In practice, I’ve seen that usernames and passwords transmitted over an encrypted connection are very common, well understood, and just as secure as using an SSL Client Certificate.

Silly Security: TreasuryDirect.gov is the worst website ever

I saw some content today about savings bonds having a great interest rate. So I tried to sign up. I didn’t know I was going to waste an hour to simply create an account. This has to be the worst website I’ve ever seen.

Somewhere in the middle of the process, after entering a fantastic password generated by my password manager, I was presented with this virtual keyboard when logging back into the site. You are forced to enter your password by clicking the keys on the virtual keyboard. Entering 40 random characters by clicking on an image is SUPER TEDIOUS.

Not to mention, it took me about 10 attempts to enter the password correctly. I didn’t notice it until I was extremely frustrated, but clicking a button on the virtual keyboard will sometimes register the character twice.

After getting into the site, any attempt to navigate using the browser’s forward/back buttons will immediately log you out. As will an accidental double-click on any of the navigation.

It’s a good thing they have a monopoly on savings bonds, because nobody would try to use this and stay sane!

Silly Security: Don’t Show Me The Secret, Then Confirm I Have It!

I just received a replacement credit card from Health Equity because my previous card is expiring. Their validation screens made me laugh.

The first screen shows the card you are replacing, and includes the last four digits of the card number.

Then the following screen asks for the last four digits of the card number “in order to verify possession”.

You probably shouldn’t tell me the last four digits before asking me to confirm that I have the card.

Make Sure You Are Calculating Net Promoter Score Correctly

The Net Promoter Score can be a pretty valuable metric for determining customer happiness, and, more importantly, how likely your customers are to tell other people about your product.

The basic idea is that you ask customers how likely they are to recommend your product to someone. Those who respond with a 9 or 10 are considered “promoters”: when asked about your product, they’ll respond positively and encourage others to use it as well. Customers who answer with a 7 or 8 are satisfied, but not likely to talk positively about your product. Customers who answer with a 6 or below are considered “detractors”: when asked about your product, they’ll respond negatively, detracting from your reputation. Your Net Promoter Score is the percentage of promoters minus the percentage of detractors, so if you have more promoters than detractors, your NPS will be positive, while more detractors than promoters will result in a negative NPS.

There is an excellent tool for calculating your Net Promoter Score at Delighted.com that helps to visualize this.

I was recently meeting with a leadership team, and they mentioned that their Net Promoter Score was 6.6. That’s not a great score, but it’s not terrible. I don’t usually hear it expressed as a decimal, but I didn’t think much of it. After meeting with the team again several months later, they kept mentioning an NPS with a decimal, and it had increased to 6.7. It was then that I began to ask questions about how they were calculating it. It turned out to be a simple average of ratings from 1 to 10. That is NOT a Net Promoter Score! If anybody ever tells you their Net Promoter Score is between 1 and 10, make sure to dig in and verify they are calculating it correctly. Scores should range from -100 (all detractors) to +100 (all promoters).
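For reference, here is a minimal sketch of the correct calculation; the response data is made up:

<?php
// Correct NPS calculation: percent promoters (9-10) minus percent detractors (0-6).
// The $responses array is made-up sample data.
function netPromoterScore(array $responses): float
{
    $total = count($responses);
    if ($total === 0) {
        return 0.0;
    }
    $promoters  = count(array_filter($responses, fn ($score) => $score >= 9));
    $detractors = count(array_filter($responses, fn ($score) => $score <= 6));

    // Result ranges from -100 (all detractors) to +100 (all promoters)
    return ($promoters - $detractors) / $total * 100;
}

echo netPromoterScore([10, 9, 9, 8, 7, 6, 3, 10, 5, 9]) . "\n";   // prints 20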

When calculated correctly, this product’s NPS score was actually negative. That helps to explain why revenue growth has been a challenge and marketing dollars are not moving the needle as they’d like.

Contrast that with another organization I meet with regularly. They calculate their NPS Score correctly and it’s a 60! No wonder this company has incredible growth and is doing well.

While your NPS score is negative, your first priority should be fixing the product and customer experience. Otherwise, every customer that signs up is likely going to detract from others using your product.