Google Docs and Sheets Should Almost Always Be Restricted to Defined Users

Somebody sends you a link to a Google Sheet and it just works. It’s magical.
But that magic comes at a cost. I see far, far too many organizations that routinely share Google Documents and Sheets using the “Anyone with the link” option that Google so conveniently provides.

That is almost ALWAYS a bad idea. The convenience of having it open to anybody is also a security problem, both today and in the future.

But that long link with the 44 random-looking characters would be impossible for somebody to guess, right?

Yes. It would be statistically improbable for somebody to guess a random 44-character string that corresponds to an actual document. An attacker could write programs that generate and try millions and millions of links until they find some documents that actually exist, but that’s not the most likely weakness.
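
For a rough sense of scale: if each of those 44 characters is drawn from a 64-character URL-safe alphabet (which appears to be how Drive IDs are constructed), there are 64^44, or roughly 10^79, possible strings. Guessing one that maps to a real document is not the practical risk. The link leaking is.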

Consider what happens when you email a link to your spreadsheet to somebody else. You have zero control over who accesses it after that. What if the recipient forwards your email with the link to somebody else? Emails to businesses are often forwarded into Customer Relationship Management (CRM) or similar systems, where that link is now accessible to many other people in the organization. What if an attacker has access to a recipient’s email? Or to the CRM system? What if an employee leaves the company but still has the link in their browser history?

In all of those scenarios, and hundreds more that you can’t imagine, if your document is shared with “Anyone with the link”, literally anybody who sees that link can open it, and you have absolutely no knowledge that they did.

Always share only with specific email addresses.

Sharing with Google Groups

Sharing with specific people can become a headache to maintain as people change roles. Consider using the Google Groups feature in your organization. You can set up a Google Group for something like ‘client-yourclientname@myorganization.com’ or ‘team-myteamname@myorganization.com’ and ask to have documents shared with that group instead of with individual people. You can then add and remove people from the group to provide access to only those who are allowed.

See more information about sharing with Groups at https://support.google.com/a/users/answer/9308872?hl=en
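
If you manage sharing programmatically, the same rule applies. As a minimal sketch (assuming you already have an OAuth access token with a Drive scope, and FILE_ID is the document in question), the Drive v3 API can grant access to a specific group instead of “anyone”:

# Grant read access to a Google Group instead of "Anyone with the link".
# FILE_ID and ACCESS_TOKEN are placeholders you must supply.
curl -s -X POST \
  "https://www.googleapis.com/drive/v3/files/${FILE_ID}/permissions?sendNotificationEmail=false" \
  -H "Authorization: Bearer ${ACCESS_TOKEN}" \
  -H "Content-Type: application/json" \
  -d '{"type": "group", "role": "reader", "emailAddress": "team-myteamname@myorganization.com"}'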

Solving ECS Tasks Stuck in PENDING and Frozen / Stalled ECS Host Problems

We’ve had a strange, hard-to-track-down problem for months now. It has felt like a bug in Amazon ECS, but everything seems to have been working correctly.

The main way that we’ve observed this problem is that ECS would say that it was launching tasks, but they would stay in a “PENDING” state forever. Conversely, when tasks needed to be killed, the desired state would change to STOPPED, but the ECS Console would indicate that they were still running. We quickly discovered that some of our ECS host servers would become completely unresponsive, sometimes with 100% CPU usage and sometimes with near-zero CPU usage. Terminating the instance and having the Auto Scaling group recreate it would generally solve the problem, but it’s never good to have things freeze without understanding why.
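
If you are hitting the same symptoms, one quick way to spot the affected hosts from outside the box (a sketch, assuming the AWS CLI is configured and your cluster is named "my-cluster") is to ask ECS which container instances have lost their agent connection:

# List container instances whose ECS agent is no longer connected,
# along with how many tasks ECS still thinks they are running or launching.
aws ecs describe-container-instances \
  --cluster my-cluster \
  --container-instances $(aws ecs list-container-instances --cluster my-cluster \
      --query 'containerInstanceArns[]' --output text) \
  --query 'containerInstances[?agentConnected==`false`].[ec2InstanceId,runningTasksCount,pendingTasksCount]' \
  --output table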

Often, the host servers were completely unresponsive, and we were usually unable to SSH in to investigate. When we could get in, we looked through the logs and found them full of failures about being unable to talk to external resources. After digging pretty deep, we figured out that the route table was missing its default gateway. It’s hard to talk to anything when you can only reach the local network.

Here is an example of a routing table that is missing its default gateway:

[ec2-user@ip-172-31-45-74 ~]$ route
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
172.17.0.0      0.0.0.0         255.255.0.0     U     0      0        0 docker0
172.31.32.0     0.0.0.0         255.255.240.0   U     0      0        0 eth0

On a functioning instance, it should look like this. Notice the destination of 0.0.0.0 with the IP address of the default gateway:

[ec2-user@ip-172-31-39-228 ~]$ route -n
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
0.0.0.0         172.31.32.1     0.0.0.0         UG    0      0        0 eth0
169.254.169.254 0.0.0.0         255.255.255.255 UH    0      0        0 eth0
172.17.0.0      0.0.0.0         255.255.0.0     U     0      0        0 docker0
172.31.32.0     0.0.0.0         255.255.240.0   U     0      0        0 eth0
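
A quick check for the broken state (a sketch, built on the same route -n output shown above) is to look for the 0.0.0.0 entry and complain when it is missing:

# Prints a warning when the routing table has no default gateway entry.
route -n | grep -q '^0\.0\.0\.0' || echo "WARNING: no default gateway on $(hostname)"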

It was puzzling how the machine would work for a while, and then its default gateway would disappear.

I’m still not certain exactly how that happens. However, the system log indicates that there are periods of extremely high load during which the machine is frozen for minutes (maybe hours) at a time.

Some of these log entries are indicative of major delays:

Jan 20 13:26:44 ip-172-31-123-45.ec2.internal crond[21992]: (root) INFO (Job execution of per-minute job scheduled for 13:25 delayed into subsequent minute 13:26. Skipping job run.)

Jan 17 21:20:31 ip-172-31-45-166.ec2.internal chronyd[2696]: Forward time jump detected!

Notice how these logs are out of order too:

Jan 20 13:39:22 ip-172-31-123-45.ec2.internal kernel: R13: 00007faf9dc777a8 R14: 00000000000031f9 R15: 00007faf9dc7d510
Jan 20 13:28:30 ip-172-31-123-45.ec2.internal dockerd[4660]: http: superfluous response.WriteHeader call from github.com/docker/docker/api/server/httputils.MakeErrorHandler.func1 (httputils.go:107)
Jan 20 13:36:03 ip-172-31-123-45.ec2.internal dhclient[3275]: XMT: Solicit on eth0, interval 129760ms.
Jan 20 13:28:30 ip-172-31-123-45.ec2.internal dockerd[4660]: http: superfluous response.WriteHeader call from github.com/docker/docker/api/server/httputils.MakeErrorHandler.func1 (httputils.go:107)

Finally, this may be the thing that ultimately disables the networking. It looks like `oom-killer` killed the `dhclient-script`, which may have left the network in a very bad state:

Jan 20 15:28:36 ip-172-31-45-74.ec2.internal kernel: dhclient-script invoked oom-killer: gfp_mask=0x14201ca(GFP_HIGHUSER_MOVABLE|__GFP_COLD), nodemask=(null),  order=0, oom_score_adj=0
Jan 20 15:28:36 ip-172-31-45-74.ec2.internal kernel: dhclient-script cpuset=/ mems_allowed=0
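
To confirm whether the OOM killer has been at work on a host (a sketch; /var/log/messages is the default location on Amazon Linux), the kernel messages are the place to look:

# Look for OOM-killer invocations and the processes that were killed.
sudo grep -iE 'oom-killer|Killed process' /var/log/messages
# dmesg holds the same kernel messages if the log file has already rotated.
dmesg | grep -iE 'oom-killer|Killed process'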

You can simply run

sudo dhclient eth0

to have it grab the default gateway from DHCP again. But it’s best to put memory limits in place so the host doesn’t run out of resources in the first place.
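
One way to do that (a sketch of what those limits could look like, not a prescription) is to tell the ECS agent to reserve memory for the operating system and its daemons via ECS_RESERVED_MEMORY, and, until the root cause is fully fixed, to self-heal the missing gateway from cron:

# Reserve 256 MiB for the OS, the ECS agent, dhclient, etc., so task
# placement leaves headroom on the host (the agent must be restarted
# for the setting to take effect).
echo "ECS_RESERVED_MEMORY=256" | sudo tee -a /etc/ecs/ecs.config

# Stopgap: every 5 minutes, restore the default gateway if it has vanished.
echo '*/5 * * * * root /sbin/route -n | grep -q "^0\.0\.0\.0" || /sbin/dhclient eth0' | sudo tee /etc/cron.d/restore-default-gateway

Per-task memory limits in your task definitions work toward the same goal from the other direction: a container that exceeds its hard memory limit gets killed before it can take the whole host down with it.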