Wednesday, August 06, 2008

Joys of Sendmail

Yesterday I spent a while dealing with some very strange email problems. Starting on Monday, when I got back from Portland, I wasn't able to email from my machine to some other email addresses. For example, I could send email to a gmail account, but anything to coa.edu bounced instantly. The error message was odd and difficult to interpret. I figured that this wasn't a problem on my end, but that it had something to do with the UMaine servers. (UMaine does our internet access, and email to coa.edu goes through one of their big servers.) I queried our helpful network administrator, who queried someone at UMaine. They were convinced that the problem was with my email server, not theirs.

I didn't believe them. But I started investigating. The error message that I was getting was really weird. Something about a "server not found" which is odd, because I could ping all the relevant servers I could think of and everybody was alive. After much googling and pondering I figured out what the problem was. It turns out that I am using a version of sendmail with an odd little bug in it. Here's what happened.

Over the weekend the campus lost power. Not a big deal. This happens relatively frequently and my machine has always come back online without trouble. This time, though, things didn't go so smoothly. Once the power is restored, my machine boots up. As sendmail is bringing itself to life, it needs to figure out who it is -- i.e., what its hostname is. To do so, it asks a nameserver of some sort to figure out the name of localhost. However, the nameserver, or perhaps the network itself, isn't back online yet. So the server query takes a while and times out, with the message: "connection timed out; no servers could be reached".

So far, so good. Except my version of sendmail is quite literal. It doesn't recognize this as an error. Rather, it now thinks its hostname is "connection timed out; no servers could be reached". Brilliant. Then, when emailing, my machine was contacting the UMaine server and telling it that my name was "connection timed out; no servers could be reached". Quite understandably, UMaine doesn't like this name, and thus refused to talk to me. The mail bounced instantly, with a misleading message about "no servers could be reached". Some other mail hosts also wouldn't talk to me, but others, such as gmail, would.
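For the curious, here is a rough sketch in Python of the kind of sanity check that would have caught this. It is not sendmail's actual code, and the names looks_like_hostname and pick_hostname are my own inventions for illustration. The rule is loosely based on the usual hostname syntax (letters, digits, and hyphens in dot-separated labels), and the fallback of "localhost" is just an assumption.

```python
import re

# One DNS label: letters, digits, hyphens; 1-63 characters;
# no leading or trailing hyphen.
_LABEL = re.compile(r"(?!-)[A-Za-z0-9-]{1,63}(?<!-)")


def looks_like_hostname(name: str) -> bool:
    """Return True if name is plausibly a hostname (hypothetical helper)."""
    name = name.strip()
    if name.endswith("."):          # allow a fully qualified trailing dot
        name = name[:-1]
    if not name or len(name) > 253:
        return False
    return all(_LABEL.fullmatch(label) for label in name.split("."))


def pick_hostname(lookup_output: str, fallback: str = "localhost") -> str:
    """Use the lookup result only if it looks like a hostname.

    lookup_output stands in for whatever text the name lookup returned,
    which might be an error message rather than a name.
    """
    candidate = lookup_output.strip()
    if candidate.endswith("."):
        candidate = candidate[:-1]
    return candidate if looks_like_hostname(candidate) else fallback


if __name__ == "__main__":
    print(pick_hostname("connection timed out; no servers could be reached"))
    print(pick_hostname("mail.example.org"))
```

The spaces and the semicolon are enough to rule out the error message, so the check falls back instead of announcing the error text to other mail servers.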

Once I figured this out, it took a while to fix but it wasn't too bad. I'm not sure my fix is permanent, however. Things could go nutty again if there's another power outage. Something to look forward to, I suppose.

Adding to the fun is that an email I sent to someone at UC Davis tripped some other warning system, causing me to get listed on the Composite Blocking List. It wasn't too difficult to get myself delisted.

This bug seems to be limited to Ubuntu/Debian versions of Linux. See bug reports here and here if you're curious.

Not how I planned on spending Tuesday afternoon. But at least the problem is resolved and my computer is happy and playing well with others.

2 comments:

Jonathan Shock said...

Such problems remind me of my favourite Chinese restaurant name ever: http://adweek.blogs.com/adfreak/2008/07/then-well-grab.html

Sorry we didn't have a chance to go for that beer in Beijing - I'll try and make it out there again next year.

Jessica B said...

Haha, that's hilarious!! As is your friend's link! :-)
I hope you keep this blog going. I know we're not really in touch this way, or it's most often a 1-way communication, but I enjoy reading your posts! (I think it's similar to Facebook and Myspace - it's mostly a false sense of connection - but I guess it's better than nothing.)