With Pith

Ethan Petuchowski

Some Java Network Programming Fundamentals

Most of what I’ve learned and discussed here comes from TCP/IP Sockets in Java, a highly recommended book about this stuff by Calvert and Donahoo. Some of it also comes from Java Network Programming by Elliotte Rusty Harold.

Overview

  • The only transport-layer protocols Java supports are TCP & UDP; for anything else, you must link to native code via the Java Native Interface
  • TCP uses stream sockets, through which one generally just writes to an OutputStream and reads from an InputStream of bytes that remain in-order and uncorrupted and are (practically) guaranteed delivery by the implementation of the protocol by the operating system.
    • Unless you’re using NIO; see below for more on that
  • UDP uses datagram sockets, through which you send and receive objects called DatagramPackets, which are just a length, a destination, and data
  • Unless you’re using NIO, everything blocks: e.g. connecting to servers, listening for clients, reads, writes, and disconnecting (for TCP)
    • By default most of these actions may block indefinitely
    • For reading and connecting, you can configure a timeout, after which you will receive an InterruptedIOException
    • For writing to a TCP stream, you cannot configure a timeout

Handling multiple clients

  • Deal with one at a time, which is simplest, especially if there’s some state that is shared by all potential clients. Speed may become problematic quickly.
1
2
3
4
5
6
7
8
9
10
11
void mainLoop() {
    while (true) {
        Socket s = serverSocket.accept();
        handle(s);
    }
}
void handle(Socket s) {
    InputStream in = s.getInputStream();
    // process request, etc.
    s.close();
}
  • Create a new thread to handle each incoming client. This is still pretty simple, but will lead to massive overhead if you have many concurrent clients, and therefore you’re context-switching all the time.
1
2
3
4
5
6
7
8
9
10
11
12
void mainLoop() {
    while (true) {
        Socket s = serverSocket.accept();
        new Thread() {
            @Override public void run() {
                InputStream in = s.getInputStream();
                // process request, etc.
                s.close();
            }
        }.start()
    }
}
  • Use a thread pool to handle requests. Java has abstracted the thread pool concept into the Executors factory class. There are a multitude of executors to choose from. This newCachedThreadPool() one will execute each task on an existing thread if one is idle, and will create a thread otherwise. Threads sitting idle in the cache for over one minute are terminated.
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
ExecutorService executor = Executors.newCachedThreadPool()

void mainLoop() {
    while (true) {
        Socket s = serverSocket.accept();
        executor.execute(new TheHandler(s));
    }
}
static class TheHandler implements Runnable {
    Socket s;
    public TheHandler(Socket s) { this.s = s; }
    @Override public void run() {
        InputStream in = s.getInputStream();
        // process request, etc.
        s.close();
    }
}
  • Use NIO (rather complicated) to allow N threads to service M clients, where N is small and M is huge. This uses event-based programming. We can set all network operations to be non-blocking, and only wait as long as we want to for them. An extensive example can be found below.
  • Use a framework like Netty, Akka, etc. that wraps the NIO stuff up in a ribbon and a tie

10K feet above NIO

Asking for Advice

There are many circumstances in life during which one may feel the need to ask for advice. For example, when evaluating a significant decision, or when dealing with an emotionally stressful circumstance. Here, I will discuss advice about significant decisions.

In giving advice, everyone has a different approach. I generally try to follow a line I heard in a rap song by The Streets, “If you never tell a lie to her, you don’t have to remember anything.” In other words, lying will only complicate your life because you have to remember the lies you made up. (Caveat: this may not always the best way to go for emotionally complex issues.) I also enjoy helping people rationally and realistically evaluate their options for significant decisions, and surely if someone recalls that your input was helpful in the past, they will be more likely to ask you in the future.

Most people I know don’t seem to like giving useful advice. It seems they either are (1) too afraid that their honesty will lead you to dislike them, or (2) they feel so stressed with their own issues that taking on yours for a few minutes would be overwhelming, or (3) they find your problem uninteresting and simply have better things to do.

But some people are the opposite. They will patiently listen to your question and give what they feel to be an honest evaluation of where you stand and what you should do. The advice of people in this category will often be heavily and obviously biased by their own experience and ideology. This is simply a symptom of “being honest”.

So if you want good advice, it would be ideal to find someone who is honest, not stressed about a similar problem to yours, as well as interested in and knowledgeable of the subject; they should generally also be disinterested in your particular problem. However, this ideal candidate is not always available.

The Tone Makes the Point

Recently, hot-shot investor Paul Graham wrote a piece about economic inequality. This essay provoked a lot of dialogue online. I read the piece, and a few responses to it. I enjoyed the essay, and I found it very persuasive. In fact, I would say my ‘opinion on the matter’ has been changed by reading the essay and its accompanying summary by the author. The responses I read were also very interesting. They were from seemingly center-left-wing individuals like myself, only they were not persuaded by Graham’s arguments.

My point here is not so much to discuss the piece, as the tone of the piece. This piece was written with a tone that raises the hair on the back of the necks of liberals. This may have been unintentional on the part of Graham, …although considering the follow-up discussions of his previous piece on females in tech, may he just likes raising those particular hairs (they’re called “hackles”).

Gamified Learning

I have spent the last few days trying to get better at programming. I am no longer in school, so if I want to learn about, say, compilers, I can’t just sign up for a class on that. There are multiple ways to go about it. It depends on what you really want to learn and what you want to be able to do with that knowledge. I have tried various methods of learning over the past few days, which I have listed below.

What I have noticed of myself is that gamification works for me. I love the satisfaction of (in descending order of [perceived] satisfaction)

  • earning a badge
  • getting to the next level
  • beating other humans (directly, or via “percentile” calculation)
  • beating robots (similar to “beating a boss” in a video game)
  • getting points

First Glimpse of Compilers

I am close to two chapters into “Compilers” (a.k.a. “The Dragon Book”), by Aho, Lam, Sethi, and Ullman. It is an exciting topic to learn about.

The most fundamental thing I have learned so far is the overall pipeline for understanding the modular components forming the way a compiler is typically constructed. It goes

  1. Lexical analyzer
  2. Syntax analyzer
  3. Semantic analyzer
  4. Intermediate code generator
  5. Machine-independent code optimizer (optional)
  6. Code generator
  7. Machine-dependent code optimizer (optional)

The input to the compiler pipeline is a “character stream” of the program. However I don’t recall the book ever dealing with specifications of that stream, except that it supports a generic getchar() operation. I can understand that if the entire source consists of one file, you just read character-by-character from the file into the compiler. Maybe the language designer decides how multiple-file projects are to be read by the compiler, i.e. it is beyond the scope of this book; or maybe they’ll go more in-depth on this later-on.

The Lexical Analyzer

The character stream is read into first component in the compilation pipeline: the lexical analyzer, which maps the character stream into a “token” stream. It seems like a token knows its tag, and its value. The tag is basically the type of this token in the eyes of the compiler. Possible tags in their example include NUM, ID, [keyword]s, and I added COMMENT as part of an exercise. The value for a token might be the literal value of a literal; or it might be the name of a variable. More about that below.

Symbol Tables

In addition to creating the token stream, the lexical analyzer builds the symbol table. It seems to that the symbol table maps names to places in memory. A symbol table also ‘by default’ implements the language’s preferred form of block scoping. Upon entering a new scope, a new symbol table is created that points to its “parent”, the one it is nested inside of. If a variable is declared, it is added to the symbol table to this scope. Later, when a variable is referenced (i.e. init’d, read, or updated), we will consult the table for the current innermost scope. If the variable’s data is not found there, we will continue to check ancestors until we find it.

Learning vs Doing

The goal of my first project at my first “real” job is to get something useful done within a few days and start to feel like a contributing member of the team. However it has been about a week, and I still have not finished that project.

From my experience talking with some of my colleagues so far, their collective attitude might be summed up with, “Google it, then copy-paste it, but don’t worry about what it does.” In my life, I have worked with and met many people having that attitude. Partially because of my experience working with those people, it happens to decidedly not be my attitude. My attitude is more like “google it, learn what to do, learn why that is the right approach, take notes, and then copy-paste and modify the best solution to make the final solution as clean as possible.” This strategy got me through many tough situations, so I have built up faith in it.

So, after seeing me spend days learning about ssh tunnelling, ansible, and vagrant — and not finishing my simple project — they finally said something along the lines of

At this rate it will take you weeks to learn how to automate deployment of a virtual machine. Why don’t you just deploy one copy and then learn about how to automate it on your own time?

Now, “weeks” is probably an overstatement, but they pointed out to me that I had sort of assumed out of nowhere that I was hired as some sort of devops role for the company, even though what I’m actually interested in is what one might call “big data engineering”. They said, “If you find yourself repeating the same tasks over and over, then you should learn to automate them.” It is now obvious that they are in the right.

At the time, I was startled by the way they approached me about what I consider to be largely a difference in personalities. But I can see that they are concerned that someone who reads too much never gets anything done, and so far I have fit that stereotype. At the same time, I am concerned that a person who doesn’t read will do their work quickly and incorrectly and will then spend the next few weeks rejiggering a broken project, and so far they have fit that stereotype. In short, it seems that we are all prejudiced and have plenty to learn from each other.

Going forward I shall take their advice and do the more mundane tasks as fast as possible and only learn things that come up more than once. If that doesn’t work out for me because I’m just cranking out shitty work, I will revert back to understanding what I am doing.

Very Basics of Jetty

Jetty has cutting edge HTTP/2 support

For my Wireless Networking project, I’m going to compare HTTP/1.1 and HTTP/2 with respect to performance metrics when talking to a mobile phone over the cellular network. There are not so many implementations of HTTP/2 right now, and some of them seem a bit shaky. At the end of the day, it seems to me that the easiest way to run this sort of experiment in a reliable fashion is to use Java’s Jetty project. It has well-tested HTTP/1.1 support, many heavyweight framework users, implements HTTP/2’s server push and all sorts of HTTP/2 negotation mechanisms, and I like static types.

Jetty is hard to find tutorials for

So I need to learn the basics of Jetty; and there’s a lot to learn, and I haven’t found a stellar resource, so I’ve been reading the embedded examples, which are decent. I’m usng “embedded” Jetty because that means I can write type-safe Java rather than XML. Perhaps XML would be a good choice for a long-standing app, but I’m making a prototype and it’s easier to just write code.

Here’s a brief overview of what I’ve learned so far, which may come in handy for someone else wanting to understand the very basics of Jetty (version 9). This is not a detailed overview because I don’t know what I’m talking about. I’m just explaining the things I have learned from the tutorial linked above and some mucking around.

Intro to HTTP/2

The problem of HTTP/1.1

HTTP/1.1 is from an earlier Web

For the past several years, Google, Mozilla, Akamai, the IETF, the academic research community, and others have been engaged in efforts to reduce PLTs experienced by users of the WWW. One bottleneck in apparent need of an update is the now-“ancient” HTTP/1.1 (H1) protocol, published in 1999.

From the beginning, one of the major design goals of the original HTTP protocol was simplicity to implement and adopt, to encourage growth of the WWW. Evidently, this technique worked. However, over the past 16 years, the way people create, distribute, and view pages on the WWW has changed drastically. Pages today have far more Javascript, CSS, images, and other content to go along with the vanilla HTML. In the course of this evolution, numerous issues with H1 have come up, mostly pertaining to the number of sequential round-trips it requires to fully download the data for each web page. These round-trips introduce unnecessary latency.

Why the WiFi Sucks on the Porch

Why is WiFi so slow on the porch?

The problem at my house is that WiFi connectivity is generally fine inside the house, but while sitting on the porch it is nearly impossible for anyone to browse the Hinternets. I think a lesson of my Wireless Networking course explains this observation.

In Wireless networking we have the situation called the Hidden Terminal Problem, which is the following. Node A is too far from node C to hear its transmissions, but both are in range of node B which sits between A and C. Suppose C is currently transmitting data to B. Now A checks whether any transmissions are currently happening, and finds that there are none (because it is out of range of C). So A goes ahead and sends data to B. Now B can’t understand either data packet because they collided and interfered with each other in an unrecoverable way because they were both sent in the same channel.

I think the porch scenario is simply an example of the Hidden Terminal Problem above. In my room, my laptop can hear many of my neighbors’ WiFi LAN networks, but it’s the same set that my WiFi router sitting in the closet can hear. However outside, where there’s less cause for signal attenuation, my laptop can hear many more WiFi LAN networks than the router inside. So the router checks whether there is congestion, doesn’t hear any, and sends the data. But there is, in fact, congestion, and my computer doesn’t receive the data properly. The data is therefore not ACKd, the router times out on the ACK, and has to retransmit, and so on. This makes for a far slower Hinterconnectivity outside on the porch than in my room.

Maybe the lesson learnt is that the WiFi router should be situated in a place where it can hear more of the outside noise so that it can compensate better for that noise.

Bit Twiddling: Data Structure Alignment

In order to align a data structure at the nearest word-alignment to a given starting address, we’d must find the smallest multiple of N (a power of 2) which is greater than or equal to integer X. Is there a way we can make this fast, i.e. not use division or modulo?

First Thoughts

First of all

1
if (N > X) return N;

Otherwise, we have two cases

First Case

N: 00100
X: 01100
--------
=> 01100

This example indicates the following:

1
if (!(X & (N-1))) return X;

Second Case

N: 00100       
X: 01101       
--------       
=> 10000       

Which can be accomplished by

1
return (X | (N-1)) + 1;

So we have the first-take program of

1
2
3
if (N > X) return N;
if (!(X & (N-1))) return X;
return (X | (N-1)) + 1;