With Pith

Ethan Petuchowski

Avoiding Sign Extension

I was looking at an implementation of file-based mergesort from GitHub, and found the following snippet.

import java.util.Comparator;

/**
 * Author: cowtowncoder
 * https://github.com/cowtowncoder/java-merge-sort/blob/master/src/main/java/com/fasterxml/sort/std/ByteArrayComparator.java
 * 
 * Simple implementation of comparator for byte arrays which
 * will compare using <code>unsigned</code> byte values (meaning
 * that 0xFF is greater than 0x00, for example).
 */
public class ByteArrayComparator
    implements Comparator<byte[]>
{
    @Override
    public int compare(byte[] o1, byte[] o2)
    {
        final int len = Math.min(o1.length, o2.length);
        for (int i = 0; i < len; ++i) {
            // alas, sign extension means we must do masking...
            int diff = (o1[i] & 0xFF) - (o2[i] & 0xFF);
            if (diff != 0) {
                return diff;
            }
        }
        return o1.length - o2.length;
    }

}

What is the meaning of the remark “// alas, sign extension means we must do masking...”? What’s the deal with the masking?
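
To make the question concrete, here is a minimal sketch of what happens when a Java byte is widened to an int with and without the mask. The class and method names (SignExtensionDemo, signExtended, unsigned) are made up for illustration and are not part of the quoted library.

```java
public class SignExtensionDemo {
    /** Widens a byte the default way: the sign bit is copied into the upper 24 bits. */
    static int signExtended(byte b) {
        return b; // implicit widening conversion, so 0xFF becomes -1
    }

    /** Masks off the copied sign bits, leaving the unsigned value 0..255. */
    static int unsigned(byte b) {
        return b & 0xFF;
    }

    public static void main(String[] args) {
        byte high = (byte) 0xFF;
        byte low = (byte) 0x00;
        System.out.println(signExtended(high));                      // -1
        System.out.println(unsigned(high));                          // 255
        System.out.println(signExtended(high) - signExtended(low));  // -1, so 0xFF would sort before 0x00
        System.out.println(unsigned(high) - unsigned(low));          // 255, so 0xFF sorts after 0x00
    }
}
```

Without the mask, the subtraction in the comparator would treat 0xFF as -1 and order it below 0x00, which is the opposite of what the Javadoc promises. On Java 8 and later, Byte.toUnsignedInt(b) performs the same masking.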

Basics of Wireless Communication

I’ve been doing the readings for my Wireless Networking course at UTexas, and in the process have dug into many of the basics of radios and networks that I had ignored in the past. Here, I will try to briefly describe what I have learned. Maybe not everything I say here is exactly correct, but I think it’s at least mostly correct.

Let’s try to start somewhere near the beginning. Our goal is to transfer information from one location LocSND to another LocRCV conveniently. The way we will accomplish that is by having LocSND manipulate the electromagnetic field around LocRCV. More specifically, we will encode a binary dataframe as modulations of a radio signal around a pre-determined carrier frequency.

How do we do that?
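
One possible answer, as a rough sketch only: binary phase-shift keying, where a 1 bit leaves the carrier alone and a 0 bit inverts it. This is just one of several modulation schemes, and the names and numbers here (BpskSketch, modulate, a 1 kHz carrier sampled at 8 kHz) are assumptions chosen to keep the example small, not values from any real radio.

```java
public class BpskSketch {
    /**
     * Produces samples of a cosine carrier whose phase is inverted for 0 bits.
     * All parameters are illustrative assumptions.
     */
    static double[] modulate(boolean[] bits, double carrierHz, double sampleRateHz, int samplesPerBit) {
        double[] samples = new double[bits.length * samplesPerBit];
        for (int n = 0; n < samples.length; n++) {
            double t = n / sampleRateHz;                        // time of this sample in seconds
            double carrier = Math.cos(2 * Math.PI * carrierHz * t);
            boolean bit = bits[n / samplesPerBit];              // which bit this sample belongs to
            samples[n] = bit ? carrier : -carrier;              // 1 keeps the phase, 0 inverts it
        }
        return samples;
    }

    public static void main(String[] args) {
        boolean[] frame = {true, false, true, true};
        double[] signal = modulate(frame, 1_000.0, 8_000.0, 8);
        System.out.println(signal.length + " samples generated");
    }
}
```

A receiver that knows the carrier frequency can multiply the incoming signal by the same cosine and look at the sign of the result to recover each bit.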

The Pattern on the Stone: Review and Summary

How I Found it

I was watching a biography of Richard Feynman in which they interviewed W. Daniel Hillis. He seemed like a cool dude, so I looked him up on Google and came across his book, The Pattern on the Stone: The Simple Ideas that Make Computers Work (1999). On Amazon it was compared to Code: The Hidden Language of Computer Hardware and Software, by Charles Petzold, an incredibly well-written book about how computers work. I would gladly read any book considered comparable in lucidity to Code, so I got The Pattern on the Stone.

What it says

The first few chapters are meant to give a basic understanding of what a computer is actually doing, and Hillis spends some time noting the universality of computers, which he says shows that “all computers are alike in what they can and cannot do”. Personally, my intuition for the workings of a computer comes mainly from an explanation by Richard Feynman himself (as seen on the YouTubes), which is “heuristic” rather than mechanical. Basically, Feynman gradually turns a human being into a computer, and then talks about how this mechanistic person can be implemented using logic gates built from water pipes that he sketches on a whiteboard. In The Pattern on the Stone, Hillis builds logic gates out of parallel and series wires, and also out of springs and pivots.

Then he introduces finite state machines and programming in LOGO. Then he mentions how machine code can be thought of as control instructions, specifying the next instruction to fetch and execute, and processing instructions, moving data to and from memory, and through the Arithmetic Logic Unit.

Then he starts really getting into what I think is the main point of the book, to convince the reader that there is no magical process occurring in our brains that a mechanical computer cannot replicate, meaning that

As far as we know, no device built in the physical universe can have any more computational power than a Turing machine…[so] a universal computer with the proper programming should be able to simulate the function of a human brain.

A Workflow and Scripts for Learning From Github

My “Learning” Workflow

As I wrote about before, I have developed an interesting method of learning from experts, which can be summarized as follows:

  1. Fork their repo on GitHub
  2. Clone the repo locally and “detach the HEAD” to the “initial commit”
  3. Now repeat the following while (curious)
    1. Open the working tree in an editor/IDE
    2. If there’s something runnable, run it
    3. Understand everything going on in the working tree
      • Take hints from the commit message
    4. Advance HEAD one commit
    5. View the diff from the previous commit

How Apm Originally Worked

Over the past few days I have been learning how apm, the Atom Package Manager, works under the hood. apm is what you use when, in GitHub’s (relatively new) “Atom” text editor, you go to the nice GUI package installation interface under settings=>packages.

Atom is a “hackable” text editor built on top of Chromium, using Node.js and CoffeeScript. I believe they call it hackable because all the code is open source, and you can add plugins to do whatever you want. Your plugins can even be written in C++ if that’s more your style.

My goal was to figure out how apm works, and I wasn’t sure how best to do that. My knowledge of Node.js was minimal, and I was no expert in CoffeeScript. What I decided to do was fork the apm GitHub repo, clone it onto my computer, set my local HEAD to the “initial commit”, and see if I could understand that. The complete contents are as follows:

README.md
# APM - Atom Package Manager

Discover and install Atom packages.

At this stage I was pretty confident that I understood everything the author, Kevin Sawicki, was doing. Lucky for me, it seems Kevin is rather unique on GitHub’s Atom development team for having smaller commits. I can justify this by noting that on Atom-core’s list of developer contributions, he has 2x more commits than the next guy, but is not in the top 5 in terms of LOC added.

So with my head fully wrapped around the “initial commit” I moved my HEAD past the 2nd commit (a typo fix) into the first commit of any substance, “Add initial Gruntfile, binary, and ignores”. At this point there was some investigation to do.

Further Adventures in Collaboration

Getting to the same page

When a colleague and I are in a discussion about something and both start to get very excited about where it is going, I start to believe that we both must be seeing things in the same way. This is a fantasy. You start to feel that “connection”; it gets palpable. It feels like we’re really communicating consciousness to consciousness. It feels like thoughts are pipelining across thin air. Sometimes it’s true, but I’d posit that usually it is not. And this misunderstanding of the true level of agreement can later be a cause of grief. You feel like the other person abandoned your shared vision when they take the result of your conversation and make something different from what you intended.

What Is an Abstraction

After some further thought, it has become clear that my previous post about abstractions and what makes them good got it all wrong. There, I said roughly that an abstraction is something that allows you to simplify a hard mental task to make that task easier. This needs to be reevaluated. The earlier post claimed that a hammer is an improper abstraction for opening up a package. That just doesn’t sound right. A hammer is nothing more than a bad tool for opening up a package. A tool is something that makes a task easier. A tool and an abstraction are not equivalent.

My Software Architecture professor began the semester by giving us about 20 very similar definitions of what software architecture is, to give us a general sense of what people are talking about. I left the lecture equally uninformed about what software architecture is, but did eventually come to grips with it. With the Sisyphean nature of the task now understood, I would like to propose a new definition:

An abstraction is a way of conceptualizing something without having to think about everything that is actually going on. If we think in terms of the abstraction, we can arrive at the same conclusions as could have been derived using the “underlying truth of the matter” without as much mental effort. After communicating the chain of logic using the abstraction, the listener would also be able to derive the result by substituting the abstraction out for the “underlying truth”.

It is basically a mental shorthand.

What Makes a Good Abstraction

Creating good abstractions may have more value than anything else. The more I learn about creating a company and writing code, the more important I realize proper abstractions are.

Perhaps I should start by listing a few abstractions that I find to be “good” ones so that we’re more on the same page, because this is an abstract topic, and everyone probably has a different conception of it. Surely I’ve missed some of the best ones, but these come off the top of my head:

  1. Programming languages (in that they are a higher-level representation of machine code)
  2. Object-oriented programming
  3. Design patterns & architectural patterns
  4. The mouse (input device)
  5. A “process” as used in Unix
  6. Most mathematical notation

So what do these have in common? They give us a mental representation of what we’re doing that vastly simplifies what is actually going on. This means that in order to come up with something which is at its base level very complex, we only need to manipulate simplified mental objects. This means we must perform fewer total mental operations, using less mental short-term memory, to achieve the same result.

Lesson Learnt About Collaboration

Perhaps the greatest lesson of this summer has been in taking advantage of the power of collaboration.

When my boss decided to hire me, he told me that it was in large part because of my communication skills. I don’t know if he still agrees with this, but I like to think it is true.

One of my greatest prides is the ability to openly lose an argument. For a long time, this has quite often been my reason for entering an argument, and I try to make it easy to lose. If someone seems to know they’re right, we must together find the bridge of what I’m missing that will be convincing beyond a reasonable doubt of their correctness. Making it easy to lose means figuring out what you actually think, making that clear, and not wavering from that initial point of view even when more facts come to light. Or at least acknowledging that the original viewpoint was incorrect, and now this is what I [honestly] believe to be true. A regrettable human tendency is to change one’s opinion during an argument as the facts come to light because “with these facts, my original point of view was wrong, and clearly I wasn’t wrong, so that couldn’t have been my real point of view.” This needs to be consciously avoided.

Inside Java’s BufferedReader

BufferedReader is surprisingly fast for parsing large text files. Why is that? In my experience, it is faster for this task than a BufferedInputStream. StackOverflow says this is because the BufferedReader uses char internally instead of byte.

What follows is a high-level breakdown of what’s going on “under the hood” of the BufferedReader, i.e. an overview of the implementation details. Details about mark support will be omitted.

For me, the most significant takeaways are the following

  1. There is no magic: every time I find this out about something I am surprised. This class is probably very similar to the way I would have naively written a buffering wrapper for a Reader object.
  2. Utmost efficiency is sacrificed for code clarity: my guess is that they left in the inefficiencies I mentioned above because having special cases would have clouded the code. If someone wants an even more efficient Reader they are always welcome to write their own.
  3. Small amount of code: there’s really not a whole lot to this class. It basically just reads into a buffer, then services incoming reads from that buffer. There are no “niceties” or asynchronous callbacks etc. I think the attitude of the author is that if you want to find that, simply look elsewhere.
    • Of particular note: there is not a single mention of character encodings anywhere in the class.
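
The “read into a buffer, then service reads from that buffer” idea can be sketched in a few lines. This is my own simplified illustration, not the JDK source: the class name TinyBufferedReader and its fields are hypothetical, and it leaves out readLine, mark support, and locking.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

public class TinyBufferedReader {
    private final Reader in;   // the wrapped, unbuffered source
    private final char[] buf;  // chars fetched from `in` but not yet handed out
    private int next = 0;      // index of the next char to hand out
    private int limit = 0;     // number of valid chars currently in buf

    TinyBufferedReader(Reader in, int size) {
        if (size <= 0) throw new IllegalArgumentException("size must be positive");
        this.in = in;
        this.buf = new char[size];
    }

    /** Refills the buffer with one bulk read; returns false at end of stream. */
    private boolean fill() throws IOException {
        int n;
        do {
            n = in.read(buf, 0, buf.length);
        } while (n == 0); // retry if the source returned no chars without signalling end of stream
        if (n < 0) return false;
        next = 0;
        limit = n;
        return true;
    }

    /** Returns the next char as an int, or -1 at end of stream. */
    int read() throws IOException {
        if (next >= limit && !fill()) return -1;
        return buf[next++];
    }

    public static void main(String[] args) throws IOException {
        TinyBufferedReader r = new TinyBufferedReader(new StringReader("hello"), 2);
        StringBuilder sb = new StringBuilder();
        for (int c = r.read(); c != -1; c = r.read()) sb.append((char) c);
        System.out.println(sb); // hello
    }
}
```

Because the wrapper only ever sees chars coming out of another Reader, it has no reason to know about character encodings; any decoding has already happened upstream, for example in an InputStreamReader.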