CS 49J - Lab 2


Cay S. Horstmann

Lab Rules

Step 1. A Java REPL

  1. JDK 9 has jshell, a “read-eval-print loop” (REPL) that you saw in the videos. If you have Java 9 installed, you start it by running
    path to JDK/bin/jshell
    We'll do that eventually, but for today we'll use the online version. Type in
    and hit Enter. What do you get?
  2. What do you get for
    Why? (Hint: A is 10, B is 11, C is 12.)
  3. What do you get for
    Integer.valueOf("101", 2)
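
The questions above all turn on reading digits in a base other than 10. As a minimal sketch (the class RadixDemo and its method names are made up for illustration and are not part of the lab materials), the standard Integer.parseInt and Integer.toString methods accept a radix argument, and you can paste the same expressions into jshell:

```java
// Hypothetical helper class for illustration; not part of the lab materials.
class RadixDemo {
    // Parses a string of digits written in the given base (radix).
    static int parse(String digits, int radix) {
        return Integer.parseInt(digits, radix);
    }

    // Formats a value as digits in the given base (letters come out lowercase).
    static String format(int value, int radix) {
        return Integer.toString(value, radix);
    }

    public static void main(String[] args) {
        System.out.println(parse("1F", 16));  // 1 * 16 + 15 = 31
        System.out.println(parse("110", 2));  // 4 + 2 + 0 = 6
        System.out.println(format(255, 16));  // ff
        System.out.println(format(10, 2));    // 1010
    }
}
```

The values here are deliberately different from the ones in the questions, so you still need to work those out yourself.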

Step 2. A Hex Viewer

  1. Open Eclipse. Make a new Java project lab2. Make a new class HexViewer. Remember—be sure it is in the default package. Clear out the Package: field. Then paste this code.
    Also save this file to your Downloads directory. Depending on your browser and OS, you may need to right-click and select Save Link As or some such thing. Run the program. When the file dialog pops up, pick hello.txt from the Downloads directory. What happens?
  2. This file uses the UTF-8 encoding, which is probably the most common encoding for Unicode in files. In UTF-8, each Unicode code point is encoded as a sequence of 1 to 4 bytes. In this file, all code points use one byte. What is the UTF-8 encoding in hex for the letter H? The letters e and l? The space? The exclamation mark?
  3. What is the significance of the 0A at the end?
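
To see the "1 to 4 bytes" rule without a file, here is a small sketch that encodes a string in memory and prints the bytes in hex. The class name Utf8Bytes and the sample characters are assumptions chosen for illustration; this is not the lab's HexViewer. The non-ASCII characters are built from their code points so that the result does not depend on the encoding of the source file.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper class for illustration; not the lab's HexViewer.
class Utf8Bytes {
    // Returns the UTF-8 bytes of s as space-separated, two-digit uppercase hex.
    static String hex(String s) {
        StringBuilder out = new StringBuilder();
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            if (out.length() > 0) out.append(' ');
            out.append(String.format("%02X", b & 0xFF)); // mask to avoid sign extension
        }
        return out.toString();
    }

    // Builds a one-code-point string from a numeric code point.
    static String ofCodePoint(int codePoint) {
        return new String(Character.toChars(codePoint));
    }

    public static void main(String[] args) {
        System.out.println(hex("A"));                  // 41 (1 byte)
        System.out.println(hex(ofCodePoint(0x00DF)));  // C3 9F (2 bytes)
        System.out.println(hex(ofCodePoint(0x20AC)));  // E2 82 AC (3 bytes)
        System.out.println(hex(ofCodePoint(0x1D11E))); // F0 9D 84 9E (4 bytes)
    }
}
```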

Step 3. More UTF-8 Encodings

  1. This file contains the message
    Hello, San José!
    In UTF-8, the é is encoded as two bytes. What are they? Use the HexViewer and the hello2.txt file to find out. Be careful to use this exact file. Or, if you make your own file, be sure to save in UTF-8. (While the vast majority of computing devices default to UTF-8 for file encoding, both Windows and Mac OS still cling to archaic coding system defaults.)
  2. Look up this webpage. Where do you find the UTF-8 encoding for é?
  3. This file contains the message
    Hello, Java™!
    What is the UTF-8 encoding of the symbol?
  4. This file contains the message
    Hello 😻
    What is the UTF-8 encoding of the 😻 symbol?

Step 4. Running from the Command Line

  1. A disadvantage of running a program from Eclipse is that it is not obvious which directory in the file system your program runs from. If you need to navigate to another directory many times, that's not productive. This is where the command line comes in. Save the HexViewer source in the Downloads directory.
  2. Open a terminal window. Change to the Downloads directory (cd Downloads). Type
    ls hello*.txt
    or, on Windows,
    dir hello*.txt
    What do you get? Then type
    ls *.java
    or, on Windows,
    dir *.java
    What do you get?
  3. If you saw the various hello text files and HexViewer.java, then keep going. If not, ask for help.
  4. Compile HexViewer from the command line. How do you do that?
  5. Now run
    java HexViewer hello1.txt
    What happens?
  6. How do you inspect hello2.txt? (Hint: Hit the ↑ key)
  7. Is this better than using a file dialog? If so, why?
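
For comparison, this is roughly how a program can take its file name from the command line instead of a file dialog. It is only a sketch: the class ArgsHexDump is hypothetical, and the HexViewer you downloaded may be written quite differently.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical sketch; the lab's HexViewer may work differently.
class ArgsHexDump {
    // Formats bytes as space-separated, two-digit uppercase hex.
    static String toHex(byte[] bytes) {
        StringBuilder out = new StringBuilder();
        for (byte b : bytes) {
            if (out.length() > 0) out.append(' ');
            out.append(String.format("%02X", b & 0xFF));
        }
        return out.toString();
    }

    public static void main(String[] args) throws IOException {
        // args holds the words typed after the class name on the command line.
        if (args.length != 1) {
            System.err.println("Usage: java ArgsHexDump <file>");
            return;
        }
        // A relative path is resolved against the directory you ran java from.
        byte[] bytes = Files.readAllBytes(Paths.get(args[0]));
        System.out.println(toHex(bytes));
    }
}
```

Because the path is relative to the current directory, running the program from the directory that holds the text files lets you name each file directly.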

Step 5. Strings and UTF-16

  1. Add this class to your lab2 project. Compile and run, and type in the input Hello. What happens?
  2. What you are seeing are the UTF-16 encodings of the characters in the string. What is the UTF-16 code for H? For e? For l?
  3. How are they related to the UTF-8 encoding?
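
A Java String is a sequence of 16-bit char values, which are UTF-16 code units. The sketch below prints them in hex for a different word than the one in the exercise. The class name Utf16Units is an assumption for illustration and is not the class you were asked to add.

```java
// Hypothetical helper class for illustration; not the class provided with the lab.
class Utf16Units {
    // Returns each char (UTF-16 code unit) of s as four-digit uppercase hex.
    static String units(String s) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            if (i > 0) out.append(' ');
            out.append(String.format("%04X", (int) s.charAt(i)));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(units("Java")); // 004A 0061 0076 0061
    }
}
```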

Step 6. More UTF-16

  1. We'd like to try the same thing with San José to figure out what happens with the é. But how is that going to work? Even if you know how to type an é on your computer, how do you know what character encoding is used in your console? To have complete control over this, we'll again read from a file. (As Vladimir Lenin said, “Trust is good. Control is better.”)
    Do what you did before. Download into your Downloads directory. Open a terminal window and use the cd command to change to that directory. Run
    javac StringPrinter.java
    java StringPrinter hello2.txt
    What happens? (If you get an error message, ask for help.)
  2. What is the UTF-16 encoding for é?
  3. What is the UTF-16 encoding for ™? How did you find out?
  4. Running the same kind of experiment, what is the UTF-16 encoding for the smiling cat with heart-shaped eyes? What's different about it?
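
One way to observe the difference in code is to compare the number of char values in a string with the number of code points. This sketch uses U+1D11E (a musical clef) rather than the cat so that the exercise stays open. The class name CodePointCount is made up for illustration.

```java
// Hypothetical helper class for illustration.
class CodePointCount {
    // Number of UTF-16 code units (char values) in s.
    static int units(String s) {
        return s.length();
    }

    // Number of Unicode code points in s.
    static int codePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        // "A" followed by U+1D11E, which lies above U+FFFF.
        String s = "A" + new String(Character.toChars(0x1D11E));
        System.out.println(units(s));                             // 3
        System.out.println(codePoints(s));                        // 2
        System.out.println(Character.isHighSurrogate(s.charAt(1))); // true
    }
}
```

The two counts differ because a code point above U+FFFF does not fit into a single 16-bit code unit.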

Step 7 (Optional) 🍹

  1. Look at this page to find the Unicode code point for our smiling cat. What is it?
  2. OK, so how do we get from there to "\uD83D\uDE3B"? This page tells how to do it. Write a Java program that carries out these steps and prints the two UTF-16 code units in hex.
  3. Now make your program work for any code point ≥ U+10000. Read in the code point as a string U+1.... and do your thing. What do you get for U+1F379?
  4. Even more extra time? Then tackle UTF-8.
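
Once your own program works, you may want to cross-check its output against the Java library. The sketch below does not carry out the steps from the page. It simply asks Character.toChars for the code units, so it is a checking aid and not a solution. The class name SurrogateCheck and the sample code point U+1D11E are assumptions chosen for illustration.

```java
// Hypothetical cross-check; it relies on the library instead of doing the arithmetic.
class SurrogateCheck {
    // Parses "U+1D11E"-style notation into a numeric code point.
    static int parseCodePoint(String notation) {
        return Integer.parseInt(notation.substring(2), 16);
    }

    // Returns the UTF-16 code units for a code point as four-digit uppercase hex.
    static String utf16Hex(int codePoint) {
        StringBuilder out = new StringBuilder();
        for (char unit : Character.toChars(codePoint)) {
            if (out.length() > 0) out.append(' ');
            out.append(String.format("%04X", (int) unit));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(utf16Hex(parseCodePoint("U+1D11E"))); // D834 DD1E
    }
}
```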