Use Kafka Streams With Java Reactive Streams

https://danlebrero.com/2017/01/05/proof-of-concept-using-kafkastreams-and-ktables/

In the last article, we used a Kafka consumer and producer to read and write Kafka messages directly. In this article, we will show how to replace them with Kafka Streams so that we don’t have to handle back pressure ourselves. Back pressure is an important feedback mechanism that allows systems to gracefully respond to load rather than collapse under it.

Kafka Streams is a relatively new open source library, part of the Apache Kafka project, that promises to make stream processing simple without losing the power and scalability of other stream processing systems like Storm or Spark Streaming.

The major selling points are:

  • Scalable, using the same partition-based model as Kafka.
  • Real time, optional micro-batching.
  • Stateful stream processing.
  • It is a library, not a framework.

Kafka Streams doesn’t need any new infrastructure; it depends only on the Kafka cluster. It has a nice functional API similar to the Java 8 Streams API.

Configure Kafka Streams

To start Kafka Streams in your Java project, you’d need to define the following properties:

Properties props = new Properties();
props.put(StreamsConfig.APPLICATION_ID_CONFIG, kafkaStreamConfig.getApplicationID());
props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, kafkaStreamConfig.getBootstrapServer());
props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

StreamsBuilder builder = new StreamsBuilder();

// define the topology on the builder (shown below) before building and starting the application
KafkaStreams streams = new KafkaStreams(builder.build(), props);
streams.start();

You’d also need to define a serde (serializer/deserializer pair) for the DataPoint POJO so that Kafka Streams can read the messages in; we’ll refer to it as dataPointSerde:

Map<String, Object> serdeProps = new HashMap<>();

final Serializer<DataPoint> dataPointSerializer = new JsonPOJOSerializer<>();
serdeProps.put("JsonPOJOClass", DataPoint.class);
dataPointSerializer.configure(serdeProps, false);

final Deserializer<DataPoint> dataPointDeserializer = new JsonPOJODeserializer<>();
serdeProps.put("JsonPOJOClass", DataPoint.class);
dataPointDeserializer.configure(serdeProps, false);

final Serde<DataPoint> dataPointSerde = Serdes.serdeFrom(dataPointSerializer, dataPointDeserializer);

Introducing Java Reactive Streams

http://www.reactive-streams.org/

Java Reactive Streams addresses the following problem: handling streams of data (especially "live" data whose volume is not predetermined) requires special care in an asynchronous system. The most prominent issue is that resource consumption needs to be controlled so that a fast data source does not overwhelm the stream destination. Asynchrony is needed in order to enable the parallel use of computing resources, whether on collaborating network hosts or on multiple CPU cores within a single machine (see the Reactive Manifesto for more). The scope of Reactive Streams is to find a minimal set of interfaces, methods, and protocols that describe the necessary operations and entities to achieve the goal: asynchronous streams of data with non-blocking back pressure.

First, to transform the incoming Kafka Streams messages, we can implement the KStream-facing ValueMapper interface and put the logic for transforming an input DataPoint into a LabeledDataPoint in the apply method.

public class LabeledDataPointProducer implements ValueMapper<DataPoint, LabeledDataPoint> {

    private final ShotoClassifier classifier;
    private final DataPointCassandraDao cassandraDao;

    public LabeledDataPointProducer(ShotoClassifier classifier, DataPointCassandraDao cassandraDao) {
        this.classifier = classifier;
        this.cassandraDao = cassandraDao;
    }

    /**
     * Obtains a label for the provided data point, either from the cache or by classification
     * @param dataPoint the data point to label
     * @return a LabeledDataPoint
     */
    @Override
    public LabeledDataPoint apply(DataPoint dataPoint) {
        LabeledDataPoint cached = cassandraDao.getCached(dataPoint.getMd5String());
        // if cached then set isCached to true and return the labeledDataPoint
        if (cached != null) {
            cached.setCached(true);
            return cached;
        }

        // if not cached then transform and save the labeledDataPoint to cassandra database
        LabeledScore<CategoryClassification> label = classifier.classify(dataPoint.getInputText()).getLabel();
        LabeledDataPoint ldp = LabeledDataPoint.fromClassification(dataPoint, label);
        cassandraDao.saveDataPoint(ldp, ldp.isHuman());
        return ldp;
    }
}

We can then implement the java.util.function.Consumer interface (consumes an input, returns void), parameterized with the Kafka message value type LabeledDataPoint, and put the checking logic in the overridden accept method:

public class OutboundConsumer implements Consumer<LabeledDataPoint> {

    private final Random random = new Random(1L);
    private final double confidenceThreshold;
    private final double highConfAuditPct;
    private final String highConfTag;
    private final String lowConfTag;
    private final BiConsumer<LabeledDataPoint, String> writeToOutbound;

    public OutboundConsumer(double confidenceThreshold, double highConfAuditPct, String highConfTag, String lowConfTag, BiConsumer<LabeledDataPoint, String> writeToOutbound) {
        this.confidenceThreshold = confidenceThreshold;
        this.highConfAuditPct = highConfAuditPct;
        this.highConfTag = highConfTag;
        this.lowConfTag = lowConfTag;
        this.writeToOutbound = writeToOutbound;
    }

    @Override
    public void accept(LabeledDataPoint labeledDataPoint) {
        // cached data points have already been processed, so skip them
        if (labeledDataPoint.isCached()) {
            return;
        } else if (labeledDataPoint.getLabelConfidence() < confidenceThreshold) {
            // low-confidence results always go out for review
            writeToOutbound.accept(labeledDataPoint, lowConfTag);
        } else if (random.nextFloat() < highConfAuditPct) {
            // audit a random sample of high-confidence results
            writeToOutbound.accept(labeledDataPoint, highConfTag);
        }
    }

    // Consumer.andThen has a sensible default implementation, so there is no need to override it
}

After preparing all of the above, we can plug these pieces into the Kafka Streams API to transform our input data as intended:

// read in and transform input DataPoint to LabeledDataPoint with ValueMapper
KStream<String, LabeledDataPoint> dataPointKStream = builder.stream(kafkaStreamConfig.getStreamInboundTopic(),
        Consumed.with(Serdes.String(), dataPointSerde)).mapValues(labeledDataPointProducer);

// check condition and write result to kafka streams
dataPointKStream.filter((k,v) -> v.isCached()).filter((k,v) -> v.getLabelConfidence() > 0.1)
        .to(kafkaStreamConfig.getStreamOutboundTopic());

// pass in Java consumer 
dataPointKStream.foreach((k,v) -> outboundConsumer.accept(v));
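
For clarity, here is a minimal sketch of how these pieces fit together end to end, assuming the classes defined above; labeledDataPointSerde is an assumed serde for LabeledDataPoint, built the same way as dataPointSerde. Note that the topology has to be defined on the builder before the KafkaStreams instance is built and started:

StreamsBuilder builder = new StreamsBuilder();

// read the inbound topic and label each DataPoint with the ValueMapper
KStream<String, LabeledDataPoint> labeled = builder.stream(
        kafkaStreamConfig.getStreamInboundTopic(),
        Consumed.with(Serdes.String(), dataPointSerde))
    .mapValues(labeledDataPointProducer);

// forward cached results with sufficient confidence to the outbound topic (as above)
labeled.filter((k, v) -> v.isCached())
       .filter((k, v) -> v.getLabelConfidence() > 0.1)
       .to(kafkaStreamConfig.getStreamOutboundTopic(),
           Produced.with(Serdes.String(), labeledDataPointSerde));

// apply the auditing logic from OutboundConsumer to every record
labeled.foreach((k, v) -> outboundConsumer.accept(v));

// only now build and start the streams application
KafkaStreams streams = new KafkaStreams(builder.build(), props);
streams.start();
Runtime.getRuntime().addShutdownHook(new Thread(streams::close));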


Spin Up Kafka Cluster With Kafka Connect

https://www.sohamkamani.com/blog/2017/11/22/how-to-install-and-run-kafka/

Download Kafka (version kafka_2.11-2.0.0 at the time of writing) from the Apache Kafka downloads page and extract it.

Starting Zookeeper

Zookeeper is a key value store used to maintain server state. Kafka requires a Zookeeper server in order to run, so the first thing we need to do is start a Zookeeper instance.

Start the server by running:

bin/zookeeper-server-start.sh config/zookeeper.properties

Starting our brokers

Kafka brokers form the heart of the system, and act as the pipelines where our data is stored and distributed.

There are 3 properties that have to be unique for each broker instance:

broker.id=0
listeners=PLAINTEXT://:9092
log.dirs=/tmp/kafka-logs

Since we will have 3 servers, it’s better to maintain a separate configuration file for each one. Copy the config/server.properties file and make 3 copies, one per server instance:

cp config/server.properties config/server.1.properties
cp config/server.properties config/server.2.properties
cp config/server.properties config/server.3.properties

Change the above 3 properties for each copy of the file so that they are all unique.

server.1.properties

broker.id=1
listeners=PLAINTEXT://:9093
log.dirs=/tmp/kafka-logs1

server.2.properties

broker.id=2
listeners=PLAINTEXT://:9094
log.dirs=/tmp/kafka-logs2

server.3.properties

broker.id=3
listeners=PLAINTEXT://:9095
log.dirs=/tmp/kafka-logs3

Also, create the log directories that we configured:

mkdir /tmp/kafka-logs1
mkdir /tmp/kafka-logs2
mkdir /tmp/kafka-logs3

Finally, we can start the broker instances. Run the three commands below in separate terminal sessions (these start the Kafka broker processes, which are distinct from Kafka Connect, covered later):

bin/kafka-server-start.sh config/server.1.properties
bin/kafka-server-start.sh config/server.2.properties
bin/kafka-server-start.sh config/server.3.properties

Creating a topic

Before we can start putting data into the cluster, we need to create a topic to which the data will belong, using the command:

bin/kafka-topics.sh --create --topic my-kafka-topic --zookeeper localhost:2181 --partitions 3 --replication-factor 2

The partitions option lets you decide how many partitions the topic’s data is split into, and therefore across how many brokers it can be spread. Since we set up 3 brokers, we can set this option to 3.

The replication-factor describes how many copies of your data you want (so that if one of the brokers goes down, you still have your data on the others).

The producer instance

The "producer" is the process that puts data into our Kafka cluster. The command line tools in the bin directory provide a console producer, which inputs data into the cluster every time you enter text into the console.

To start the console producer, run the command:

bin/kafka-console-producer.sh --broker-list localhost:9093,localhost:9094,localhost:9095 --topic my-kafka-topic

The broker-list option points the producer to the addresses of the brokers that we just provisioned, and the topic option specifies the topic you want the data to come under.

You should now see a command prompt, in which you can enter text; each line you enter is inserted into the Kafka cluster you just created when you hit Enter.

Consumers

The only thing left to do is read data from the cluster with the command:

bin/kafka-console-consumer.sh --bootstrap-server localhost:9093 --topic my-kafka-topic --from-beginning

The bootstrap-server can be any one of the brokers in the cluster, and the topic should be the same as the one under which your producers inserted data into the cluster.

The from-beginning option tells the consumer that you want all the messages the topic currently holds, including messages that were put in before this consumer started.

When you run the above command, you should immediately see all the messages that you input using the producer, logged onto your console.

Kafka Connect

Kafka Connect, an open source component of Apache Kafka, is a framework for connecting Kafka with external systems such as databases, key-value stores, search indexes, and file systems.

Using Kafka Connect you can use existing connector implementations for common data sources and sinks to move data into and out of Kafka.

Source Connector:
A source connector ingests entire databases and streams table updates to Kafka topics. It can also collect metrics from all of your application servers into Kafka topics, making the data available for stream processing with low latency.
Sink Connector:
A sink connector delivers data from Kafka topics into secondary indexes such as Elasticsearch or batch systems such as Hadoop for offline analysis.

You’d need to run Kafka Connect as a separate process (here in distributed mode) with the command:

./bin/connect-distributed.sh config/connect-distributed.properties

Then register a connector through the Kafka Connect REST API:

$ curl -X POST -H "Content-Type: application/json" -d "$(cat test_sourceConnector.json)" localhost:8083/connectors

A sample connector config would look like this:

{
  "name": "my-test-source",
  "config": {
    "connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector",
    "tasks.max": "1",
    "connection.url": "jdbc:sqlserver://{connect url};database={myDB}",
    "connection.user": "username",
    "connection.password": "password",
    "mode": "incrementing",
    "incrementing.column.name": "id",
    "key.converter": "org.apache.kafka.connect.json.JsonConverter",
    "topic.prefix": "my-test-jdbc-",
    "name": "my-test-source",
    "table.whitelist": "myTable"
  }
}

(table.whitelist specifies the target table to retrieve data from; note that JSON does not allow inline comments.)
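
Once the connector has been created, you can check that it is running through the same REST API (using the connector name from the config above):

$ curl localhost:8083/connectors/my-test-source/status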

Principles: Life and Work by Ray Dalio

The book Principles: Life and Work was written by the founder of Bridgewater, Ray Dalio. It describes how Ray set up his fundamental life principles and how he applied them to management. I benefited from his perspective as I tried to set my own goals for the next quarter, both for work and for life more broadly. The following are some quotes from the book that I would like to share:

While most others seem to believe that learning what we are taught is the path to success, I believe that figuring out for yourself what you want and how to get it is a better path.

While most others seem to believe that having answers is better than having questions, I believe that having questions is better than having answers because it leads to more learning.

While most others seem to believe that mistakes are bad things, I believe mistakes are good things because I believe that most learning comes via making mistakes and reflecting on them.

While most others seem to believe that finding out about one’s weaknesses is a bad thing, I believe that it is a good thing because it is the first step toward finding out what to do about them and not letting them stand in your way.

While most others seem to believe that pain is bad, I believe that pain is required to become stronger.

I believe that our society’s “mistakephobia” is crippling, a problem that begins in most elementary schools, where we learn to learn what we are taught rather than to form our own goals and to figure out how to achieve them. We are fed with facts and tested and those who make the fewest mistakes are considered to be the smart ones, so we learn that it is embarrassing to not know and to make mistakes. Our education system spends virtually no time on how to learn from mistakes, yet this is critical to real learning. As a result, school typically doesn’t prepare young people for real life—unless their lives are spent following instructions and pleasing others. In my opinion, that’s why so many students who succeed in school fail in life.

What I wanted was to have an interesting, diverse life filled with lots of learning—and especially meaningful work and meaningful relationships. I feel that I have gotten these in abundance and I am happy by 1) working for what I wanted, not for what others wanted me to do; 2) coming up with the best independent opinions I could muster to move toward my goals; 3) stress-testing my opinions by having the smartest people I could find challenge them so I could find out where I was wrong; 4) being wary about overconfidence, and good at not knowing; and 5) wrestling with reality, experiencing the results of my decisions, and reflecting on what I did to produce them so that I could improve. I believe that by following this approach I moved faster to my goals by learning a lot more than if I hadn’t followed it.

I have become someone who believes that we need to deeply understand, accept, and work with reality in order to get what we want out of life. Whether it is knowing how people really think and behave when dealing with them, or how things really work on a material level—so that if we do X then Y will happen—understanding reality gives us the power to get what we want out of life, or at least to dramatically improve our odds of success. In other words, I have become a “hyperrealist.”

Enjoying your job, a craft, or your favorite sport comes from the innate satisfaction of getting better. The sequence of 1) seeking new things (goals); 2) working and learning in the process of pursuing these goals; 3) obtaining these goals; and 4) then doing this over and over again is the personal evolutionary process that fulfills most of us and moves society forward.

Making Choices

Life consists of an enormous number of choices that come at us and that each decision we make has consequences, so the quality of our lives depends on the quality of the decisions we make.

The way we make our dreams into reality is by constantly engaging with reality in pursuit of our dreams and by using these encounters to learn more about reality itself and how to interact with it in order to get what we want.

For most people success is evolving as effectively as possible, i.e., learning about oneself and one’s environment and then changing to improve. Personal evolution is both the greatest accomplishment and the greatest reward.

There are five big types of choices that we continually must make that radically affect the quality of our lives and the rates at which we move toward what we want. People who choose what they really want, and avoid the temptations and get over the pains that drive them away from what they really want, are much more likely to have successful lives.

Those who react well to pain that stands in the way of getting to their goals—those who understand what is causing it and how to deal with it so that it can be disposed of as a barrier—gain strength and satisfaction. This is because most learning comes from making mistakes, reflecting on the causes of the mistakes, and learning what to do differently in the future.

As you design and implement your plan to achieve your goals, you may find it helpful to consider that:

• Life is like a game where you seek to overcome the obstacles that stand in the way of achieving your goals.

• You get better at this game through practice.

• The game consists of a series of choices that have consequences.

• You can’t stop the problems and choices from coming at you, so it’s better to learn how to deal with them.

• You have the freedom to make whatever choices you want, though it’s best to be mindful of their consequences

• The pain of problems is a call to find solutions rather than a reason for unhappiness and inaction, so it’s silly, pointless, and harmful to be upset at the problems and choices that come at you (though it’s understandable).

• We all evolve at different paces, and it’s up to you to decide the pace at which you want to evolve.

• The process goes better if you are as accurate as possible in all respects, including assessing your strengths and weaknesses and adapting to them.

 

The best advice I can give you is to ask yourself what do you want, then ask ‘what is true’—and then ask yourself ‘what should be done about it.’ I believe that if you do this you will move much faster toward what you want to get out of life than if you don’t!

How We Use Terraform + Ansible To Spin Up Cassandra in OCI

For the past 4 weeks, we have been working on setting up a Cassandra cluster in OCI using Terraform and Ansible. First, a very quick introduction to Terraform: Terraform is a tool for defining and managing infrastructure as code. It allows us to write code that describes how we want to deploy servers, databases, load balancers, and networks.

Ansible, on the other hand, helps us automate configuration management by connecting to those instances after they are booted by Terraform. Our team specifically is tasked with using Terraform and Ansible to provision a Cassandra database, but the general concepts apply to other applications as well.

The way we run Terraform and Ansible today is to execute their binaries and our code from our laptops to orchestrate the remote cloud.

There are local prerequisites you’d need to set up before running Terraform and Ansible on your machine, including installing Terraform and the Terraform OCI provider (Terraform is heavily dependent on each cloud provider, so nearly all of our code is specific to OCI). You’d also need to set up your OCI account with your SSH public keys.

Before we dive into the code structure, I’d like to cover some Terraform commands we use frequently. You’d need to be inside a Terraform project folder to run them:

    • terraform init:
      • downloads any plugins (providers) you need to provision your instances. Here we can see the OCI provider, version 2.1, is in place.
    • terraform plan:
      • shows you what Terraform is going to do before it actually does it; it is very much like a diff of the declared state compared to the real infrastructure.
    • terraform destroy:
      • destroys all the resources that currently exist in the infrastructure.
    • terraform apply:
      • with this command, Terraform reads the code to learn how we want the instances configured and makes the appropriate API calls to the cloud provider to create those resources for you (a typical end-to-end run is sketched just below this list).
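
For example, a typical run from inside the project folder might look like this (the folder name is just illustrative):

cd cassandra-terraform   # our Terraform project folder (illustrative name)
terraform init           # download the OCI provider plugin
terraform plan           # preview the changes against the real infrastructure
terraform apply          # create the resources in OCI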

File structure:

    • terraform.tfvars
      • In the Terraform Cassandra project, we keep a terraform.tfvars file.
      • This is where we keep all the input variables. You'd need to modify this before using Terraform to provision your own instances:
        • first, you'd have to replace the tenancy, compartment, and image ID with your team-specific ones
        • you'd also have to set up the OCI config with your user-specific values so Terraform can make API calls to OCI on your behalf
        • you can also add instance-specific parameters used to configure your infrastructure, such as instance shape and instance names
        • all these input variables are accessed from the variables.tf file
    • variables.tf
      • here we define all the variables that Terraform needs in order to provision our infrastructure
      • some come from the tfvars file we've just talked about and some are hard-coded as default values, such as tenant domains and the image ID
    • datasource.tf
      • in addition to input variables and hard-coded default values, we can also fetch information defined outside of our Terraform configuration with datasource.tf
      • here we use it to fetch the availability domains under our compartment from OCI
      • with all these variables defined, we can then use instance.tf to provision our instances (a rough sketch of how these files relate follows below)
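
A rough, abbreviated sketch of how these three files relate (0.12-style syntax; the variable names and values are illustrative, not our actual configuration):

# terraform.tfvars -- input values edited per team/user
tenancy_ocid     = "ocid1.tenancy.oc1..aaaa"
compartment_ocid = "ocid1.compartment.oc1..aaaa"
image_ocid       = "ocid1.image.oc1.phx.aaaa"
instance_shape   = "VM.Standard2.1"

# variables.tf -- declares the variables Terraform expects
variable "tenancy_ocid" {}
variable "compartment_ocid" {}
variable "image_ocid" {}
variable "instance_shape" {
  default = "VM.Standard2.1"
}

# datasource.tf -- fetches data defined outside our configuration,
# here the availability domains under our tenancy
data "oci_identity_availability_domains" "ads" {
  compartment_id = var.tenancy_ocid
}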

 

 

  • instance.tf is where the OCI instances are actually created (a rough sketch follows this list)
    • The resource block creates a resource of the given TYPE, here oci_core_instance, with the NAME we define, node for example. The combination of the type and name must be unique.
    • count: within the block, count tells Terraform how many instances we'd like to create
    • availability_domain: within PHX we have 3 availability domains for redundancy, and we assign availability_domain to each node round-robin, so the first node created will be in availability domain one, the second in availability domain two, and so on
    • we are provided with one subnet for each availability domain
    • VNIC: the virtual network interface card; we need to tell it which network to connect to, which is the subnet ID
    • source_details: defines the image of the instance
    • ssh_authorized_keys defines the list of SSH public keys that are allowed to access this instance
    • the provisioner is used to execute commands once the instance is booted; here we use it to run the Ansible command after each instance is created
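
Here is a rough sketch of what instance.tf can look like; attribute names may differ slightly between OCI provider versions, and the node count, subnet and SSH key variables are illustrative:

resource "oci_core_instance" "node" {
  count          = var.node_count
  display_name   = "cassandra-${count.index + 1}"
  compartment_id = var.compartment_ocid
  shape          = var.instance_shape

  # round-robin across the availability domains fetched in datasource.tf
  availability_domain = data.oci_identity_availability_domains.ads.availability_domains[count.index % 3].name

  # one subnet per availability domain
  create_vnic_details {
    subnet_id = var.subnet_ocids[count.index % 3]
  }

  # boot image for the instance
  source_details {
    source_type = "image"
    source_id   = var.image_ocid
  }

  # SSH public keys allowed to access the instance
  metadata = {
    ssh_authorized_keys = file(var.ssh_public_key_path)
  }

  # run Ansible against the node once it is booted
  provisioner "local-exec" {
    command = "ansible-playbook -i '${self.public_ip},' cassandra.yml"
  }
}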

After Terraform triggers Ansible, Ansible will install and configure Cassandra on the instances we just created.

    • vars & facts: in the Ansible project, let's look first at where the variables come from
      • under the vars folder, main.yml is where we define the variables used for configuring Cassandra, such as the cluster name and the config directory
      • nodes.yml contains the cassandra_seeds; we separate this out because it's easier to update just this file when we need to modify the Cassandra seeds
      • another source of variables is facts files: the auto-generated metadata on each instance that Ansible runs against, which can be accessed from Ansible playbooks
    • the playbook is the core of Ansible: plain-text YAML files that describe the desired state of the instances (a sketch of ours follows this list)
      • the hosts keyword specifies which inventory group Ansible should apply this play to; in our case we want Ansible to target all the hosts we provide
      • we also want to escalate privileges to the root user on the instance to perform tasks like RPM package installs
      • the play also needs access to the variables defined in the var files
      • here we have a cassandra.yml play for setting up Cassandra
      • we also define the role we want to execute for this play
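
Put together, cassandra.yml might look roughly like this (a sketch based on the description above, not our exact file):

- hosts: all
  become: true
  become_user: root
  vars_files:
    - vars/main.yml
    - vars/nodes.yml
  roles:
    - cassandra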

 

 

  • roles: a role is a special kind of playbook that is fully self-contained, with tasks, templates, and other supporting files. You can specify multiple roles that serve different purposes; so far we only have one role, cassandra. Inside the cassandra folder, we define the tasks we want Ansible to perform for us (illustrative snippets follow this list).
    • tasks:
      • in the main.yml file we list the task files we want this role to perform, in order
      • within each task file, we define the individual tasks. Each task is given a name, followed by a module, which is a reusable piece of Ansible code that tells Ansible how to configure the instance
      • install: the first task installs Java and Cassandra using the yum module
      • template:
        • we then deploy configuration files using the template module. Any variables inside a template are replaced with the variables we defined in the vars files or that come from the facts file
        • for example, in the cassandra.yaml template we provide the cluster name (read from the vars file, so the instances we create end up in the same Cassandra cluster), the Cassandra seeds (read from nodes.yml), and broadcast_rpc_address, which is set to {{ ansible_nodename }} and comes from the facts file
      • firewall: we also use the firewalld module to open the ports Cassandra uses to communicate with the other instances in the cluster
    • handlers: after the tasks are completed, they trigger the handlers; here we want to restart Cassandra once all the installation and file configuration is done
    • the files folder contains other supporting files which can be accessed by this role
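
Illustrative snippets of the role's tasks and handlers (file names, variables, package names, and the port list are examples, not our exact configuration):

# roles/cassandra/tasks/install.yml
- name: Install Java and Cassandra
  yum:
    name:
      - java-1.8.0-openjdk
      - cassandra
    state: present

# roles/cassandra/tasks/configure.yml
- name: Deploy cassandra.yaml from template
  template:
    src: cassandra.yaml.j2
    dest: "{{ cassandra_config_dir }}/cassandra.yaml"
  notify: restart cassandra

- name: Open Cassandra cluster ports
  firewalld:
    port: "{{ item }}/tcp"
    permanent: true
    immediate: true
    state: enabled
  loop:
    - 7000   # inter-node communication
    - 9042   # CQL native transport

# roles/cassandra/handlers/main.yml
- name: restart cassandra
  service:
    name: cassandra
    state: restarted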

After the configuration is complete, we get the status of each task. If everything succeeds, we can SSH into the Cassandra cluster we just built and use nodetool to check the status of the cluster.
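
For example, assuming the default opc user on OCI Oracle Linux images:

ssh opc@<node-public-ip>
nodetool status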

Clojure for The Brave and True Part 3

In this chapter, you’ll begin integrating your experience into a new functional programming mindset. The core concepts you’ll learn include: what pure functions are and why they’re useful; how to work with immutable data structures and why they’re superior to their mutable cousins; how disentangling data and functions gives you more power and flexibility; and why it’s powerful to program to a small set of data abstractions.

Recursion Instead of for/while

Problem

Clojure has no assignment operator. You can’t associate a new value with a name without creating a new scope:

(def great-baby-name "Rosanthony")
great-baby-name
; => "Rosanthony"

(let [great-baby-name "Bloodthunder"]
  great-baby-name) ;; define again in the local scope
; => "Bloodthunder"

great-baby-name
; => "Rosanthony" ;; still global variable here

Solution

Clojure lets you work around this apparent limitation with recursion. The following example shows the general approach to recursive problem solving:

(defn sum ;; multi-arity functions, either 1 or 2 param
 ([vals] (sum vals 0))  ;; start with 0 if only 1 param, call sum recursively
  ([vals accumulating-total] ;; recur when split vals into 2 params
      (if (empty? vals)  
       accumulating-total ;; return the total if no more vals left
       (sum (rest vals) (+ (first vals) accumulating-total))))) ;; recursion

Each recursive call to sum creates a new scope where vals and accumulating-total are bound to different values, all without needing to alter the values originally passed to the function or perform any internal mutation.

Note that you should generally use recur when doing recursion for performance reasons. The reason is that Clojure doesn’t provide tail call optimization.

(defn sum
  ([vals]
     (sum vals 0))
  ([vals accumulating-total]
     (if (empty? vals)
       accumulating-total
       (recur (rest vals) (+ (first vals) accumulating-total))))) ;; use recur instead of calling sum

Another way you might be used to using mutation is to build up the final state of an object.

(require '[clojure.string :as s])
(defn clean
  [text]
  (s/replace (s/trim text) #"lol" "LOL"))

(clean "My boa constrictor is so sassy lol!  ")
; => "My boa constrictor is so sassy LOL!"

But with Clojure, instead of progressively mutating an object, the clean function works by passing an immutable value, text, to a pure function, which returns another immutable value.

Combining functions like this—so that the return value of one function is passed as an argument to another—is called function composition. In fact, this isn’t so different from the previous example, which used recursion, because recursion continually passes the result of a function to another function; it just happens to be the same function. In general, functional programming encourages you to build more complex functions by combining simpler functions.

Cool Things to Do with Pure Functions

comp

Comp is used for creating a new function from the composition of any number of functions.

((comp inc *) 2 3)
; => 7

The function multiplies the numbers 2 and 3 and then increments the result. Using math notation, you’d say that, in general, using comp on the functions f1, f2, … fn creates a new function g such that g(x1, x2, … xn) equals f1(f2(… fn(x1, x2, … xn) …)). One detail to note here is that the first function applied (* in the code shown here) can take any number of arguments, whereas the remaining functions must be able to take only one argument.

Here’s an example that shows how you could use comp to retrieve character attributes in role-playing games:

(def character
  {:name "Smooches McCutes"
   :attributes {:intelligence 10
                :strength 4
                :dexterity 5}})
(def c-int (comp :intelligence :attributes)) ;; first get attributes then get intelligence 
(def c-str (comp :strength :attributes))
(def c-dex (comp :dexterity :attributes))

(c-int character)
; => 10

(c-str character)
; => 4

(c-dex character)
; => 5

If one of the functions you want to compose needs to take more than one argument, you can wrap it in an anonymous function.

(defn spell-slots
  [char]
  (int (inc (/ (c-int char) 2))))

(spell-slots character)
; => 6

We can replace the function above with:

(def spell-slots-comp (comp int inc #(/ % 2) c-int)) ;; wrap the division in an anonymous function

Clojure’s comp function can compose any number of functions. To get a hint of how it does this, here’s an implementation that composes just two functions:

(defn two-comp
  [f g]
  (fn [& args]
    ;; apply g to all the args so g can take any number of arguments,
    ;; whereas f takes only one argument: the result of g
    (f (apply g args))))
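
two-comp can then be used the same way as the earlier comp example:

((two-comp inc *) 2 3)
; => 7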

memoize

Another cool thing you can do with pure functions is memoize them so that Clojure remembers the result of a particular function call. You can do this because pure functions are referentially transparent.

Memoization lets you take advantage of referential transparency by storing the arguments passed to a function and the return value of the function. That way, subsequent calls to the function with the same arguments can return the result immediately. This is especially useful for functions that take a lot of time to run.
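
The example below memoizes sleepy-identity, which isn’t shown above; it is simply a function that waits for a second before returning its argument, roughly:

(defn sleepy-identity
  "Returns the given value after 1 second"
  [x]
  (Thread/sleep 1000)
  x)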

(def memo-sleepy-identity (memoize sleepy-identity))
(memo-sleepy-identity "Mr. Fantastico")
; => "Mr. Fantastico" after 1 second

(memo-sleepy-identity "Mr. Fantastico")
; => "Mr. Fantastico" immediately

Exercises

  1. You used (comp :intelligence :attributes) to create a function that returns a character’s intelligence. Create a new function, attr, that you can call like (attr :intelligence) and that does the same thing.
  2. Implement the comp function.
    • (defn my-comp [& fns]
        (fn [& args]
          (reduce (fn [result-so-far next-fn] (next-fn result-so-far)) 
            (apply (last fns) args) (rest (reverse fns)))))
    • apply the rightmost parameter to the args, remaining parameters are reversed to be fed to the reduce function, which iteratively applies the functions to the result
  3. Implement the assoc-in function. Hint: use the assoc function and define its parameters as [m [k & ks] v].
    • Associates a value in a nested associative structure, where ks is a
      sequence of keys and v is the new value and returns a new nested structure.
      If any levels do not exist, hash-maps will be created.
  4. Look up and use the update-in function.
    • 'Updates' a value in a nested associative structure, where ks is a
      sequence of keys and f is a function that will take the old value
      and any supplied args and return the new value, and returns a new
      nested structure.
  5. Implement update-in. (Possible solutions for exercises 1, 3, and 5 are sketched below.)
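
Here is one way to do it (my-assoc-in and my-update-in are illustrative names chosen to avoid shadowing the core functions):

(defn attr
  [key]
  (comp key :attributes))

((attr :intelligence) character)
; => 10

(defn my-assoc-in
  [m [k & ks] v]
  (if ks
    (assoc m k (my-assoc-in (get m k) ks v))
    (assoc m k v)))

(my-assoc-in {} [:a :b :c] 1)
; => {:a {:b {:c 1}}}

(defn my-update-in
  [m [k & ks] f & args]
  (if ks
    (assoc m k (apply my-update-in (get m k) ks f args))
    (assoc m k (apply f (get m k) args))))

(my-update-in {:a {:b 3}} [:a :b] inc)
; => {:a {:b 4}}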

The Passionate Programmer: Creating a Remarkable Career in Software Development

This book is highly recommended if you want to advance your career as a software engineer. The author shares his personal experience as a software engineer and provides concrete, actionable advice.

Below is a list of actionable items I found particularly useful:

General Items

Be the worst – learn from people around you

Be a Generalist. Be a Specialist.

Dive into directory services, get comfortable with a UNIX variant, and master a scripting language.

Specific Items

Learn From Projects

Pick a project, and read it like a book. Make notes. Outline the good and the bad. Write a critique, and publish it. Find at least one trick or pattern that you can use from it. Find at least one bad thing that you observed that you will add to your “What not to do” checklist when you’re developing software. Alternatively, find a group of like-minded people, and meet once a month. Each session, have someone nominate some code to study (2 lines to 200 lines). Break it down. Discuss what’s behind it. Think of the decisions that went into it. Ponder the code that isn’t there.

If you treat your projects like a race, you’ll get to the end a lot faster than if you treat them like a prison cell. Create movement. Be the one who pushes. Don’t get comfortable. Always be the one to ask, “But what can we do right now?”

Setting Current Job Goals

Schedule a meeting with your manager. The agenda is for you to understand your manager’s goals for the team over the coming month, quarter, and year. Ask how you can make a difference. After the meeting, examine how your daily work aligns to the goals of your team. Let them be a filter for everything you do. Prioritize your work based on these goals.

Put your career goals away for a week. Write down your goals for your current job. Instead of thinking about where you want to go next, think about what you want to have accomplished when you finish the job you’re in. What can you have produced in this job that will be great? Create a plan that is both strategic and tactical. Spend the week implementing these tactics in support of the long-term goal of “finishing” this job. During lunches and breaks with your co-workers, focus the conversation on these goals. Steer yourself and those around you away from any conversation about career advancement or office politics and gossip. At the end of the week, take stock of your progress toward meeting these job goals. How long will it be before you’ve accomplished everything you feel you need to in your current role? How will you know you’re done?

Keeping A Work Log

Inventory the code you have written or you maintain and all the tasks you perform. Make a note of anything for which the team is completely dependent on you. Maybe you’re the only one who fully understands your application’s deployment process. Or there is a section of code you’ve written that is especially difficult for the rest of the team to understand. Each of these items goes on your to-do list. Document, automate, or refactor each piece of code or task so that it could be easily understood by anyone on your team. Do this until you’ve depleted your original list. Proactively share these documents with your team and your leader. Make sure the documents are stored somewhere so that they will remain easily accessible to the team. Repeat this exercise periodically

Measure, improve, measure—For the most critical application or code that you maintain, make a list of measurable factors that represent the quality of the application. This might be response time for the application, number of unhandled exceptions that get thrown during processing, or application uptime. Or, if you handle support directly, don’t directly assess quality for the application. Support request turnaround time (how fast you respond to and solve problems) is an important part of your users’ experience with the application. Pick the most important of these measurable attributes, and start measuring it. After you have a good baseline measurement, set a realistic goal, and improve the application’s (or your own) performance to meet that goal. After you’ve made an improvement, measure again to verify that you really made the improvement you wanted. If you have, share it with your team and your customers.

Dealing with Mistakes

• Take the blame. Don’t try to look for a scapegoat even if you can find a good one. Even if you’re not wholly to blame, take responsibility and move on. The goal is to move past this point as quickly as possible. A problem needs a resolution. Lingering on whose fault it is only prolongs the issue.

• Offer a solution. If you don’t have one, offer a plan of attack for finding a solution. Speak in terms of concrete, predictable time frames. If you’ve designed your team into a corner, give time frames on when you will get back with an assessment of the effort required to reverse the issue. Concrete, attainable goals, even if small and immaterial, are important at this stage. Not only do they keep things moving from bad to good, but they help to rebuild credibility in the process.

Start keeping a development diary. Write a little in it each day, explaining what you’ve been working on, justifying your design decisions, and vetting tough technical or professional decisions. Even though you are the primary (or only—it’s up to you) audience, pay attention to the quality of your writing and to your ability to clearly express yourself. Occasionally reread old entries, and critique them. Adjust your new entries based on what you liked and disliked about the old ones. Not only will your writing improve, but you can also use this diary as a way to strengthen your understanding of the decisions you make and as a place to refer to when you need to understand how or why you did something previously.

Catalog the crusades you’ve personally witnessed in the workplace. Think of the people you’ve worked with who have behaved as if on a mission. Think of the most driven and effective people in the places where you’ve worked. What were their missions? Can you think of any such missions that were inappropriate? Where is the line between drive and zealotry? Have you seen people cross it?

Companies want to hire experts. While a résumé with a solid list of projects is a good way to demonstrate experience, nothing is better at a job interview than for the interviewer to have already heard of you. It’s especially great if they’ve heard of you because they’ve read your articles or books or they’ve seen you speak at a conference. Wouldn’t you want to hire the person who “wrote the book” on the technology or methodology you’re attempting to deploy?

First, read weblogs. Learn about weblog syndication, and get yourself set up with an aggregator. If you don’t know what to read, think of a few of your favorite technical book authors and search the Web. You will probably find that they have a weblog. Subscribe to their feed and to the feeds of the people they link to. Over time, your list of feeds will grow as you read and find links to the weblogs other people have been writing. Now open your own weblog. Many free services are available for hosting and driving a weblog. It’s dead simple to do. Start by writing about (and linking to) the stories in your aggregator that you find interesting. As you write and link, you’ll discover that the weblog universe is itself a social network—a microcosm of the career network you are starting to build. Your thoughts will eventually show up in the feed aggregators of others like you, who will write about and spread the ideas you’ve created.

Your writings on the Web will also provide work examples that you can use in the next step. Now that you’re writing in your own forum, you might as well take your writing to community websites, magazines, or even books. With a portfolio of your writing ability available on the Web, you’ll have plenty of example material to include in an article or book proposal. Get yourself in print, and your network grows. More writing leads to more writing opportunities. And all of these lead to the opportunity to speak at conferences

Save the file but leave it open. If you reboot, reopen the file. You have three weeks. Each day, choose an item from the list and write an article. Don’t think too hard about it. Just write it and publish it. In the articles, link to other weblogs with related articles. As you read the list to pick each day’s topic, feel free to add ideas to it. After the three weeks are over, pick your two favorite articles and submit them to user-moderated news sites such as Digg and Reddit. If you still have ideas on your list, keep writing.

Although many of these developers are just having fun and expressing themselves, some real incentives exist there. They are moving their way up the social chain of a community. They are building a name for themselves. They are building a reputation in the industry. They may not be doing it on purpose, but they are marketing themselves in the process.

Contribute to Open Source Projects

Stuart Halloway does a workshop at conferences he calls “Refactotum.” If you get a chance to participate, I highly recommend it, but the gist is as follows: Take a piece of open source software with unit tests. Run the unit tests through a code coverage analyzer. Find the least-tested part of the system and write tests to improve the coverage of that code. Untested code is often untestable code. Refactor to make the code more testable. Submit your change as a patch. The beautiful thing about this is it’s measurable and can be done quickly. There’s no excuse not to try it.

Releasing successful open source software, writing books and articles, and speaking at conferences may all increase your remarkability.

One way to experiment is to shoot for remarkable productivity. Project schedules often have a lot of padding. Find something that everyone thinks is going to take a week and do it in a day. Work extra hours for it if you need to do so. You don’t have to make a habit of working extra hours, but this is an experiment. Do the work in a remarkably short time. See whether people “remark.” If not, why not? If so, in what ways? Fine-tune the variables, and try again.

The most serious barrier between us mortals and the people we admire is our own fear. Associating with smart, well-connected people who can teach you things or help find you work is possibly the best way to improve yourself, but a lot of us are afraid to try.

Of course, you don’t want to just randomly start babbling at these people. You’ll obviously want to seek out the ones with which you have something in common. Perhaps you read an article that someone wrote that was influential. You could show them work you’ve done as a result and get their input. Or, maybe you’ve created a software interface to a system that someone created. That’s a great and legitimate way to make the connection with someone.

The same thing can happen to you in your career. The process in this book is a loop that repeats until you retire. Research, invest, execute, market, repeat. Spending too much time inside any iteration of the loop puts you at risk of becoming suddenly obsolete.

Computing power doubles. With technology progressing so quickly, there is too much happening for any given person to keep up. Even if your skills are completely current, if you’re not almost through the process of learning the Next Big Thing, it’s almost too late. You can be ahead of the curve on the current wave and behind on the next. Timing becomes very important in an environment like this. You have to start by realizing that even if you’re on the bleeding edge of today’s wave, you’re already probably behind on the next one. Timing being everything, start thinking ahead with your study. What will be possible in two years that isn’t possible now? What if disk space were so cheap it was practically free? What if processors were two times faster? What would we not have to worry about optimizing for? How might these advances change what’s going to hit?

If you’re a programmer, try a day or two of doing your job as if you were a tester or a project manager. What are the many roles that you might play from day to day that you have never explicitly considered? Make a list, and try them on for size. Spend a day on each.

Before mapping out where you want to go, it can be encouraging and informative to map out where you’ve been. Take some time to explicitly lay out the timeline of your career. Show where you started and what your skills and jobs were at each step. Notice where you made incremental improvements and where you seemed to make big leaps. Notice the average length of time it took to make a major advancement. Use this map as input as you look forward in your career. You can set more realistic goals for yourself if you have a clear image of your historical progress. Once you’ve created this historical map, keep it updated. It’s a great way to reflect on your progress as you move toward your newly defined goals.

Set big goals, but make constant corrections along the way. Learn from the experience, and change the goals as you go. Ultimately, a happy customer is what we all want (especially when, as we plan our careers, we are our own customers)—not a completed requirement.

Clojure for The Brave and True Part 2

In this chapter, we dive into more depth on Clojure’s core functions.

Core Functions in Depth

Clojure defines map and reduce functions in terms of the sequence abstraction, not in terms of specific data structures.

map

map doesn’t care about how lists, vectors, sets, and maps are implemented. It only cares about whether it can perform sequence operations on them.

Lists, vectors, sets, and maps all implement the sequence abstraction, so they all work with map.

(defn titleize
  [topic]
  (str topic " for the Brave and True"))

(map titleize ["Hamsters" "Ragnarok"])
; => ("Hamsters for the Brave and True" "Ragnarok for the Brave and True")

(map titleize '("Empathy" "Decorating"))
; => ("Empathy for the Brave and True" "Decorating for the Brave and True")

(map titleize #{"Elbows" "Soap Carving"})
; => ("Elbows for the Brave and True" "Soap Carving for the Brave and True")

(map #(titleize (second %)) {:uncomfortable-thing "Winking"})
; => ("Winking for the Brave and True")

The first two examples show that map works identically with vectors and lists. The third example shows that map can work with unsorted sets. In the fourth example, you must call second on the anonymous function’s argument before title-izing it because the argument is a map. I’ll explain why soon, but first let’s look at the three functions that define the sequence abstraction.

The point is to appreciate the distinction between the seq abstraction in Clojure and the concrete implementation of a linked list. It doesn’t matter how a particular data structure is implemented: when it comes to using seq functions on a data structure, all Clojure asks is “can I first, rest, and cons it?” If the answer is yes, you can use the seq library with that data structure.

You can perform three core functions on a linked list: first, rest, and cons. first returns the value for the requested node, rest returns the remaining values after the requested node, and cons adds a new node with the given value to the beginning of the list. Once those are implemented, you can implement map, reduce, filter, and other seq functions on top of them.

Polymorphism is one way that Clojure provides indirection. Polymorphic functions dispatch to different function bodies based on the type of the argument supplied. (It’s not so different from how multiple-arity functions dispatch to different function bodies based on the number of arguments you provide.)

Note: Clojure has two constructs for defining polymorphic dispatch: the host platform’s interface construct and platform-independent protocols.

When it comes to sequences, Clojure also creates indirection by doing a kind of lightweight type conversion, producing a data structure that works with an abstraction’s functions. Whenever Clojure expects a sequence—for example, when you call map, first, rest, or cons—it calls the seq function on the data structure in question to obtain a data structure that allows for first, rest, and cons:

(seq '(1 2 3))
; => (1 2 3)

(seq [1 2 3])
; => (1 2 3)

(seq #{1 2 3})
; => (1 2 3)

seq always returns a value that looks and behaves like a list; you’d call this value a sequence or seq. Also note that the seq of a map consists of two-element key-value vectors. That’s why map treats your maps like lists of vectors!

reduce

One use of reduce is to transform a map’s values, producing a new map with the same keys but with updated values:

(reduce (fn [new-map [key val]]
          (assoc new-map key (inc val))) ;; assoc takes a map, key, and a value
        {} ;; -> starts with an empty map
        {:max 30 :min 10}) ;; a sequence of vectors, like ([:max 30] [:min 10]) 

*note: assoc returns a new map of the same (hashed/sorted) type that contains the mapping of key(s) to val(s) (docs: https://clojuredocs.org/clojure.core/assoc)

*Exercise: try implementing map using reduce, and then do the same for filter and some after you read about them later in this chapter.
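
One possible way to implement map and filter in terms of reduce (the names below are ours, chosen to avoid shadowing the core functions):

(defn map-via-reduce
  [f coll]
  (seq (reduce (fn [acc x] (conj acc (f x))) [] coll)))

(map-via-reduce inc [1 2 3])
; => (2 3 4)

(defn filter-via-reduce
  [pred coll]
  (seq (reduce (fn [acc x] (if (pred x) (conj acc x) acc)) [] coll)))

(filter-via-reduce odd? [1 2 3 4 5])
; => (1 3 5)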

filter and some

Use filter to return all elements of a sequence that test true for a predicate function.

(filter #(< (:month %) 3) food-journal)
; => ({:month 1 :day 1 :human 5.3 :critter 2.3}
      {:month 1 :day 2 :human 5.1 :critter 2.0}
      {:month 2 :day 1 :human 4.9 :critter 2.1}
      {:month 2 :day 2 :human 5.0 :critter 2.5})

This use is perfectly fine, but filter can end up processing all of your data, which isn’t always necessary. Because the food journal is already sorted by date, we know that take-while will return the data we want without having to examine any of the data we won’t need. Therefore, take-while can be more efficient.

Often, you want to know whether a collection contains any values that test true for a predicate function. The some function does that, returning the first truthy value (any value that’s not false or nil) returned by a predicate function:

(some #(> (:critter %) 5) food-journal)
; => nil

(some #(> (:critter %) 3) food-journal)
; => true

Lazy Seqs

map first calls seq on the collection you pass to it. But that’s not the whole story. Many functions, including map and filter, return a lazy seq. A lazy seq is a seq whose members aren’t computed until you try to access them. Computing a seq’s members is called realizing the seq. Deferring the computation until the moment it’s needed makes your programs more efficient, and it has the surprising benefit of allowing you to construct infinite sequences.

;; The following defines a lazy-seq of all positive numbers.  Note that 
;; the lazy-seq allows us to make a recursive call in a safe way because
;; the call does not happen immediately but instead creates a closure.

user=> (defn positive-numbers 
	([] (positive-numbers 1))
	([n] (lazy-seq (cons n (positive-numbers (inc n))))))
#'user/positive-numbers

user=> (take 5 (positive-numbers))
(1 2 3 4 5)

into

One of the most important collection functions is into. As you now know, many seq functions return a seq rather than the original data structure. You’ll probably want to convert the return value back into the original value, and into lets you do that:

(map identity {:sunlight-reaction "Glitter!"})
; => ([:sunlight-reaction "Glitter!"])

(into {} (map identity {:sunlight-reaction "Glitter!"}))
; => {:sunlight-reaction "Glitter!"}

Function Functions

apply and partial both accept and return functions.

apply

apply explodes a seqable data structure so it can be passed to a function that expects a rest parameter.

(apply max [0 1 2])
; => 2

By using apply, it’s as if you called (max 0 1 2). You’ll often use apply like this, exploding the elements of a collection so that they get passed to a function as separate arguments.

partial

partial takes a function and any number of arguments. It then returns a new function. When you call the returned function, it calls the original function with the original arguments you supplied it along with the new arguments.

(def add10 (partial + 10))
(add10 3) 
; => 13
(add10 5) 
; => 15

(def add-missing-elements
  (partial conj ["water" "earth" "air"]))

(add-missing-elements "unobtainium" "adamantium")
; => ["water" "earth" "air" "unobtainium" "adamantium"]

So when you call add10, it calls the original function and arguments (+ 10) and tacks on whichever arguments you call add10 with. To help clarify how partial works, here’s how you might define it:

(defn my-partial
  [partialized-fn & args]
  (fn [& more-args]
    (apply partialized-fn (into args more-args))))

(def add20 (my-partial + 20))
(add20 3) 
; => 23

 

In this example, the value of add20 is the anonymous function returned by my-partial. The anonymous function is defined like this:

(fn [& more-args]
  (apply + (into [20] more-args)))

In general, you want to use partials when you find you’re repeating the same combination of function and arguments in many different contexts.