Miso Engineering Culture, Part I: we believe in sharing

Within Miso as a company, communication is key

Before talking about how the engineering team works, it’s worth mentioning that Miso is a small team: we try to communicate efficiently with each other and to stay aware of the work others are doing. Still, we don’t want to waste time in meetings that aren’t productive for us.

That’s why we use several channels of communication at work:

  • Yammer, to post about the status of our work or to share a link with everyone
  • Email, like everyone else, for targeted questions, long updates, or event invites
  • Campfire, by far my favorite. We have several topic rooms (Engineering, Support, Design, Product, Marketing). There is always a discussion going on in those rooms; not everyone has to be involved in every discussion, but we want to be aware of them, and Campfire is a great tool for that.
  • Meetings and informal discussions; after all, we work in the same open space, so sometimes we just chat in person
  • Gtalk, when we’re too lazy to get up and talk to the person face to face

Being active in the Ruby community: attending meetups, blogging, creating open source projects

The Ruby community is active, especially in San Francisco, where you can find plenty of Ruby/Rails meetups every week. We try to attend meetups weekly or simply provide a venue for Ruby workshops; more generally, we try to be involved in the community.

There are two goals behind this: promoting what we’re doing and promoting Ruby. Most of all, it’s about sharing and improving our culture. For example, just by making a connection you can learn what the best practices are at hot new startups, which gives you some perspective on what you’re doing. Being involved outside the narrow focus of your company is a win-win situation.

The same goes for the open source community. We’ve open sourced several projects, such as rabl, yaml_record, and, recently, Gitdocs. By doing so, we’ve received a lot of outside contributions: Rabl, for instance, has 818 watchers on GitHub and 13 contributors who keep improving our gem. Beyond that, it’s also a way to promote ourselves with great code that will attract new talent.

Blogging is a great way to step back and write about a technical challenge we’ve faced or how we built our latest cool feature, and to share our thoughts with everyone. As I keep repeating, blogging is also about being communal: getting feedback from our readers and building a connection with them. A great example: earlier this week we found out that one of our blog posts was an inspiration for VendorKit.

Sharing is definitely one of our core values that we believe in, because by sharing you receive important feedback, and with feedback you’re going to learn something and build something better. At the end of the day, you’ll become a better software engineer.

Turn ruby code into simple daemons with Dante

Context

At Miso, the engineering team has recently been working on a major shift in our architecture: moving to a much purer service-oriented approach. Our new strategy as a whole will be the subject of many blog posts in the future. This transition has required us to set up a lot of new services. These services range from small API services written in Sinatra and Renee to front-end heavy services in Padrino, and even internal services communicating with protobuf messages.

Every one of these services shares a lot of behavior with the others: each needs to log output, be monitored, be daemonized and easy to start and stop, have automated deployment, etc. These commonalities and the sheer number of services led us to begin our journey to create a ‘unified’ services architecture that will manage all of this for us. While that project is still a work in progress, today I would like to introduce one tiny library that we built along the way.

Introduction

One of the common behaviors each Ruby service needs is a simple executable that can reliably start and stop a process. Moreover, each executable needs to be able to start the process in both non-daemon and daemonized mode, allow a port to be specified, allow a pid file to be specified, and stop daemonized processes that it started. Ideally, this same interface could be used with God or Monit to monitor the process.

We started by looking at existing daemonizing gems such as daemon-kit, and while these are excellent libraries, they didn’t quite fit our purposes. In the end, we decided to whip up a gem that would provide this in the simplest way that could possibly work. The result is an extremely tiny but well-tested daemonizing gem called Dante.

Usage

First, let’s start with how to integrate Dante into a gem or ruby code base.

Since dante will become a dependency of your gem’s executable, add it to your gemspec:

# mygem.gemspec

Gem::Specification.new do |s|
  # ...
  s.add_dependency "dante"
end

or alternatively to your Gemfile:

# Gemfile

gem "dante"

Now you can start using dante for your executables. Dante’s interface is extremely simple. You wrap your code in a Dante block and then the gem handles the rest and passes you the relevant options.

Let’s say we want to daemonize arbitrary Ruby code. The simplest way would be:

# Starts the code as a daemonized process
Dante::Runner.new('foo').execute(
  :daemonize => true, :pid_path => "/tmp/path.pid") { do_something! }

You can then also stop based on the pid:

# Stops the daemonized process
Dante::Runner.new('foo').execute(
  :kill => true, :pid_path => "/tmp/path.pid")

Now, let’s say we want to boot up a daemonized thin API service using an executable. First we would create the executable in bin/blog and then wrap the thin bootup in a Dante.run block:

#!/usr/bin/env ruby

require 'dante'
require 'thin'
# blog.rb is expected to define the Blog Rack application
require File.expand_path("../blog.rb", __FILE__)

Dante.run('blog') do |opts|
  # opts available: host, pid_path, port, daemonize, user, group
  Thin::Server.start('0.0.0.0', opts[:port]) do
    use Rack::CommonLogger
    use Rack::ShowExceptions
    run Blog
  end
end

Notice how simple the Dante part is. Dante just accepts a daemon name and yields the options that were passed to your program on the command line. In this example, the thin process uses those options to boot thin on the correct port. So what does wrapping your executable code in Dante give you?
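To make the “accept a name, parse the command line, yield the options” contract concrete, here is a minimal stdlib-only sketch built on OptionParser. It is not Dante’s code: the class name MiniRunner, the default port of 3000, and the long flag names are hypothetical. Only the short flags used in this post (-d, -p, -P, -k) plus a host flag are modeled, and nothing is actually daemonized.

```ruby
require "optparse"

# Hypothetical stand-in for the "wrap your code in a block" pattern.
# It parses flags and yields them; it does not daemonize or write pid files.
class MiniRunner
  DEFAULTS = { host: "0.0.0.0", port: 3000, daemonize: false, kill: false }.freeze

  def initialize(name, defaults = {})
    @name = name
    @defaults = DEFAULTS.merge(defaults)
  end

  # Parses argv into a fresh options hash, then hands it to the caller's block.
  def execute(argv = ARGV)
    opts = @defaults.dup
    parser = OptionParser.new do |o|
      o.banner = "Usage: #{@name} [options]"
      o.on("-d", "--daemonize", "Run in the background") { opts[:daemonize] = true }
      o.on("-p", "--port PORT", Integer, "Port to bind") { |v| opts[:port] = v }
      o.on("-H", "--host HOST", "Host to bind") { |v| opts[:host] = v }
      o.on("-P", "--pid FILE", "Pid file path") { |v| opts[:pid_path] = v }
      o.on("-k", "--kill", "Stop a running daemon") { opts[:kill] = true }
    end
    parser.parse(argv) # non-destructive: argv itself is left untouched
    yield opts
  end
end

MiniRunner.new("blog").execute(%w[-d -p 8080 -P /var/run/blog.pid]) do |opts|
  puts opts.inspect
end
```

Keeping parsing separate from process control is what lets the same executable serve both interactive use and a monitor like God.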

Functionality

With your executable daemonized with Dante, you get a familiar and standardized interface for controlling your process. The simplest way to start the thin server now is just to run the executable with no options:

./bin/blog

That will boot up a non-daemonized process in the foreground. Now, let’s say we want to get help for this process:

./bin/blog --help

will print a nice auto-generated help message listing the daemon’s options for reference. Now, let’s start the process as a daemon, setting the port and pid file:

./bin/blog -d -p 8080 -P /var/run/blog.pid

This will start the blog service on port 8080 and store the pid file at /var/run/blog.pid. Now let’s stop the process:

./bin/blog -k -P /var/run/blog.pid

The -k flag will find the process from the pid file and then kill the daemonized process for you.
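Stopping a daemon from its pid file comes down to reading the pid and signalling it. The sketch below shows one way to do that with Ruby’s Process module; the helper names process_alive? and stop_from_pid_file and the choice of the TERM signal are assumptions, not a description of how Dante implements -k.

```ruby
# True if a process with this pid exists. Signal 0 sends nothing; it only
# checks that the pid exists and that we may signal it.
def process_alive?(pid)
  Process.kill(0, pid)
  true
rescue Errno::ESRCH
  false
rescue Errno::EPERM
  true # the process exists but belongs to another user
end

# Hypothetical stop helper: read the pid file, send TERM if the process is
# still running, and remove the (possibly stale) pid file.
# Returns false when there is no pid file at all.
def stop_from_pid_file(pid_path, signal: "TERM")
  return false unless File.exist?(pid_path)
  pid = File.read(pid_path).to_i
  Process.kill(signal, pid) if pid > 0 && process_alive?(pid)
  File.delete(pid_path)
  true
end
```

A monitor can reuse the same process_alive? check to decide whether a restart is needed.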

Customization

In many cases, you will need custom flags/options or a custom description for your executable. You can do this by using Dante::Runner explicitly:

#!/usr/bin/env ruby

require 'dante'
require 'thin'
require File.expand_path("../../myapp.rb", __FILE__)

# Set executable name to 'myapp' and default port to 8080
runner = Dante::Runner.new('myapp', :port => 8080)

# Sets the description in 'help'
runner.description = "This is awesome myapp!"

# Setup custom 'test' option flag
runner.with_options do |opts|
  opts.on("-t", "--test TEST", String, "Test this thing") do |test|
    options[:test] = test
  end
  # ... more opts ...
end

# Parse command-line options and execute the process
runner.execute do |opts|
  # opts: host, pid_path, port, daemonize, user, group
  Thin::Server.start('0.0.0.0', opts[:port]) do
    puts opts[:test] # Referencing my custom option
    use Rack::CommonLogger
    use Rack::ShowExceptions
    run MyApp
  end
end

Now you would be able to do:

./bin/myapp -t custom

and the opts would contain the :test option for use in your script. In addition, help will now contain your customized description in the banner.
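The custom option block above writes to options[:test] directly. One way a runner could support that is to store the with_options block and evaluate it against itself with instance_exec, so that options resolves to the runner’s own hash. The sketch below assumes that design; ConfigurableRunner is a hypothetical name and this is not Dante’s source.

```ruby
require "optparse"

# Hypothetical runner with a with_options hook for extra flags.
class ConfigurableRunner
  attr_accessor :description
  attr_reader :options

  def initialize(name, defaults = {})
    @name = name
    @options = { port: 3000 }.merge(defaults)
    @extra_option_blocks = []
  end

  # Store the caller's block; it is run while the parser is being built.
  def with_options(&block)
    @extra_option_blocks << block
  end

  def execute(argv)
    parser = OptionParser.new do |o|
      o.banner = "Usage: #{@name} [options]"
      o.separator(description) if description # shown in --help output
      o.on("-p", "--port PORT", Integer, "Port to bind") { |v| @options[:port] = v }
      # instance_exec makes self the runner, so `options` inside the
      # caller's block refers to this runner's options hash.
      @extra_option_blocks.each { |blk| instance_exec(o, &blk) }
    end
    parser.parse(argv)
    yield @options
  end
end

runner = ConfigurableRunner.new("myapp", port: 8080)
runner.description = "This is awesome myapp!"
runner.with_options do |opts|
  opts.on("-t", "--test TEST", String, "Test this thing") { |t| options[:test] = t }
end
runner.execute(%w[-t custom]) { |opts| puts opts[:test] }
```

Evaluating the hook in the runner’s context keeps caller code short, at the cost of making self inside the block less obvious.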

That’s all you need to know to use Dante, although it also supports starting on alternate hosts and a handful of other options. That is all Dante does: it is the simplest daemonizer that could possibly work. But it has been extremely handy for controlling our services.

Process Monitoring

One easy way to monitor a daemonized process is to have God watch it. Thankfully, God works perfectly with Dante and they cooperate quite nicely. God is supposed to be able to daemonize processes automatically, but in my experience this can be unreliable, and having a consistent interface for the executable can be convenient for testing.

Setting up a god file for a dante process is simple:

# /etc/god/blog.rb

God.watch do |w|
  pid_file = "/var/run/blog.pid"
  w.name            = "blog"
  w.interval        = 30.seconds
  w.start           = "ruby /path/to/blog/bin/blog -d -P #{pid_file}"
  w.stop            = "ruby /path/to/blog/bin/blog -k -P #{pid_file}"
  w.start_grace     = 15.seconds
  w.restart_grace   = 15.seconds
  w.pid_file        = pid_file
  w.behavior(:clean_pid_file)
  w.start_if do |start|
    start.condition(:process_running) do |c|
      c.interval = 5.seconds
      c.running = false
    end
  end
end

Conclusion

Our team will have a lot more to talk about soon as we begin open-sourcing various parts of our new services architecture. However, I hope Dante can be useful as a standalone utility; I know it has been for us. Be sure to check out Dante for more information and to try it out for yourself.

For an example of how we use Dante in a Ruby gem, check out Gitdocs, our open-source Dropbox clone built with Ruby and Git. In particular, check out cli.rb for advanced usage.

Collaborate and track tasks with ease using GitDocs

At Miso, we have tried a lot of task tracking tools of all shapes and sizes. However, since we are a small team, we all end up using them in conjunction with a plain text file where we ‘really’ track our tasks. Nothing beats a simple text file for detailed personal and project task tracking. I have been using text-file based task management for as long as I can remember, and most of the team shares this sentiment. A separate need that commonly comes up is a place to dump code snippets, setup guides, team references, etc. We have used a wiki for this, and are currently using the GitHub wiki for this very purpose. A third common need is a place to dump software, images, mocks, etc., and we currently use Dropbox for this along with a shared network drive.

Recently, a discussion arose around the possibility of just using a simple git repository of our own to serve all these disparate purposes. Namely, we needed a place to do personal and project task tracking, store useful shared code snippets, and dump random files, notes, etc. that the development team needs access to. Fundamentally, what prevents a simple git repository shared between all developers from serving this purpose? Well, there are a few obstacles to this purely git-based approach.

The first is that with shared files and docs, you don’t want to have to worry about constantly pushing and pulling. Changes are happening all the time: text is being edited or files are being added, and manually committing every line change to a text file doesn’t really make sense for this use case. We really wanted something like Dropbox, namely that when a file changes, the updates are pushed and pulled automatically. Second, we really needed a way to view files in the browser. Imagine writing a great markdown guide and never being able to see it properly rendered. We needed this shared repository to be browsable.

So after discussing this, we decided to build our ‘ideal’ solution, and the result is gitdocs, a way to collaborate on documents and files effortlessly using just git and the file system. Think of gitdocs as a combination of Dropbox and Google Docs, but free and open source.

Installation

Install the gem:

gem install gitdocs

If you have Growl installed, you’ll probably want to run brew install growlnotify to enable Growl support.

Usage

Gitdocs is a rather simple library with only a few basic commands. You need to be able to add paths for gitdocs to monitor, start and stop gitdocs, and check its status. First, add the doc folders to watch:

gitdocs add my/path/to/watch

You can remove and clear paths as well:

gitdocs rm my/path/to/watch
gitdocs clear

Then, start up gitdocs:

gitdocs start

You can also stop and restart gitdocs as needed. Run

gitdocs status

for a helpful listing of the current state. Once gitdocs is started, simply start editing or adding files in your designated git repository. Changes will be automatically pushed from and pulled into your local repo.
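The Dropbox-like behavior needs two pieces: noticing that files changed, then committing and syncing. Here is a minimal polling sketch of the first piece using modification-time snapshots, plus the git commands such a loop might run. The helper names and the pull --rebase then push sequence are assumptions for illustration, not gitdocs’ implementation, which may watch file system events instead of polling.

```ruby
require "find"

# Map every regular file under root to its modification time,
# skipping git's own .git directory.
def snapshot(root)
  files = {}
  Find.find(root) do |path|
    if File.basename(path) == ".git"
      Find.prune
    elsif File.file?(path)
      files[path] = File.mtime(path)
    end
  end
  files
end

# Paths that were added, modified, or removed between two snapshots.
def changed_paths(before, after)
  added_or_modified = after.select { |path, mtime| before[path] != mtime }.keys
  removed = before.keys - after.keys
  (added_or_modified + removed).sort
end

# Commands a hypothetical sync loop could run once changes are detected
# (returned as argument lists, not executed here).
def sync_commands(message)
  [
    %w[git add -A],
    ["git", "commit", "-m", message],
    %w[git pull --rebase],
    %w[git push]
  ]
end
```

A real loop would sleep between snapshots and run each command with system(*cmd) only when changed_paths is non-empty.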

You can also have gitdocs fetch a remote repository with:

gitdocs create my/path/for/doc git@github.com:user/some_docs.git

This will clone the repo and add the path to your watched docs. Be sure to restart gitdocs for path changes to take effect:

gitdocs restart

To view the docs in your browser with file formatting:

gitdocs serve

and then visit http://localhost:8888 for access to all your docs in the browser.

Conclusion

Our development team has started using Gitdocs extensively for all types of files and documents, primarily for task tracking, file sharing, note taking, code snippets, etc. All of this lives in git repos, automatically managed by gitdocs and viewable in your favorite text editor or browser. Be sure to check out the gitdocs repository for more information.

Writing simple Ruby client/servers using Protobufs

We’ve recently been using protobufs as a serialization format for RPC here at Miso. While there are some existing protobuf RPC solutions, we wanted something that could stream down any number of protobuf objects, and we also wanted the ability to have void return types.

We built Protoplasm to solve these problems. Protoplasm’s server is built on top of EventMachine. Object serialization is done using Beefcake.

A common pattern for RPC using a serialization format is to have the request class have an enum for the request type and a series of optional fields to fill out the actual request details. Using Beefcake, our request object might look something like this:

class AddCommand
  include Beefcake::Message
  required :left, :int32, 1
  required :right, :int32, 2
end

class SubtractCommand
  include Beefcake::Message
  required :left, :int32, 1
  required :right, :int32, 2
end

class Command
  include Beefcake::Message
  module Type
    ADD = 1
    SUB = 2
  end
  required :type, Type, 1
  optional :add_command, AddCommand, 2
  optional :subtract_command, SubtractCommand, 3
end

In this example, we support two kinds of requests, ADD and SUB. So, let’s get to implementing this.

First of all, we need to tie our Type enum to the command fields inside the command object. To do this in Protoplasm, we create a Types module, include Protoplasm::Types in it, and define the relationship between each type, its field, and its response type. Here is a complete example:

require 'protoplasm'

module Types
  include Protoplasm::Types

  class AddCommand
    include Beefcake::Message
    required :left, :int32, 1
    required :right, :int32, 2
  end

  class SubCommand
    include Beefcake::Message
    required :left, :int32, 1
    required :right, :int32, 2
  end

  class MathAnswer
    include Beefcake::Message
    required :answer, :int32, 1
  end

  class Command
    include Beefcake::Message
    module Type
      ADD = 1
      SUB = 2
    end
    required :type, Type, 1
    optional :add_command, AddCommand, 2
    optional :sub_command, SubCommand, 3
  end

  request_class Command
  request_type_field :type
  rpc_map Command::Type::ADD, :add_command, MathAnswer
  rpc_map Command::Type::SUB, :sub_command, MathAnswer
end

Now we have all our definitions. request_class defines the class to use for our requests. request_type_field tells Protoplasm where to look for the command type in any given request object. The rpc_map method ties together the enum value, the field in the request object, and the response type.
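Stripped of Beefcake and EventMachine, the rpc_map idea is a lookup table from enum value to a field name and response class, followed by a call to process_&lt;field&gt;. The plain-Ruby sketch below uses Structs in place of Beefcake messages; MathServer, RPC_MAP, and dispatch are hypothetical names, and Protoplasm’s actual dispatch code may differ.

```ruby
# Structs standing in for the Beefcake messages above.
AddCommand = Struct.new(:left, :right)
SubCommand = Struct.new(:left, :right)
MathAnswer = Struct.new(:answer)
Command    = Struct.new(:type, :add_command, :sub_command)

module CommandType
  ADD = 1
  SUB = 2
end

# enum value => [field holding the payload, response class]
RPC_MAP = {
  CommandType::ADD => [:add_command, MathAnswer],
  CommandType::SUB => [:sub_command, MathAnswer]
}.freeze

class MathServer
  # Find the field for this request type, pull the payload out of it,
  # call process_<field>, and wrap the result in the response class.
  def dispatch(request)
    field, response_class = RPC_MAP.fetch(request.type)
    payload = request.public_send(field)
    response_class.new(public_send("process_#{field}", payload))
  end

  def process_add_command(cmd)
    cmd.left + cmd.right
  end

  def process_sub_command(cmd)
    cmd.left - cmd.right
  end
end

request = Command.new(CommandType::ADD, AddCommand.new(2, 3), nil)
puts MathServer.new.dispatch(request).answer
```

Naming the handler after the field is what lets a server subclass declare only process_* methods and leave routing to the shared table.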

With all this in place, let’s build a client and server. To write a simple server, we could do the following:

require 'protoplasm'
require './types'

class Server < Protoplasm::EMServer.for_types(Types)
  def process_add_command(cmd)
    send_response(:answer => cmd.left + cmd.right)
  end

  def process_sub_command(cmd)
    send_response(:answer => cmd.left - cmd.right)
  end
end

Then, we can start our server by adding

Server.start(40000)

To create a corresponding client, most of the work is done for you. Here is a sample client that would work with this server:

class Client < Protoplasm::BlockingClient.for_types(Types)
  def add(l, r)
    send_request(:add_command, :left => l, :right => r).answer
  end

  def subtract(l, r)
    send_request(:sub_command, :left => l, :right => r).answer
  end

  def host_port
    ['localhost', 40000]
  end
end

This client will always try to connect to localhost on port 40000, but other than that, this is a completely working example. We can then issue requests to a running server by doing the following:

client = Client.new
client.add(2, 3)        # => 5
client.subtract(10, 7)  # => 3
client.subtract(-10, 7) # => -17

The entire source code for this example is at https://github.com/bazaarlabs/protoplasm-example.

When a request is made, here is what actually happens. A nine-byte header is sent, followed by the entire serialized protobuf object. The first byte of the header is reserved; the next eight are a 64-bit unsigned, native-endian number giving the size in bytes of the protobuf object.

The server responds first with the reserved byte. If the call is void, it stops sending data. If it’s streaming, it continues to send the full header plus each serialized object; in this case, the reserved byte indicates when streaming should stop. The client has no way to abort streaming aside from dropping the connection.
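The framing above maps directly onto Array#pack: "C" is an 8-bit unsigned byte and "Q" is a 64-bit unsigned integer in native byte order, giving a nine-byte header. This is a sketch of the header layout only. The meaning of particular reserved-byte values is not specified in this post, so the caller supplies it, and the void case (reserved byte only, no size field) is not handled.

```ruby
require "stringio"

HEADER_FORMAT = "CQ" # reserved byte + 64-bit unsigned native-endian length
HEADER_SIZE = 9

# Prefix a serialized body with the nine-byte header.
def encode_frame(body, reserved = 0)
  [reserved, body.bytesize].pack(HEADER_FORMAT) + body.b
end

# Read one full frame from an IO; returns [reserved, body], or nil at EOF.
def read_frame(io)
  header = io.read(HEADER_SIZE)
  return nil if header.nil? || header.bytesize < HEADER_SIZE
  reserved, size = header.unpack(HEADER_FORMAT)
  [reserved, io.read(size)]
end

io = StringIO.new(encode_frame("abc") + encode_frame("defg"))
while (frame = read_frame(io))
  p frame
end
```

Because "Q" is native-endian, both ends must run on machines with the same byte order; "Q&lt;" or "Q&gt;" would pin it explicitly.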

The full implementation of Protoplasm is available at https://github.com/bazaarlabs/protoplasm.

Bonus: Fun with Gemspecs!

Another common problem with this sort of RPC client/server arrangement in Ruby is where to put the type information. Though you could use a third gem to hold just the types, a simpler arrangement is to use multiple gemspecs within the same repo. This is the technique protoplasm itself employs, so if you’re interested, take a look at the source. The gemspecs share the same library files but have different dependencies. We avoid loading all dependencies by using autoload, but the same thing could be achieved by requiring the individual server and client Ruby files.
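Here is a sketch of the two-gemspec layout, written as Ruby that builds both Gem::Specification objects so the difference is easy to inspect. In a real repo each would live in its own .gemspec file. The gem names (mylib-client, mylib-server), file lists, and version are hypothetical; the dependency split (both need beefcake, only the server needs eventmachine) follows the description of Protoplasm above but is an assumption about its actual gemspecs.

```ruby
require "rubygems"

# Library files every variant ships (hypothetical paths).
SHARED_FILES = %w[lib/mylib.rb lib/mylib/types.rb].freeze

def build_spec(name, extra_files, deps)
  Gem::Specification.new do |s|
    s.name    = name
    s.version = "0.1.0"
    s.summary = "Example #{name} gem"
    s.authors = ["Example"]
    s.files   = SHARED_FILES + extra_files
    deps.each { |dep| s.add_dependency(dep) }
  end
end

CLIENT_SPEC = build_spec("mylib-client", %w[lib/mylib/client.rb], %w[beefcake])
SERVER_SPEC = build_spec("mylib-server", %w[lib/mylib/server.rb], %w[beefcake eventmachine])
```

Installing only the client gem then never pulls in EventMachine, while both gems ship the same types file.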

What are subnets?

At Miso, we recently launched an integration with AT&T that allows the Miso iPhone app to connect with your TV. Have you ever thought that browsing hundreds of cable channels on your TV was a painful process? Our vision with Miso is that you can turn on your favorite show, see what your friends are watching, or browse additional information about a show, all from your phone.

To make this happen, Miso connects with the AT&T receiver over wifi, and we’re learning from our early adopters that home networks are more complicated than they used to be. Wifi routers are definitely the norm, and it’s not uncommon to have multiple wifi routers cover the living room, the master bedroom, and the guest bedroom upstairs. If you’re plugging in a wifi router that you bought from Best Buy, you’re probably creating a new subnet, which is something you almost never need to care about.

As it turns out, Miso’s integrations with AT&T and DIRECTV both assume that a home has only one subnet, which is not always true. This has caused some support issues and prompted some people within Miso to ask: what is a subnet? Good question! As a programmer, I have some basic knowledge of network engineering, but answering this question prompted me to dig a bit deeper.

What is a subnet?

Subnets and IP addresses go hand in hand. Together, they are the building blocks for how computers find each other on a network. For instance, every time you visit a website, you first have to find the computer on the internet that hosts that website.

Addressing on the internet works a lot like mailing addresses. The zip code for my apartment is 94105; if you were to Google that, you could see that I live within a certain region of San Francisco. Subnets are like zip codes: they divide up the internet into small regions.

The subnet is actually part of the IP address. You’ve probably seen an IP address for a computer. My IP address at this coffee shop is 192.168.5.199. The first three numbers (192.168.5) are the subnet (or network address) and the last number (199) is my laptop’s address (host address). The guy sitting at the next table probably has an address like 192.168.5.198. Because we’re on the same subnet, we could do things like share playlists with iTunes or exchange files in a shared folder. The three numbers that form the subnet are somewhat random, in the same way that zip codes are somewhat random.

For nerds only: the tricky part is that which portion of the IP address is the subnet actually varies, and it determines how large the subnet is. The mechanisms for dividing up address space have evolved over the years, from classful addressing to classless addressing. We’re at the point where we’ve run out of IPv4 addresses, hence the creation of IPv6, which has many more available addresses.
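Ruby’s standard IPAddr class makes the zip code analogy concrete. The “first three numbers” rule corresponds to a /24 prefix (netmask 255.255.255.0), which is an assumption about the coffee shop network; as noted above, the prefix length varies. The address 192.168.6.10 below is made up for comparison.

```ruby
require "ipaddr"

# Network ("zip code") part of an address for a given prefix length.
def network_address(ip, prefix_len)
  IPAddr.new(ip).mask(prefix_len).to_s
end

# Two hosts share a subnet when their masked network addresses match.
def same_subnet?(a, b, prefix_len)
  IPAddr.new(a).mask(prefix_len) == IPAddr.new(b).mask(prefix_len)
end

puts network_address("192.168.5.199", 24)
puts same_subnet?("192.168.5.199", "192.168.5.198", 24)
puts same_subnet?("192.168.5.199", "192.168.6.10", 24)
puts same_subnet?("192.168.5.199", "192.168.6.10", 16)
```

With a /16 prefix, 192.168.5.199 and 192.168.6.10 land in the same subnet; with /24 they do not, which is exactly the “which part is the subnet varies” point.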

So, great, subnets are like zip codes. They group computers together in some logical way. Laptops in the same cafe are probably on the same subnet and can do things like share iTunes playlists, and laptops in different cafes probably can’t. Yet there must be some way to communicate between subnets: the websites that I visit every day are hosted on computers that are all on different subnets.

How does information travel between subnets?

Sending information across subnets is a lot like sending mail across the country. After writing the address on the envelope, I deliver the letter to my local post office. My local post office looks at the zipcode and begins the process of routing the letter through hubs to bring the letter in the general vicinity of the destination address. As the letter approaches the destination address, it will be given to another local post office, which ultimately puts it on a mail truck that goes to the actual address.

In the same way, let’s say that I want to visit google.com. From my laptop, the address for google.com is 74.125.224.114. My laptop has no idea where that address is, so it does the equivalent of delivering my request to the local post office, which is the wifi router of this coffee shop (you may have also heard this referred to as a gateway). The coffee shop’s wifi router also doesn’t know where that address is, so it forwards the request to the internet provider (ISP) of the coffee shop. The internet provider has a better sense for zip codes, so it starts the process of sending the packet in the right direction.
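The laptop’s first routing decision can be sketched in a few lines: if the destination is inside the local subnet, deliver it directly; otherwise hand it to the default gateway. The subnet and gateway address here are hypothetical, and real hosts consult a full routing table rather than a single rule.

```ruby
require "ipaddr"

LOCAL_NET = IPAddr.new("192.168.5.0/24") # hypothetical coffee shop subnet
GATEWAY   = "192.168.5.1"                # hypothetical wifi router address

# Where the laptop sends a packet first: straight to the destination if it
# is on the local subnet, otherwise to the gateway ("the local post office").
def next_hop(destination)
  LOCAL_NET.include?(IPAddr.new(destination)) ? destination : GATEWAY
end

puts next_hop("192.168.5.198")
puts next_hop("74.125.224.114")
```

Everything beyond the gateway is the ISP’s problem, which is why the laptop needs to know only its own subnet and one router.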

For nerds only: This process is called IP routing and there are several algorithms for doing it and they vary with IPv4 and IPv6. The algorithms differ in how chatty they are and what size network they work well with, among other things. Many algorithms are based on Dijkstra’s algorithm.
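For the curious, here is a minimal Dijkstra shortest-path sketch over a made-up network of routers, where edge weights are link costs. Link-state routing protocols such as OSPF compute routes with this algorithm; the topology and costs below are invented for illustration, and a linear scan replaces a priority queue to keep it short.

```ruby
# Hypothetical router graph: node => { neighbor => link cost }.
ROUTERS = {
  cafe:   { isp: 1 },
  isp:    { cafe: 1, hub_a: 4, hub_b: 1 },
  hub_a:  { isp: 4, hub_b: 1, google: 1 },
  hub_b:  { isp: 1, hub_a: 1, google: 5 },
  google: { hub_a: 1, hub_b: 5 }
}.freeze

# Returns [distance hash, predecessor hash] from source to every node.
def dijkstra(graph, source)
  dist = Hash.new(Float::INFINITY)
  prev = {}
  dist[source] = 0
  unvisited = graph.keys
  until unvisited.empty?
    node = unvisited.min_by { |n| dist[n] } # closest unvisited node
    break if dist[node] == Float::INFINITY  # rest is unreachable
    unvisited.delete(node)
    graph[node].each do |neighbor, cost|
      alt = dist[node] + cost
      if alt < dist[neighbor]
        dist[neighbor] = alt
        prev[neighbor] = node
      end
    end
  end
  [dist, prev]
end

# Walk predecessors back from target to rebuild the route.
def path_to(prev, target)
  path = [target]
  path.unshift(prev[path.first]) while prev.key?(path.first)
  path
end

dist, prev = dijkstra(ROUTERS, :cafe)
p dist[:google], path_to(prev, :google)
```

Note that the cheapest route to google goes through three hops rather than the direct-looking isp to hub_a link, because the total cost is lower.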

So how come I can’t share my iTunes playlist with any computer?

I can access google.com from anywhere, but I can’t access that laptop from the other coffee shop. The reason for that is that there are many more computers (and IP addresses) than there are mailing addresses, so they can’t all be public. If you work in a large business, then your department probably has an internal mail stop that’s handled by your company’s mailroom.

In the same way, internet addresses are divided up into public and private addresses. Public addresses can be reached from anywhere, but private addresses can only be accessed in local networks. Public addresses are handed out by the internet equivalent of the post office called the Internet Assigned Numbers Authority (IANA). If I want a public mailing address, I have to contact the post office. If I want a public IP address, then I have to contact my internet provider. My internet provider is a member of a Regional Internet Registry (RIR), which is a member of the IANA. It’s fairly common that one home will have one public IP address, although you can pay your ISP for more.
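IPAddr can also tell private from public addresses: IPAddr#private? (Ruby 2.5 and later) checks the RFC 1918 ranges 10.0.0.0/8, 172.16.0.0/12, and 192.168.0.0/16. The helper name address_kind is hypothetical.

```ruby
require "ipaddr"

# :private for RFC 1918 addresses (reachable only inside a local network),
# :public otherwise.
def address_kind(ip)
  IPAddr.new(ip).private? ? :private : :public
end

puts address_kind("192.168.5.199")  # the coffee shop laptop
puts address_kind("74.125.224.114") # google.com from earlier
```

This is why the laptop at the other coffee shop is unreachable: its 192.168.x.x address only means something inside that shop’s network.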

So, why don’t AT&T and DIRECTV work across multiple subnets?

Hopefully, you now have a high-level sense of internet addressing, subnets, and routing, although we still haven’t explained why our AT&T and DIRECTV integrations don’t work across multiple subnets. We need to understand one more concept: multicast.

Most network communication is two computers talking to each other directly. In some cases, you want to communicate with multiple computers at the same time. Broadcast and multicast are the two ways of doing that. To find the cable receiver, Miso makes an announcement over multicast asking whether there are any cable receivers on the network. That’s the same way your computer discovers printers on your local network.

As you can imagine, multicast announcements can be pretty noisy. Imagine if my laptop made a multicast announcement and every computer in the world had to pay attention; things would get pretty chaotic. Therefore, multicast messages only go out to the current subnet. If my cable receiver is on a different subnet, it won’t receive my multicast announcement.

There are a number of ways of overcoming this problem, including adjusting the multicast TTL, router port forwarding, or configuring routers as bridges. TTL (time-to-live) tells a multicast message how many hops to take before it should give up. By default, this value is 1, which means that the message is delivered to computers on the local subnet only. Router port forwarding and configuring routers as bridges are other mechanisms for allowing multicast traffic to hop across to an adjacent subnet; however, for our AT&T integration, they are less ideal because they require users to understand how to administer their home networks.
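The TTL knob is an ordinary socket option. The sketch below sets IP_MULTICAST_TTL on a UDP socket through Ruby’s Socket::Option helpers. The group address and port (239.255.255.250:1900, the SSDP discovery group) are placeholders chosen for illustration, not necessarily what Miso’s AT&T or DIRECTV integrations use, and a higher TTL only helps if the routers in between actually forward multicast.

```ruby
require "socket"

# Placeholder discovery group and port (illustrative only).
DISCOVERY_GROUP = "239.255.255.250"
DISCOVERY_PORT  = 1900

# UDP socket whose multicast packets may cross at most `ttl` hops.
# TTL 1 (the default) keeps announcements on the local subnet.
def multicast_socket(ttl)
  sock = UDPSocket.new(Socket::AF_INET)
  sock.setsockopt(Socket::Option.ipv4_multicast_ttl(ttl))
  sock
end

# Read the TTL back from the socket.
def current_ttl(sock)
  sock.getsockopt(Socket::IPPROTO_IP, Socket::IP_MULTICAST_TTL).ipv4_multicast_ttl
end

# Sending an announcement would look like:
#   multicast_socket(1).send("any receivers?", 0, DISCOVERY_GROUP, DISCOVERY_PORT)
```

Socket::Option.ipv4_multicast_ttl is used instead of a raw integer because the option’s size differs between platforms.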

Hopefully, you found this high level guide useful. Feedback welcome!

Introductions

Hello readers!

My name is John Wu, Miso’s newest addition to the engineering (specifically iPhone) team. For my first blog post, I want to start with something light and talk briefly about who I am and how I came to work at Miso.

I majored in Mathematics as an undergraduate. I chose math for a couple of reasons. First, I thought it was easy. Like many of you, math was the one class I never had to prepare for in high school. Math period became synonymous with nap time. I was the kid the teacher never wanted to call on because I would deprive others of their opportunity to learn. Second, I was “good” at it, by which I mean I felt comfortable manipulating formulas and performing complicated computations; it made “sense”, and I understood the logic. Finally, I believed math would open the doors to just about anything one could aspire to do. My reasoning was simple: math exists in almost everything, even haikus.

Looking back, I had no idea what I was getting myself into. I was lost on day one of Calculus I. Cryptic concepts such as epsilon-delta, proof by contradiction, theorems, corollaries, and “if and only if” had me dropping classes faster than I could sell my useless TI-89 (I did not use it a single time in college). At first, I hated it. I learned proof after proof of theorems that seemed useless. I worked out examples that were so contrived I had a hard time seeing how any of it modeled real life. By the time I graduated with a B.S., my knowledge was only comparable to that of mathematicians 200 years ago. My degree felt useless.

But math isn’t pure evil. While extremely difficult to appreciate, crafting proofs, often compared to poetry and painting, is a true art form in and of itself. I distinctly remember showing my professor my one-page proof of a fundamental vector space theorem, to which he responded with his own version, which was exactly 5 lines. This experience led me to my first realization about math beyond calculus: only once you have found the simplest proof of a theorem have you understood the theorem in its rawest form. Want some perspective? The popular Pythagorean theorem (a^2+b^2=c^2) has been proven in well over 300 distinct ways. My second realization came after taking a course in number theory, a sub-discipline that studies the positive integers (1, 2, 3, …): not all math is meant to be applicable or even useful. There are numerous theorems out there that are well documented and proven but lack any real-world application; they are, in essence, math purely for math’s sake. The mathematicians who engage in this are the Picassos and Michelangelos of the math world, and who says they won’t offer unexpected insight in the (far) future?

My mathematical journey came to an end in graduate school after a tragic realization about myself: I simply wasn’t bright enough for math, not even to contribute on a microscopic level. I knew this to be true when I found myself begging child prodigies in my classes to help me with my homework. I gave up on a PhD. Rejection was difficult, but not without its lessons. I swallowed my pride and began searching for something that would allow me to make a dent. Trusting in my haiku-logic, I eventually found my place in application development.

I found many analogous counterparts of math in development work. Code design is similar to crafting proofs, except that I now have the help of my coworkers, and the end goal of creating the simplest version is still the same. Reading code is like trying to understand proofs: though the context is different, navigating the logic turns out to be one and the same. Moreover, my degree accelerated my learning process; my background allowed me to look beyond the complexities of the underlying math, such as floating point arithmetic, and focus on creating reusable and scalable code. The differences? Even an average Joe like myself can contribute, and even my code will see the light of day and be run thousands of times a day. All along, this is where I belonged.

So I am glad to say that, while my earlier choices in life were based on erroneous assumptions, I do not regret them one bit! And as I continue to work at Miso, I’ll be sure to share my experiences and help you, the reader, leverage what we do at Miso to make your app better. So please look forward to my next post, which will be on styling UINavigationBars in your iPhone app. Thanks for reading!

Hybrid (Native + Web) Mobile App Development • Part 3: JavaScriptCore and UIWebView optimizations

Part 3 is here! Yes, I’m back to writing more about hybrid framework goodness. Today I want to talk a bit about performance optimizations using a hybrid framework.

At Miso, we keep EJS templates on the device and combine them with JSON responses from the server to produce HTML rendered by a UIWebView. What we learned at F8 is that Facebook delivers the entire HTML from the server to the device directly. There are pros and cons to both approaches.

Miso’s EJS + JSON approach minimizes the payload coming back from the server by asking only for the JSON data. As long as you’re not including anything and everything in your JSON response, it should be a fairly small package. The downside, however, is that the device has to handle combining the JSON data with the EJS template. Our benchmarks have shown this to be fairly costly: in the hundreds of milliseconds, depending on the complexity.

Facebook’s approach, on the other hand, leaves all the processing to the server, so the device just needs to render the HTML response. This is especially beneficial for slower devices, but the payload coming back from the server is much larger in comparison.

Both approaches are fine, I just wanted to point out the differences in them. Picking one over the other, like many other engineering decisions, depends on your use case.
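To make the trade-off concrete, here is a small Ruby analogue. It is only a sketch: our app uses EJS in JavaScript, so ERB stands in for the template engine, and the template, data, and method names (`render_on_device`, `render_on_server`) are hypothetical. Both paths produce identical HTML. The difference is which side does the rendering and what travels over the wire.

```ruby
require 'erb'
require 'json'

# Hypothetical feed template; ERB stands in for the EJS template bundled in the app.
FEED_TEMPLATE = ERB.new(<<~HTML, trim_mode: '-')
  <ul class="feed">
  <%- items.each do |item| -%>
    <li class="feed-item">
      <span class="title"><%= ERB::Util.html_escape(item['title']) %></span>
      <span class="user"><%= ERB::Util.html_escape(item['user']) %></span>
    </li>
  <%- end -%>
  </ul>
HTML

# Miso-style: the server sends only JSON, and the device parses it and renders
# the template itself (the step that cost us hundreds of milliseconds).
def render_on_device(json_payload)
  FEED_TEMPLATE.result_with_hash(items: JSON.parse(json_payload))
end

# Facebook-style: the server renders and ships finished HTML,
# so the device only has to display it.
def render_on_server(items)
  FEED_TEMPLATE.result_with_hash(items: items)
end

items = [{ 'title' => 'Breaking Bad', 'user' => 'joshua' },
         { 'title' => 'Mad Men', 'user' => 'nathan' }]
json_payload = JSON.generate(items)
html_payload = render_on_server(items)
puts "JSON: #{json_payload.bytesize} bytes, HTML: #{html_payload.bytesize} bytes"
```

With even a tiny list, the HTML payload carries all the markup that the JSON payload leaves to the device. That is the payload-versus-device-work trade-off described above.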

Backgrounding EJS Processing

Everyone loves UIWebviews except for one thing: they’re SLOW. Rendering HTML blocks the foreground (main) thread, and there’s simply nothing you can do about it. It was even worse for Miso because we did EJS processing on the foreground thread as well. Why not just background it? Long story short, you can’t. The iOS JavaScript engine is invoked through the UIWebview (e.g. [_webview stringByEvaluatingJavaScriptFromString:@"..."]), and even if the code you run doesn’t touch the UI, iOS doesn’t allow you to move that work to a background thread. To give you a taste, here is how the processing time breaks down from start to finish:

  • EJS processing: 50%
  • innerHTML rendering: 40%
  • JSON data parsing: 10%

As you can see, backgrounding EJS processing would be a big, big win for us. After a few hours of googling around for solutions, it became apparent that to accomplish this we would have to bundle our own JavaScript library to run in parallel with the one iOS provides. Fortunately, someone had already compiled a JavascriptCore library for iOS for their own purposes here. After importing the library into the project, I wrote a lightweight singleton wrapper, combined with another interfacing class written by this fine gentleman, that lets you run JavaScript in the background by passing it a string.

I’ve open sourced the relevant code on github.

I was able to successfully background EJS processing this way, and boy was it worth it. The pauses from blocking the foreground thread became a lot more bearable, at the cost of adding ~2MB to the binary size.
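The shape of that singleton wrapper can be sketched in Ruby. This is a hypothetical analogue, not the open-sourced Objective-C code; `BackgroundEvaluator` and `submit` are illustrative names. A single dedicated worker thread owns the engine and runs jobs one at a time, since a JavaScript context is not meant to be used from several threads at once. Callers hand it work and collect the result later instead of blocking the main thread. Because of Ruby's global VM lock, this shows the structure only and makes no performance claim.

```ruby
require 'singleton'

# Hypothetical analogue: one worker thread owns the (single-threaded) engine;
# every caller goes through the shared instance.
class BackgroundEvaluator
  include Singleton

  def initialize
    @jobs = Queue.new
    @worker = Thread.new do
      loop do
        job, reply = @jobs.pop
        begin
          reply << [:ok, job.call]
        rescue StandardError => e
          reply << [:error, e] # keep the worker alive if one job fails
        end
      end
    end
  end

  # Enqueue a job and return immediately; pop the returned queue for the result.
  def submit(&job)
    reply = Queue.new
    @jobs << [job, reply]
    reply
  end
end

# Usage: start the expensive render, keep the main thread free, collect later.
pending = BackgroundEvaluator.instance.submit { (1..100).map { |i| "<li>#{i}</li>" }.join }
status, html = pending.pop # blocks only when the result is actually needed
```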

Making your UIWebviews faster

One of the biggest advantages of using UIWebviews is that you can use all the JavaScript/CSS goodness to make your views look tasty. However, as you may soon find out, the more CSS styling you add, the more performance slows to a crawl. The most common symptom is that scrolling starts to feel clunky and jerky, which is especially apparent on older iPhones such as the 3G. To find the biggest offender, I went through and removed CSS styles here and there to get a feel for their performance impact. I found that removing border-radius and box-shadow made scrolling much smoother. This is not new; other sources have pointed out similar performance issues with these CSS3 features. Until they are addressed, use them sparingly and always benchmark on an older device.

If you’ve already stripped down your CSS and it’s still clunky, another factor to consider is the size of the image assets on your page. During my performance-tuning journey, I observed, to my embarrassment, that Miso’s image assets were anywhere from 100% to a staggering 2000% larger than their Facebook counterparts. We used a tool called Smusher to shrink all of our PNG images significantly. This was the last optimization I did, and it pushed performance across the finish line.

What I’ve learned through all this is that no bit of processing power should be taken for granted on a mobile device. The motivation for making the app perform better came from trying to use it on my old iPhone 3G. Every mobile developer should have a baseline benchmark device to keep them honest. Until next time. Happy coding! :)

PS: If you have suggestions for topics that you want me to cover please feel free to share them in the comments section.

Forget Chef or Puppet – Automate with Sprinkle

Miso has a relatively standard server architecture for a medium-level traffic Rails application. We use Linode to host our application and we provision a number of VPS instances that make up our infrastructure. We have a load balancer equipped with Nginx and Varnish, we have an application server that runs our Rails application to serve dynamic requests, we have a master-slave database setup, and a cache server that has memcached and redis.

Early on, we set up these servers by hand: installing packages, compiling libraries, tweaking configurations and installing gems. We then documented these steps in a company wiki so that we could mechanically follow them to set up a new instance of any of our servers.

Automating our Setup

We decided recently that we wanted a more robust way of provisioning and configuring our different types of servers. Mechanically following steps from a wiki is inefficient, error-prone and likely to drift out of date. What we needed was a system that would let us provision a new application or database server with a single command. Here were our additional requirements:

  • Simple to set up and configure
  • Lightweight system using familiar syntax
  • Preferably utilizing ssh and commands in the vein of Capistrano
  • Modular components that can be assembled to setup each server
  • Locally executable such that I can provision a server from my local machine
  • Doesn’t require the target machine to have anything installed prior

The most popular open-source provisioning tools seem to be Puppet and Chef. Both are great tools with a lot of features and flexibility, and they are well suited to large infrastructures with tens or hundreds of servers per project. However, Chef and Puppet recipes are not trivial to create or manage, and by default they are intended to be managed from a remote server. We felt they brought unnecessary complexity and overhead for our simple provisioning purposes. Thanks for the corrections in the comments regarding the remote server requirement for Puppet/Chef.

For smaller infrastructures like ours, we felt a simpler tool would be easier to manage and set up: ideally something akin to deploying our code with Capistrano, a tool that can be run locally and maintained easily. After some exploration, we stumbled upon a Ruby tool called Sprinkle, probably best known for an example automation script called Passenger Stack.

There are several aspects of Sprinkle that made this our tool of choice. For one, it is locally managed and the setup is done leveraging the rock-solid Capistrano deployment system. Also, even though Sprinkle is written in Ruby, the tool does not require Ruby or anything else to be installed on the target servers since the automated setup is executed on your development machine and communicates with the remote server using only SSH. The best part is that there are only a few concepts required to understand and use this system.

Understanding Sprinkle

The rest of this article is intended to be a long and comprehensive overview of Sprinkle. While I have read many blog posts about Sprinkle written across several different years, I hadn’t seen a post that covered each aspect of Sprinkle in full detail. The goal is that by the end of this post, you should be able to understand and write Sprinkle scripts as well as execute them. Let’s start off by exploring the four major concepts that make up Sprinkle and that will be used to build out your server recipes: Packages, Policies, Installers, and Verifiers.

Packages

A package defines one or more things to provision onto the server. There is a lot of flexibility in the way a package is defined, but fundamentally it represents a “component” that can be installed. Packages are sets of installations, options, and verifications, grouped under a meaningful name. The basic structure of a package looks like this:

# packages/some_name.rb
package :some_name do
  description 'Some text'
  version '1.2.3'
  requires :another_package

  # ...installers...

  verify { ...verifiers... }
end

Note that defining a package does not install the package by default. A package is only installed when explicitly mentioned in a policy. You can also specify recommendations and/or optional package dependencies in a package as well:

# packages/foo_name.rb
package :foo_name do
  requires :another_package
  recommends :some_package
  optional :other_package

  # ...installers...
  verify { ...verifiers... }
end

You can also create virtual packages, where several alternative packages provide the same component type. For instance, if I wanted to give people a choice of database when provisioning:

# packages/database.rb
package :sqlite3, :provides => :database do
  # ...installers and verifiers...
end

package :postgresql, :provides => :database do
  # ...installers and verifiers...
end

package :mysql, :provides => :database do
  # ...installers and verifiers...
end

You can now declare that you want to install a :database, and the script will ask you which provider you want to install. For more information on packages, the best place to look is the source file itself.
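The idea behind `:provides` can be shown with a toy model in plain Ruby. It does not use Sprinkle's code, and `PackageRegistry`, `package`, and `resolve` are hypothetical names. Several concrete packages register under one virtual name. Resolving a concrete name returns it directly. Resolving a virtual name with several candidates defers to a chooser, which in Sprinkle's case is the interactive prompt.

```ruby
# Toy model (not Sprinkle's implementation) of virtual packages.
class PackageRegistry
  def initialize
    @packages = {}
    @providers = Hash.new { |hash, key| hash[key] = [] }
  end

  # Register a concrete package, optionally as a provider of a virtual name.
  def package(name, provides: nil)
    @packages[name] = true
    @providers[provides] << name if provides
  end

  # Concrete names resolve to themselves; virtual names go through the chooser
  # when there is more than one candidate.
  def resolve(name, chooser:)
    return name if @packages.key?(name)

    candidates = @providers.fetch(name) { raise ArgumentError, "unknown package #{name}" }
    candidates.size == 1 ? candidates.first : chooser.call(name, candidates)
  end
end

registry = PackageRegistry.new
registry.package :sqlite3, provides: :database
registry.package :postgresql, provides: :database
registry.package :mysql, provides: :database
registry.package :nginx

# A real run would prompt on the terminal; this chooser just picks PostgreSQL.
pick_postgres = ->(_virtual, candidates) { candidates.find { |c| c == :postgresql } }
registry.resolve(:database, chooser: pick_postgres) # => :postgresql
registry.resolve(:nginx, chooser: pick_postgres)    # => :nginx
```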

Policies

A policy defines a set of particular “packages” that are required for a certain server (app, database, etc). All policies defined will be run and all packages required by the policy will be installed. So whereas defining a “package” merely defines it, defining a “policy” actually causes those packages to be installed. A policy is very simple to define:

# Define target machine used for the "app" role
role :app, "208.28.38.44"
# Installs the packages specified with a 'requires' when the script executes
policy :myapp, :roles => :app do
  requires :some_package
  requires :nginx
  requires :postgresql
  requires :rails
end

A role merely defines what server the commands are run on. This way, a single Sprinkle script can provision an entire group of servers specifying different roles similar to using Capistrano directly. You may specify as many policies as you’d like. If the packages you’re requiring are properly defined with verification blocks, then no software will be installed twice, so you may require a webserver on multiple packages within the same role without having to wait for that package to install repeatedly.

For more information on policies, the best place to look is the source file itself.
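A policy's `requires` list expands into an install order: each package's own dependencies come first, and a package shared by several chains is scheduled once. A toy model of that expansion looks like this. It is a plain depth-first ordering, not Sprinkle's actual policy code, and the `DEPENDENCIES` table and `install_order` method are hypothetical. In real Sprinkle runs, the verify blocks are what guarantee nothing is installed twice.

```ruby
# Hypothetical dependency table: package => packages it `requires`.
DEPENDENCIES = {
  rails:      [:ruby, :essential],
  ruby:       [:essential],
  nginx:      [:essential],
  postgresql: [:essential],
  essential:  []
}.freeze

# Depth-first expansion: dependencies are appended before the packages that need
# them, each package appears once, and dependency cycles are reported.
def install_order(required, deps = DEPENDENCIES, ordered = [], visiting = [])
  required.each do |name|
    next if ordered.include?(name)
    raise ArgumentError, "circular dependency on #{name}" if visiting.include?(name)

    visiting.push(name)
    install_order(deps.fetch(name), deps, ordered, visiting)
    visiting.pop
    ordered << name
  end
  ordered
end

install_order([:nginx, :postgresql, :rails])
# => [:essential, :nginx, :postgresql, :ruby, :rails]
```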

Installers

Installers are mechanisms for getting software onto the target machine. Different installers let you download and compile software from a given source, use packaging systems (such as apt-get), copy files, install a gem, or even just inject content into existing files. Common examples of installers are detailed below.

To run an arbitrary command on the server:

package :foo do
  # Can be any line executed in the shell
  runner "run_some_command --now"
end

To install using the apt package manager in Ubuntu:

package :foo do
  apt 'foo-package'
  # Supports only installing dependencies
  apt('bar-package') { dependencies_only true }
end

That would install the ‘foo-package’ when you run the ‘foo’ package. To install a ruby gem:

package :foo do
  gem 'foo-gem'
  # Supports specifying version, source, repository, build_docs and build_flags
  gem 'bar-gem' do
    version "1.2.3"
    source 'http://gems.github.com'
    build_docs false
  end
end

To upload arbitrary text into a file on the target:

package :foo do
  push_text 'some random text', '/etc/foo/bar.conf'
  # Supports sudo access for a file
  push_text 'some random text', '/etc/foo/bar.conf', :sudo => true
end

You can also replace text in a target file:

# packages/foo.rb
package :foo do
  replace_text 'original foo', 'replacement bar', '/etc/foo/bar.conf'
  # Supports sudo access for a file
  replace_text 'original foo', 'replacement bar', '/etc/foo/bar.conf', :sudo => true
end

To transfer a file to a remote target file:

package :foo do
  # Transfers are recursive by default so whole directories can be moved
  transfer 'file/some_folder', '/etc/some_folder'
  # Supports sudo access for a file
  transfer 'file/foo.file', '/etc/foo.file', :sudo => true
  # Also a file can have "render" passed which runs the template through erb 
  # You can access variables to output the file dynamically, or pass explicit locals
  foo_port = 8080
  transfer 'file/foo.file', '/etc/foo.file', :render => true, 
            :locals => { :bar_port => 80 }
end
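If you want to preview what a rendered file will look like before uploading it, the rendering step can be reproduced locally with Ruby's standard ERB library. This sketch is not Sprinkle's internal code. The template text, the `render_config` helper, and the `bar_port`/`server_name` locals are hypothetical and only mirror the `:locals` idea above.

```ruby
require 'erb'

# Hypothetical nginx-style snippet; in a Sprinkle project this would be a local file.
CONFIG_TEMPLATE = <<~CONF
  server {
    listen <%= bar_port %>;
    server_name <%= server_name %>;
  }
CONF

# Render the template with explicit locals, the same idea as `:locals => { ... }`.
def render_config(template, locals)
  ERB.new(template).result_with_hash(locals)
end

puts render_config(CONFIG_TEMPLATE, bar_port: 80, server_name: 'example.com')
```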

To run a rake task as part of a package on target:

package :foo, :rakefile => "/path/to/Rakefile" do
  rake 'foo-task'
end

To install a library from a given source path:

package :foo do
  source 'http://foo.com/latest-1.2.3.tar.gz'
  # Supports prefix, builds, archives, enable, with, and more
  source 'http://magicbeansland.com/latest-1.1.1.tar.gz' do
    prefix    '/usr/local'
    archives  '/tmp'
    builds    '/tmp/builds'
    with      'pgsql'
  end
end

Installers also support installation hooks at various points during the install process which vary depending on the installer used. An example of how to use hooks is as follows:

package :foo do
  apt 'foo-package' do
    pre :install, 'echo "Beginning install..."'
    post :install, 'echo "Completing install!"'
  end
end

Multiple hooks can be specified for any given step, and each installer has multiple steps for which hooks can be configured. For more information on installers, the best place to look is the source folder itself.

Verifiers

Verifiers are convenient helpers which you can use to check if something was installed correctly. You can use this helper within a verify block for a package. Sprinkle runs the verify block to find out whether or not something is already installed on the target machine. This way things never get done twice and if a package is already installed, then the task will be skipped.

Adding a verify block for every package is extremely important, so be diligent about having an appropriate verifier for every installer used in a package. This makes the automated scripts much more robust and reusable on any number of servers. It also confirms that each installer worked as expected, because the server is checked again after installation.
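Why verifiers make a script safe to re-run is easiest to see in a toy model of the verify/install cycle. This is plain Ruby with lambdas standing in for remote commands, not Sprinkle's implementation, and `ToyPackage` and `provision` are hypothetical names. A package whose verifier already passes is skipped. Otherwise it is installed and verified again, and a failed post-install check stops the run.

```ruby
# Toy model (not Sprinkle's code): each package has an installer and a verifier.
ToyPackage = Struct.new(:name, :install, :verify)

def provision(packages)
  packages.map do |pkg|
    if pkg.verify.call
      [pkg.name, :skipped] # already on the server, nothing to do
    else
      pkg.install.call
      raise "#{pkg.name} failed verification after install" unless pkg.verify.call
      [pkg.name, :installed]
    end
  end
end

# Simulated target machine: the list of files that "exist" on the server.
server_files = ['/usr/bin/git'] # pretend git is already installed
git = ToyPackage.new(:git,
                     -> { server_files << '/usr/bin/git' },
                     -> { server_files.include?('/usr/bin/git') })
memcached = ToyPackage.new(:memcached,
                           -> { server_files << '/usr/bin/memcached' },
                           -> { server_files.include?('/usr/bin/memcached') })

provision([git, memcached]) # => [[:git, :skipped], [:memcached, :installed]]
provision([git, memcached]) # => [[:git, :skipped], [:memcached, :skipped]]
```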

There are many different types of verifiers, and each is particularly useful alongside certain installers. For instance, to check that an apt package was installed correctly:

package :foo do
  apt 'foo-package'
  apt 'bar-package'
  verify do
    has_apt 'foo-package'
    has_apt 'bar-package'
  end
end

This will only install the package on a target if not already installed and verifies the installation after the package runs. If we wanted to check that a gem exists:

package :foo do
  gem 'foo-gem', :version => "1.2.3"
  gem 'bar-gem'
  verify do
    has_gem 'foo-gem', '1.2.3'
    # or verify that ruby can require a gem
    ruby_can_load 'bar-gem'
  end
end

If you want to check if a directory, file or executable exists:

package :foo do
  mkdir '/var/some/dir'
  touch '/var/some/file'
  runner 'touch /usr/bin/abinary' do
    post :install, "chmod +x /usr/bin/abinary"
  end

  verify do
    has_directory '/var/some/dir'
    has_file      '/var/some/file'
    has_executable 'abinary'
  end
end

You can also check if a process is running:

package :foo do
  apt 'memcached'

  verify do
    has_process 'memcached'
  end
end

For more information on verifiers, the best place to look is the source folder itself.

Putting Everything Together

Once you understand the aforementioned concepts, building automated recipes for provisioning becomes quite straightforward. Simply define packages (with installers and verifiers) and then group them into ‘policies’ that run on target machines. Generally, you can have a deploy.rb file and an install.rb file that are defined as follows:

# deploy.rb
# SSH in as 'root'. Probably not the best idea.
set :user, 'root'
set :password, 'secret'

# Just run the commands since we are 'root'.
set :run_method, :run

# Be sure to fill in your server host name or IP.
role :app, '83.434.34.234'

default_run_options[:pty] = true

The install file tends to define the various policies for this sprinkle script:

# install.rb
require 'packages/essential'
require 'packages/git'
require 'packages/nginx'
require 'packages/rails'
require 'packages/mongodb'

policy :myapp, :roles => :app do
  requires :essential
  requires :git
  requires :nginx
  requires :rails
  requires :mongodb
end

deployment do
  delivery :capistrano

  source do
    prefix   '/usr/local'
    archives '/usr/local/sources'
    builds   '/usr/local/build'
  end
end

Then you should store your packages in a subfolder aptly named ‘packages’:

# packages/git.rb
package :git, :provides => :scm do
  description 'Git version control client'
  apt 'git-core'

  verify do
    has_executable 'git'
  end
end

You can store your assets (configuration files, etc.) in an “assets” folder and reference them from your packages to upload them. Once all the packages and policies have been defined appropriately, you can execute the Sprinkle script on the command line with:

sprinkle -c -s install.rb

And sprinkle is off to the races, setting up all the policies on the target machines.

Further Reading

There are several other good posts about Sprinkle:

In addition, there are a lot of good examples of sprinkle recipes:

Let me know in the comments what you think of all this, and share any related questions. What do you use to automate and manage your servers?

Distributed Persistence for YAMLRecord

Last month we released YAMLRecord, a lightweight way to persist a small dataset into a simple YAML file. That works fine if you only have one app server and keep the YAML file stored locally.

As Nelson Hernandez pointed out, YAMLRecord presents a problem when you have multiple application servers, because the file can no longer be stored locally on a single server. If a change occurs on one of these servers, the YAML file would have to be updated on all the others, or stored in a new location accessible by all of them.

To address this issue, the goal was to find a way to move the YAML file away from the app server and store it somewhere else that can handle access from multiple instances. With Nathan, we brainstormed and listed several solutions we could implement to allow us to scale out with YAML Record:

  1. Store YAML file on S3 and cache the data locally
  2. Use NFS and mount a shared volume and store the YAML on that volume
  3. Serialize YAML content in an existing persistence store such as PostgreSQL or Redis

Why Redis?

We chose to augment YAMLRecord with a pluggable “adapter” system that supports storing the YAML content in Redis, and then moved the YAML data from the file system to a Redis-backed store. This may seem like an odd choice, but we have a number of simple, lightweight YAMLRecord resources set up and we wanted to keep this system intact for the time being.

The first solution, with S3 and memcached, seemed relatively sound: we would store the YAML files on S3 and update them there each time a change was made. The problem was that we would be forced to leave our LAN and access S3 every time we wanted to update the YAML data. Retrieving the YAML data from a remote location each time also seemed like overkill.

Using NFS shared volumes is the more “traditional” solution: simply set up a shared volume and mount it on each application server. We opted not to use this approach because it would be yet another “moving part” in our system that we would have to manage as we deploy servers. We wanted to use an existing system if possible so that no additional complexity would be introduced.

We ultimately felt that the last solution, letting YAMLRecord support different storage adapters, would be a decent approach. It made sense for our infrastructure, which was already set up for redundancy and backups on PostgreSQL and Redis. Since we built YAML Record to avoid the heaviness of a SQL database, it didn’t make sense to pursue a SQL storage adapter yet. We felt Redis would make an easy-to-use storage adapter, with the key being the YAML file name and the value being the serialized YAML data.


What’s new in YAML Record?

With Redis as a solution to scale out YAML Record data, we also wanted to keep the option for a local store with a simple file. We came up with this idea of swappable adapters which allows YAML Record to be modular.

Using the Redis adapter is quite simple: you just need to pick which adapter you want to use in a model:

class Team < YamlRecord::Base
  # Declare which adapter you want to use
  adapter :redis, $redis # $redis is your redis client

  # Declare your properties
  properties :name, :role

  source "team" # Will store data in a key "yaml_record:team"
end
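The swappable-adapter idea can be sketched in plain Ruby. This is a hypothetical illustration, not YAMLRecord's actual code. `FileAdapter`, `RedisAdapter`, and `FakeRedis` are made-up names, and `FakeRedis` is an in-memory stand-in that implements only `get`/`set`, the two commands a real Redis client such as redis-rb provides for this. Both adapters store the same thing, serialized YAML under a name. The Redis one uses the `yaml_record:<source>` key shown in the model above.

```ruby
require 'yaml'
require 'tmpdir'

# Hypothetical adapter that keeps one YAML file per source in a directory.
class FileAdapter
  def initialize(dir)
    @dir = dir
  end

  def write(source, records)
    File.write(path_for(source), records.to_yaml)
  end

  def read(source)
    path = path_for(source)
    File.exist?(path) ? YAML.safe_load(File.read(path)) : []
  end

  private

  def path_for(source)
    File.join(@dir, "#{source}.yml")
  end
end

# Hypothetical Redis-style adapter: key "yaml_record:<source>", value is the YAML string.
class RedisAdapter
  def initialize(client)
    @client = client
  end

  def write(source, records)
    @client.set("yaml_record:#{source}", records.to_yaml)
  end

  def read(source)
    yaml = @client.get("yaml_record:#{source}")
    yaml ? YAML.safe_load(yaml) : []
  end
end

# In-memory stand-in exposing only the get/set calls used above.
class FakeRedis
  def initialize
    @data = {}
  end

  def set(key, value)
    @data[key] = value
    'OK'
  end

  def get(key)
    @data[key]
  end
end

team = [{ 'name' => 'Nathan', 'role' => 'engineer' }]
[FileAdapter.new(Dir.mktmpdir), RedisAdapter.new(FakeRedis.new)].each do |adapter|
  adapter.write('team', team)
  adapter.read('team') == team # true for both adapters
end
```

Because the model only talks to `read`/`write`, swapping storage is a one-line change in the model, which is what kept our existing YAMLRecord resources intact.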

What’s next?

Here’s some ideas we want to achieve soon:

  • Be able to specify type for properties
  • Add validations
  • Add magic timestamp fields like ActiveRecord’s
  • New adapters built by you? :)

Objective-C Conventions

Writing code without conforming to some form of convention leads to a lot of confusion, both for the person writing the code and for anyone else trying to understand it. The pain worsens as the code base grows, especially in a non-garbage-collected language like Objective-C, where memory management has to be handled meticulously. Following a set of naming conventions for your instance variables, class names, etc. will help maintain a certain level of sanity. Here at Miso, we conform to a set of conventions that are partly derived from Apple standards, and partly from common best practices we’ve seen from other developers.

Class and Variable names

For variable names, Apple recommends starting with a lowercase letter (e.g. UILabel *titleLabel); for class names, an uppercase letter (e.g. MyClass). Notice in the examples that we also camel-case the rest of the name.

iVars and local variables

One of the most common issues with a large class or viewController containing more than several instance variables is distinguishing them from a local variable within a function. What I like to do is prefix iVars with an underscore to distinguish them. (eg. _titleLabel)

- (void)createViews {
    // some bunch of code above
    UILabel *titleLabel = [[[UILabel alloc] initWithFrame:CGRectZero] autorelease];
    UIImageView *imageView = [[[UIImageView alloc] initWithImage:[UIImage imageNamed:@"somepic.png"]] autorelease];

    [_containerView addSubview:imageView];
    [_containerView addSubview:titleLabel];
    // some more code below
}

In this example you can tell, without opening the header file, that imageView and titleLabel are locally defined variables and that _containerView is an iVar, simply by looking at their names.

Starting with the Header File

When I approach designing a new class, I like to start with the header file. This is where you wireframe your code design without actually writing any implementation. The reason is that I like to start with what someone using this class would need. This means defining public class/instance methods, properties (public accessors), and instance variables. Let’s try this approach with an example. I love cats, so let’s go with that:

//
//  VirtualCat.h
//  Miso
//
//  Created by Joshua Wu on 8/12/11.
//  Copyright 2011 Miso. All rights reserved.
//

#import <Foundation/Foundation.h>

@interface VirtualCat : NSObject {
    BOOL _hungry;
    NSMutableArray *_foodHistory;
    NSString *_furColor;
    float _weight;
}

@property (nonatomic, readonly) float weight;
@property (nonatomic, retain) NSString *furColor;

- (id)initWithColor:(NSString *)furColor weight:(float)weight;
- (void)feed:(NSString *)food;

@end

Ok great! Here you can see that furColor can be modified after instantiation, while weight is exposed as read-only. The cat’s other state (hungry and foodHistory) is hidden entirely, because no public accessors were defined for it. I’ve also defined an initializer and an instance method, feed. For someone using this class, everything they can do with it is neatly defined in the header file.

Private Interfaces

Using properties is great because you can use the accessors you get from it to assign new values to them without worrying about memory management. This is great for public properties, but what if I want to define private accessors to leverage the same convenience in my implementation file? This is what we can do:

//
// VirtualCat.m
// Miso
//
// Created by Joshua Wu on 8/12/11.
// Copyright 2011 Miso. All rights reserved.
//

#import "VirtualCat.h"

@interface VirtualCat()

@property (nonatomic, retain) NSMutableArray *foodHistory;

- (void)askForFood;
- (void)sleepInBathTub;
- (void)meow;

@end

@implementation VirtualCat
@synthesize weight=_weight;
@synthesize furColor=_furColor;
@synthesize foodHistory=_foodHistory;

@end

By declaring a class extension (an interface within the implementation file), you can declare private properties. I’ve also sneaked some additional code into the private interface to demonstrate declaring private instance methods. So far, we’ve defined the blueprint of this class quite thoroughly without writing any implementation code! Yet it is already clear what someone instantiating this class can do with it, and what I need to implement. This also gives me a good picture of which methods and variables I’ve defined when I come back to this code in the future. If you were looking at this class in Xcode, it would also conveniently give you a compiler warning saying that your implementation is incomplete (Incomplete implementation of class). This is a good guide for making sure I’ve implemented all the methods I intended to as I work through the implementation.

When it all comes together…

Alright, great! Let’s have some fun and finish off the implementation using the conventions I’ve defined earlier. Enjoy!

//
//  VirtualCat.m
//  Miso
//
//  Created by Joshua Wu on 8/12/11.
//  Copyright 2011 Miso. All rights reserved.
//

#import "VirtualCat.h"

@interface VirtualCat()

@property (nonatomic, retain) NSMutableArray *foodHistory;

- (void)askForFood;
- (void)sleepInBathTub;
- (void)meow;

@end

@implementation VirtualCat
@synthesize weight=_weight;
@synthesize furColor=_furColor;
@synthesize foodHistory=_foodHistory;

- (id)initWithColor:(NSString *)furColor weight:(float)weight {
    if((self = [super init])) {
        // Primitive iVars so I don't bother defining properties for them
        _weight = weight;
        _hungry = YES;
        
        // Using self. accessors to maintain memory sanity.
        // Less sane alternative would be eg. _foodHistory = [[NSMutableArray array] retain]
        self.furColor = furColor;
        self.foodHistory = [NSMutableArray array];
    }
    
    return self;
}

- (void)dealloc {
    [_foodHistory release];
    [_furColor release];
    [super dealloc];
}

#pragma mark - public methods

- (void)feed:(NSString *)food {
    if ([food isEqualToString:@"Can Food"]) {
        NSLog(@"Om nom nom nom");
    } else if ([food isEqualToString:@"Dry Food"]) {
        NSLog(@"Om nom");
    } else {
        NSLog(@"Not eating this");
    }
    
    [_foodHistory addObject:food];
    _hungry = NO;
    _weight += 1;
    
    [self sleepInBathTub];
}

#pragma mark - private methods

- (void)askForFood {
    _hungry = YES;
    [self meow];
}
          
- (void)meow {
    _weight -= 0.1;
    
    if (_hungry && _weight > 0) {
        NSLog(@"MEOW!");
        [self performSelector:@selector(meow) withObject:nil afterDelay:2];
    } else if (!_hungry && _weight > 0) {
        NSLog(@"Purrr~");
    } else {
        NSLog(@"You allowed me to die damn it! (╯‵Д′)╯彡┻━┻");
    }
}

- (void)sleepInBathTub {
    NSLog(@"zzzzzz");
    [self performSelector:@selector(askForFood) withObject:nil afterDelay:1000];
}

@end

Pragma Marking

One more thing I’d suggest is using pragma marks to section off method types in your code (e.g. #pragma mark - private methods). This lets Xcode nicely section off methods in its quick access menus.

Suggestions!

By now, it should be evident how following conventions when coding can be very beneficial. This is by no means THE standard to follow, but one that I feel has helped me stay sane as I develop. If you have your own conventions that you feel would add value, I’d love to hear from you and exchange ideas on this topic.