[colug-432] Any kafka users on the list ?
tom at functionalmedia.com
Fri Nov 21 22:56:31 EST 2014
As a Hadoop instructor of 4 years, I was looking to get into something
more interesting than the jumble of tools, and to focus on one tool.
Kafka, Storm, and Spark are all candidates, and perhaps Docker.
My plan is to learn one of them well enough to teach it. Kafka,
Docker, and Storm seem to be in demand, but not well served with
training. Spark is in demand, but some training exists.
Your summary is useful. I think I will read what documentation is out
there, and have a look at the code and see if I like it well enough to
On Fri, Nov 21, 2014 at 9:42 PM, Chris Embree <cembree at ez-as.net> wrote:
> 8-o So many questions.
> List is fine, nothing secret here, just my opinions which are now
> worth slightly less than you paid for them.... sorry.
> We run a limited size cluster due to physical limits. That said, it's
> anywhere from 10-14 Kafka nodes, each w/ 2 dedicated 10k disks. GC
> hasn't shown up as an issue so far, but it might be the culprit behind
> a couple of anomalous issues.
> Generally, as a cluster it seems somewhat immature. It works well
> when it works well, otherwise things get ugly.
> It can use JMX for monitoring, but management tools are somewhat
> limited. One of the smart guys on my team built a tool, dubbed Kurator
> (play on ES tools) that uses some Python Kafka APIs to provide some
> insight. However, it relies heavily on Zookeeper status and Kafka
> telling the truth. We've seen a few issues that raise doubt about
> Kafka's agreement w/ ZK on what's real.
> HOWEVER: Our use case is extremely abusive. We're looking for 1.2M
> transactions per second at 1K each. If you are anywhere south of 100K/s
> chances are extremely good you can construct a highly reliable Kafka
> On the fence: We've had little luck re-allocating partitions to
> recover from a lost node. Listing Kafka topics will show you the number
> of In-Sync Replicas (ISR) and the nodes hosting them. The re-balance feature is
> somewhat new and lightly documented, at least as of my last Google search. I've
> had little luck re-syncing after a node loss.
> Kafka is a minor part of our solution in the grand scheme of things.
> I feel ill-equipped to give a talk on the subject in any reasonable
> That said, I will be speaking at Cisco Live Milan (EMEA). I'd be
> happy to re-present at a COLUG if there is any value. The talk is a 4
> hour tech session (I'm only 1 of 3 speakers) on the entire OpenSOC
> project. My focus will be on the platform side and it may not be the
> best fit for a LUG. It's more of a HUG topic.
> FWIW, I absolutely hate the name Hadoop. ;)
> I hope that helps.
> On 11/21/14, Tom Hanlon <tom at functionalmedia.com> wrote:
>> Can you talk about it on the list? If not, maybe we can send some
>> private emails.
>> How big is the Kafka cluster? How many events are handled?
>> What are the details of the hiccups? Java garbage collection?
>> Configuration changes? General strangeness?
>> Does it provide any hooks for monitoring or managing? Nagios for
>> monitoring? Some API hooks for management?
>> On Fri, Nov 21, 2014 at 1:17 PM, Chris Embree <cembree at ez-as.net> wrote:
>>> Sadly, yes.
>>> We're using Kafka as the buffering queue for OpenSOC (getopensoc.com)
>>> and while it works well when things are fine, it has significant
>>> difficulty recovering from hiccups.
>>> Also, there are few tools for managing it from an Admin point of view.
>>> Deleting a topic is a non-trivial task, for example.
>>> On 11/21/14, Tom Hanlon <tom at functionalmedia.com> wrote:
>>>> Are there any Kafka users on this list?
>>>> I am looking to dive into Kafka, and a use-case or war-story
>>>> discussion with a user would be helpful.
>>>> If there is broader interest perhaps we can make a meeting
>>>> presentation out of it.
>>>> colug-432 mailing list
>>>> colug-432 at colug.net