Category Archives: Networking

Unikernel power comes to Java, Node.js, Go, and Python apps

Posted by on 13 June, 2016

This post was originally published on this site

An open source project sponsored by EMC allows applications written in C/C++, Java, Go, Node.js, and now Python to be transformed into unikernels — operating systems that do nothing but run a single, dedicated application.

UniK (pronounced “unique”) is one of several experiments with unikernels to see if their minimal footprint and security profile can work better than containers for some workloads.

UniK promises a simple way for an organization to find out if a unikernel version of a given app runs better than its containerized counterpart. The effort involved is about the same as would be required to deploy the app as a container.

Written mainly in Go, UniK compiles images that can then be deployed to VirtualBox, VMware vSphere, or Amazon Web Services. Go, C++, Node.js, and Python apps are packaged with a runtime that uses the rumprun platform, an existing toolchain for creating unikernel-like software. Java apps are deployed via OSv, a single-application OS that comes with JVM support.

Docker has been interested in bringing its container system and unikernels closer together. Back in January, it acquired Unikernel Systems, hoping to add the company’s toolchain so that deploying unikernels is as easy as composing a Docker image. UniK uses Docker images for its needed tooling, but it doesn’t yet incorporate Unikernel Systems’ technology; so far, no implementation of a unikernel-centric Docker has been available for public use.

Another recent project, IncludeOS, has attempted to ease unikernel creation, but not in as broad a manner as UniK. IncludeOS provides a C++ library that supplies a minimal level of operating system functionality to a program, allowing it to be deployed as a self-contained image that boots on a hypervisor. Again, it’s C++ only, where UniK aims to encompass multiple languages.

Is Microsoft publishing its own FreeBSD? Yes and no

Posted by on 10 June, 2016

It sounds like another one for the Hell Freezes Over file: Microsoft has released a version of FreeBSD 10.3, an edition of the liberally licensed Unix-like OS.

But as with previous Microsoft dalliances in the world of open source-licensed OSes, this isn’t a case of Microsoft admitting Windows is a technological and philosophical dead end. Instead, it’s another case of Microsoft investing effort in making Azure more appealing as an environment to run such OSes.

Azure-izing FreeBSD

The details are simple: FreeBSD 10.3, the latest production version of the OS, is available as a download-and-go VM image in the Azure Marketplace. This particular image, however, has Microsoft, not the FreeBSD Foundation (the organization that supports FreeBSD development), listed as the publisher.

So what’s new about Microsoft’s particular spin of FreeBSD? A post on the Microsoft Azure blog notes that it sports kernel-level changes that improve network and storage performance, as well as the “Azure VM Guest Agent” that allows FreeBSD to talk to Azure Fabric and vice versa. Microsoft has made Linux kernel contributions in this same vein; they were designed to allow Linux to run well on Hyper-V.

A slightly new wrinkle is Microsoft’s non-Azure-centric contributions to FreeBSD. Those changes, according to Microsoft, are being upstreamed back into FreeBSD, “so anyone who downloads a FreeBSD 10.3 image from the FreeBSD Foundation will get those investments from Microsoft built in to the OS.” In other words, the changes in the Microsoft-published, Azure-hosted FreeBSD aren’t an Azure exclusive — all FreeBSD users will benefit in time.

Offering a helping hand

The other question people are likely to ask is why, kernel contributions notwithstanding, is Microsoft listed as the publisher of the distro? The short answer: support.

According to Microsoft’s blog post, the FreeBSD Foundation is a community of mutually supportive users, “not a solution provider or an ISV with a support organization.” The kinds of customers who run FreeBSD on Azure want to have service-level agreements of some kind, and the FreeBSD Foundation isn’t in that line of work.

The upshot is, if you have problems with FreeBSD on Azure, you can pick up the phone and get Microsoft to help out, but only if you’re running its version of FreeBSD.

Another incentive for Microsoft is that FreeBSD is used as the substrate for virtual appliances from a number of name vendors — e.g., Citrix and Gemalto. Microsoft wants those products to run on Azure, too, and has worked closely with their vendors to ensure that. Microsoft is also hinting this is just a prelude to not only more Hyper-V features in FreeBSD, but also more kernel-level performance contributions generally.

Its own spin on things

Microsoft has so far produced only one thing resembling a distribution of an open source OS: Azure Cloud Switch, a Linux distro designed for ASIC hardware to run Microsoft’s network management software. It hasn’t been made available for public use (it was built mainly for Microsoft’s own internal use at Azure), so don’t hold your breath waiting for it to appear on GitHub.

Microsoft’s direct contributions to other operating systems have inevitably revolved around making them more compatible with its own ecosystem. Even the new, Nadella-driven Microsoft, which is far friendlier to open source, isn’t likely to veer far from that course. But if it means an incrementally better FreeBSD for all, it’s hard to complain.

Mozilla’s new fund will prevent the next Heartbleed, Shellshock

Posted by on 10 June, 2016

Open source software is no longer just limited to applications running on computers and servers. It’s used in mobile devices, entertainment systems, medical equipment, and connected cars, to name a few. With open source software used by governments and practically every industry sector, finding and fixing vulnerabilities has moved beyond an “it would be nice” situation solidly into the “we have to do better” camp.

Toward that end, Mozilla launched The Secure Open Source (SOS) Fund to help pay for security auditing, remediation, and verification for open source software projects. As part of the program, Mozilla committed to contracting and paying security firms to audit projects’ code, working with the project maintainers to support and implement fixes, and paying for verifying the remediation work to ensure bugs have been addressed. Mozilla will also work with the maintainers to manage vulnerability disclosure. Mozilla supplied The SOS Fund with $500,000 in initial funding and encouraged other companies and governments to support the program by contributing additional funds.

“We challenge these beneficiaries of open source to pay it forward and help secure the Internet,” Mozilla said.

The discovery of Heartbleed in OpenSSL and Shellshock in Bash showed that open source software wasn’t necessarily more secure than closed source applications. The idea that more eyeballs looking at the code meant vulnerabilities would be found quickly breaks down if everyone assumes someone else is looking. Some of the projects were tremendously popular, creating a situation where many people trusted and relied on code no one had vetted. Many people realized for the first time just how underfunded and understaffed some of these popular projects were; OpenSSL, for example, had only two developers on the project, and both were working part-time.

What’s especially concerning is that, more than two years after Heartbleed, there are still widely used open source projects with only one or two developers that don’t have corporate sponsorship and rely on volunteer donations. These projects frequently don’t have the resources or funding to focus on application security basics, such as regular testing and remediation of the bugs found. Some of the projects can be found in critical applications, networking infrastructure, and services. Vast swaths of the internet rely on open source technologies. As much as 30 percent of deployed software in the Global 2000 is open source, and most modern applications, even commercial closed-source ones, include open source components.

“Adequate support for securing open source software remains an unsolved problem,” Mozilla noted.

Fixing issues in open source software

As part of the Mozilla Open Source Support program, The SOS Fund will cover the costs of the audits themselves and help with coordination and other types of support for various widely used open source libraries and programs. Mozilla has already supported audits for PCRE (Perl Compatible Regular Expressions), libjpeg-turbo (a fork of the libjpeg codebase), and the phpMyAdmin web-based admin tool for MySQL databases. The effort uncovered 43 vulnerabilities across the three projects. Mozilla worked with Cure53 for the PCRE and libjpeg-turbo audits, and with NCC Group for the phpMyAdmin audit.

“The initial results confirm our investment hypothesis, and we’re excited to learn more as we open for [more] applications,” Mozilla said.

The audit found 29 vulnerabilities in PCRE, of which one was rated critical, five as medium, 20 as low, and three as informational. The critical vulnerability was a stack buffer overflow that could have led to arbitrary code execution when compiling untrusted regular expressions, according to the report. All of the issues, except a low severity bug, have been fixed in PCRE 10.21.

The libjpeg-turbo library, which is used by several well-known open source projects such as Chrome, LibreOffice, Firefox, and various flavors of VNC, contained five vulnerabilities. One was rated as high severity, two as medium, and two as low. The high severity flaw was an out-of-bounds read that may not be exploitable. The two medium severity flaws were originally flagged as denial-of-service issues, but turned out to be issues with the JPEG standard, and affect multiple JPEG implementations. The issues “can be triggered by entirely legal JPEGs, and so are not easy to mitigate in any JPEG library itself,” according to the audit report, which contains suggestions as to how applications using JPEG can mitigate them in their own code. Other than the issues in the JPEG standard, all of the bugs have been fixed in libjpeg-turbo stable version 1.5.

Finally, phpMyAdmin had nine different flaws, with three medium severity flaws, five low, and one informational. Two of the issues have been partially fixed and the remaining seven have been fixed in phpMyAdmin 4.6.2.
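The per-project counts quoted above can be cross-checked against the 43-vulnerability total with a few lines of Python. This is only an arithmetic sanity check: the dictionary layout and variable names are illustrative, and the numbers are simply the ones reported in this article.

```python
# Severity counts as quoted in the article (structure and names are illustrative).
findings = {
    "PCRE": {"critical": 1, "medium": 5, "low": 20, "informational": 3},
    "libjpeg-turbo": {"high": 1, "medium": 2, "low": 2},
    "phpMyAdmin": {"medium": 3, "low": 5, "informational": 1},
}

# Sum the severities per project, then across all three audits.
per_project = {name: sum(counts.values()) for name, counts in findings.items()}
total = sum(per_project.values())

for name, count in per_project.items():
    print(f"{name}: {count}")
print(f"total: {total}")
```

The per-project sums match the 29, five, and nine findings cited for each audit, and together they account for all 43.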

Project maintainers can apply for support or get more information from the Mozilla Open Source Support program page.

Supporting open source software security

Mozilla is not saying this initiative alone will fix the application security problem for open source. Security is a multi-step process that requires increased investments in areas such as education and best practices. The SOS Fund will provide needed short term benefits and industry momentum to help strengthen open source projects, Mozilla said.

The SOS Fund is intended to be complementary to the Linux Foundation’s Core Infrastructure Initiative, said Chris Riley, head of public policy at Mozilla. CII focuses on deeper investments into open source software that is used in critical applications, such as supporting infrastructure costs, development efforts, and governance. The SOS Fund’s audits and remediation work aids open source software projects in the ecosystem with “lower-hanging fruit security needs,” he said.

“To have substantial and lasting benefit, we need a broad range of solutions, including audits, education, best practices, and a host of others,” Riley said.

As WhiteHat Security’s Setu Kulkarni noted, The SOS Fund is a “step in the right direction,” but it’s not a stand-alone process. Security data needs to be incorporated into a risk-based application security program.

No one expects software applications to be free of vulnerabilities. But there’s a big difference between looking for and fixing obvious flaws before going to production, and just shipping with known flaws because it would take too much time to try to fix. Since software can’t be bug-free, it’s only reasonable that software be regularly updated so that vulnerabilities can be fixed.

While it’s possible to look for and fix vulnerabilities internally within the team, audits help teams tap into security expertise outside the project to help find issues. Veracode’s latest State of Software Security Report found that most applications submitted for software assessment have less than a 45 percent pass rate, and that nearly three out of four applications produced by third-party software vendors and software-as-a-service suppliers fail the OWASP Top 10 when initially assessed.

“We all rely on open source software,” Mozilla said in the blog post. “We hope this is only the beginning.”

Gosling rallies against Oracle for Java EE neglect

Posted by on 10 June, 2016

Oracle’s stewardship of Java is under fire — again.

This time, the company’s development of Java EE (Enterprise Edition) has become a sore spot for devotees of the platform, including Java creator James Gosling and a former Java evangelist who left Oracle in March.

Called Java EE Guardians, the group launched a petition about the matter on Thursday, said Reza Rahman, former EE evangelist at Oracle and a leader of Java EE Guardians. Gosling’s name sits at the top of the membership page. The petition asks where Oracle stands on the planned Java EE 8 release, requests that the company maintain its commitment to the release, and claims that if Oracle is unwilling to do the work on Java EE, it should cede control to others, such as IBM or Red Hat.

While professing to lack insight into what’s going on inside Oracle, Gosling said Thursday that the “tidbits” he has heard were “pretty disturbing.” He left Oracle not long after the company acquired original Java owner Sun Microsystems in 2010, under acrimonious terms. “It’s not so much that Oracle is backing off on EE, but that it’s backing off on cooperating with the community,” Gosling said. “Taking it ‘proprietary’, going for the ‘roach motel’ model of non-standard standards — ‘customers check in, but they don’t check out.'”

The Java EE Guardians website emphasizes concerns about commitment. “Our purpose is advocacy, raising awareness, finding solutions, collaboration and mutual support. We believe that together — including Oracle — we can prove that this is the dawn of a new era for an ever brighter future for Java, Java EE, and server-side computing.”

Oracle was accused of de-emphasizing Java last year after it dismissed or reassigned evangelists, thereby raising questions about its commitment to the platform’s openness. Still, the company shortly thereafter held its annual JavaOne conference devoted to Java.

Another participant in Java EE Guardians, blogger Peter Pilgrim, describes Java EE 8 as being “in crisis.” There is an unease about the future, he said, though he admitted he doesn’t know if Oracle in fact has backed off its commitment to Java EE because the company has been silent. “Oracle has not made any public announcement about the Java EE reduction of commits and progress,” he noted.

Java EE Guardians emphasizes the importance of Java EE, pointing out that hundreds of thousands of applications have been written with it and that many frameworks depend on it.

Version 8, which will emphasize cloud capabilities as well as HTML5 and HTTP 2.0, is due in the first half of 2017, but Rahman questions this timeline. He describes work on Java EE 8 as “lackluster from the start,” with activity having been stopped, and he described the open source GlassFish application server, which has been the reference implementation of Java EE, as “very much a dead project.” He acknowledged GlassFish has competed with Oracle’s own commercially available Java application servers.

Rahman said he left Oracle after questioning the company’s commitment to Java EE himself, wondering, “How could I be evangelizing a platform that Oracle is clearly not investing in?” He now works as a consultant at Captech Consulting. Asked if Oracle wants the community to take over development of Java EE, Rahman responded, “It’s impossible to determine what Oracle wants because they have not even acknowledged yet that there is a problem.” Specification leads from Oracle, who are in charge of improvements planned for enterprise Java, have not been responsive to input, according to Rahman.

Leaving Oracle, Rahman said, gave him the bandwidth to do what needed to be done as far as promoting development of Java EE. The platform, he stressed, is fundamental because of its execution on servers. “Most work happens on servers, even with microservices, even on the cloud.” One benefit of the current situation around enterprise Java is that it could result in less control over Java EE by the steward in charge, which is now Oracle. “Oracle and Sun have always had an unhealthy amount of influence.”

Oracle could not be reached for comment Thursday on the efforts of Java EE Guardians.

Basic income: Silicon Valley resolves to disrupt poverty

Posted by on 10 June, 2016

An interesting development has arisen: Universal basic income has become the darling of Silicon Valley. But before you get up and dance to “Money for Nothing,” brush up on what Greek mythology has to say about Trojans who wheel gift horses into the city.

Technologists — from Tim O’Reilly to venture capitalist Albert Wenger to authors and entrepreneurs such as Peter Diamandis and Martin Ford — suddenly seem eager for government to hand out cash to us ordinary folks. Some, such as Sam Altman, are even putting their money where their mouth is. The president of startup incubator Y Combinator is funding a five-year study in Oakland on the effects of giving people enough money to live on, no strings attached.

What gives? When Silicon Valley leaders speak out, “it is usually to disparage the homeless, celebrate colonialism, or complain about the hapless city regulators who are out to strangle the fragile artisans who gave us Uber and Airbnb,” as The Guardian wrote.

Basic income bromance

In the ’60s and ’70s, when this country was caught up in a War on Poverty, basic income was an idea that had the support of people ranging from economist and libertarian hero Milton Friedman to Martin Luther King Jr. to Richard Nixon. Now the idea is undergoing a revival — and not only in Finland, the Netherlands, and Canada, which are all implementing experiments with basic income.

Silicon Valley’s embrace of the concept seems at first blush to be fueled by a belief that technology — specifically breakthroughs in automation, AI, and machine learning — could soon make people as obsolete as workhorses. In such a future, basic income will be necessary to stave off a Luddite uprising.

“Don’t destroy the robots,” says Professor Jeffrey D. Sachs of Columbia University. “But recognize that not everybody would be better off as a result of market forces. With redistribution everybody could be made better off.”

Of course, not all experts are convinced of humanity’s impending redundancy. According to the New York Times, for every analysis forecasting that half of all jobs in the United States will be replaced by new technology there are others finding no such evidence.

What can’t be denied is the widening gap between rich and poor in the United States. Tech’s supporters of basic income often point to this growing gap as proof that accelerating technology creates inequality. According to this theory, The Guardian says, capitalism is meritocratic and technology enriches “those exceptional few who are smart enough to perform tasks that are too complex or creative to automate, while impoverishing the rest.” 

But if technology really is to blame for this growing gap, why has wage growth stagnated for pretty much all workers — Wall Street excluded? People in IT earn about as much today in inflation-adjusted dollars as they did in the late 1990s.

Good cop or bad cop?

Inequality isn’t the inevitable by-product of technology, The Guardian argues. If it were, other industrialized countries would have levels of inequality comparable to the United States — which they don’t. Instead, Silicon Valley’s embrace of basic income is “the Trojan horse that would allow tech companies to position themselves as progressive, even caring — the good cop to Wall Street’s bad cop” when what’s probably needed is a transformation of the tax code.

Don’t fool yourself: Tech investors don’t expect to pony up and fund these basic income payouts. Heck, many are pioneers of schemes to avoid paying any taxes at all.

Nor do they propose putting an end to their gold rush on personal data. We are all currently giving away our data for free to tech giants. Telecom data alone is currently worth $24 billion per year, on its way to $79 billion in 2020, according to estimates by 451 Research.

What if, instead of asking the needy to foot the bill through the elimination of government programs like public housing, food stamps, and Medicaid (as libertarians and conservatives propose), the basic income was structured like a dividend from tech companies for our “natural resources,” like Alaska does with its oil taxes and profits?

Spread the wealth

If Silicon Valley really wants to take steps toward the introduction of basic income, “why not make us, the users, the owners of our own data?” proposes Evgeny Morozov, author of “The Net Delusion: The Dark Side of Internet Freedom.” “Think of a mechanism whereby … data that now accrues almost exclusively to the big tech firms, would compensate citizens for their data with some kind of basic income, that might be either direct (cash) or indirect (free services such as transportation).”

This will never happen, Morozov says, “because data is the very asset that makes Silicon Valley impossible to disrupt — and it knows it …. Somehow our tech elites want us to believe that governments will scrape enough cash together to make it happen. Who will pay for it, though? Clearly, it won’t be the radical moguls of Silicon Valley: They prefer to park their cash offshore.”

The tech industry has fed on a steady stream of public goods ever since the U.S. military funded Silicon Valley into existence, The Guardian says. “Those goods might be government research, mined for profitable inventions, or the contents of your Gmail inbox and Facebook feed, mined for advertising revenue. What matters is they’re free, and they’re free because we give them away. If the robots ever arrive, their arrival will be bankrolled by our taxes, our attention, our data.”

Ben Tarnoff calls a basic income policy under these circumstances “the crumbs left by the bully who steals your sandwich.” It seems there really is no free lunch.

Google patches high-severity flaw in Chrome’s PDF reader

Posted by on 9 June, 2016

Chrome users who haven’t restarted their browser recently should do so immediately to receive a patch for a high-severity flaw in the browser’s built-in PDF reader. Attackers could execute arbitrary code on the user’s system by tricking them into opening a PDF document containing a malicious image, according to researchers at Cisco Talos.

“The most effective attack vector is for the threat actor to place a malicious PDF file on a website and then redirect victims to the website using either phishing emails or even malvertising,” Cisco Talos wrote in a blog post disclosing the vulnerability.

The heap buffer overflow (CVE-2016-1681) is present in the jpeg2000 image parser library used by PDFium, Chrome’s default PDF reader. The flaw is located in the underlying jpeg2000 parsing library OpenJPEG, in j2k.c’s opj_j2k_read_SPCod_SPCoc function. While an assert call prevents the heap overflow in standalone builds, Google uses a special build process that omits assertions, making the flaw exploitable in Chrome.
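The same class of mistake can be reproduced in miniature. The sketch below is Python, not the C code in OpenJPEG, and every function name in it is hypothetical; it only illustrates how a guard written as an assertion disappears when code is compiled with assertions disabled, while an explicit check survives in every build.

```python
# Hypothetical guard written as an assertion. It is kept as source text so it
# can be compiled twice: once with assertions on, once with them stripped.
GUARD_SOURCE = '''
def accept_component_count(count):
    assert count > 0, "no components declared"
    return count
'''


def load_guard(optimize):
    """Compile GUARD_SOURCE at the given optimization level and return the function."""
    namespace = {}
    exec(compile(GUARD_SOURCE, "<guard>", "exec", optimize=optimize), namespace)
    return namespace["accept_component_count"]


def accept_component_count_checked(count):
    """Explicit validation that is present no matter how the code is built."""
    if count <= 0:
        raise ValueError("no components declared")
    return count


debug_guard = load_guard(optimize=0)    # assertions active
release_guard = load_guard(optimize=1)  # assertions removed, as with `python -O`

try:
    debug_guard(0)
except AssertionError as exc:
    print("debug build rejected input:", exc)

# The stripped build lets the bad value through unchallenged.
print("release build accepted:", release_guard(0))

try:
    accept_component_count_checked(0)
except ValueError as exc:
    print("explicit check rejected input:", exc)
```

The design point is the usual one: assertions document assumptions for developers, while untrusted input needs validation that cannot be compiled out.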

With attackers relying on weaponized PDF documents to target vulnerabilities in Adobe Reader, several browser makers have built in PDF readers so that users don’t have to install plugins. However, the fact that these readers are built in doesn’t mean users no longer have to be careful about opening PDF files they receive as email attachments or download from the Internet.

Google follows the automatic update model to keep Chrome on Windows and Macs up-to-date, which means most users are already on the latest version of the browser and are protected. That is, assuming they’ve restarted their browsers at least once since May 25. However, many organizations disable auto-updates in order to test new versions of Chrome on their networks before deploying them to endpoints. IT should prioritize testing and make sure users are running Chrome 51.0.2704.63 (released May 25) or even Chrome 51.0.2704.79 (released June 1) to address this flaw.
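For teams that collect browser version strings from their endpoints, a comparison against the first fixed release is straightforward. The following is a minimal sketch under that assumption: the helper names are hypothetical, and how the installed version is gathered is left out because it depends on the environment.

```python
PATCHED_VERSION = "51.0.2704.63"  # first fixed release named in the article


def parse_version(text):
    """Turn a dotted version string such as '51.0.2704.63' into a tuple of ints."""
    return tuple(int(part) for part in text.strip().split("."))


def is_patched(installed, minimum=PATCHED_VERSION):
    """Return True when the installed version is at or above the fixed release."""
    return parse_version(installed) >= parse_version(minimum)


# Example inventory; the version strings here are made up for illustration.
for version in ("50.0.2661.102", "51.0.2704.63", "51.0.2704.79"):
    status = "patched" if is_patched(version) else "needs update"
    print(f"{version}: {status}")
```

Comparing tuples of integers rather than raw strings avoids the classic mistake of treating "51.0.2704.100" as older than "51.0.2704.63".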

“It is fairly easy for an attacker to take advantage of this vulnerability,” Cisco Talos wrote. Attackers could use a specially crafted PDF document to execute code to cause a denial of service or some other attack.

As part of the research, Cisco Talos embedded a jpeg2000 image that had its SIZ marker truncated in a PDF file. Since the number of components specified in the SIZ marker in this malicious image is 0 and it isn’t followed by individual component information, the code for parsing the jpeg2000 file makes an erroneous call. The only difference between a valid jpeg2000 file and a malicious one targeting this vulnerability is the fact that the SIZ marker specifies 0 components, Cisco Talos said.
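To make the malformed input concrete, here is a rough Python sketch of a parser that rejects a SIZ segment declaring zero components. It is not OpenJPEG's code. The field layout (a 38-byte fixed part ending in a two-byte component count, followed by three bytes per component) reflects my understanding of the JPEG 2000 codestream format and should be treated as an assumption, as should all function names.

```python
import struct

SIZ_FIXED_LENGTH = 38    # assumed: bytes from the length field through the component count
BYTES_PER_COMPONENT = 3  # assumed: bit depth plus horizontal and vertical subsampling


def parse_siz_segment(segment):
    """Parse a SIZ segment that starts at its length field (marker already consumed)."""
    if len(segment) < SIZ_FIXED_LENGTH:
        raise ValueError("SIZ segment is truncated")

    # Big-endian: length, capabilities, eight 32-bit geometry fields, component count.
    fields = struct.unpack(">HH8IH", segment[:SIZ_FIXED_LENGTH])
    declared_length, component_count = fields[0], fields[-1]

    # The case described above: no components declared, none following.
    if component_count == 0:
        raise ValueError("SIZ marker declares 0 components")

    expected_length = SIZ_FIXED_LENGTH + BYTES_PER_COMPONENT * component_count
    if declared_length != expected_length or len(segment) < expected_length:
        raise ValueError("SIZ length does not match the component count")

    components = [
        tuple(segment[offset:offset + BYTES_PER_COMPONENT])
        for offset in range(SIZ_FIXED_LENGTH, expected_length, BYTES_PER_COMPONENT)
    ]
    return {"width": fields[2], "height": fields[3], "components": components}


def build_siz_segment(width, height, components):
    """Demo helper: serialize a SIZ segment using the same assumed layout."""
    length = SIZ_FIXED_LENGTH + BYTES_PER_COMPONENT * len(components)
    header = struct.pack(
        ">HH8IH", length, 0, width, height, 0, 0, width, height, 0, 0, len(components)
    )
    return header + b"".join(bytes(component) for component in components)


valid = build_siz_segment(640, 480, [(7, 1, 1)])
print(parse_siz_segment(valid))

malformed = build_siz_segment(640, 480, [])
try:
    parse_siz_segment(malformed)
except ValueError as exc:
    print("rejected:", exc)
```

The point of the sketch is that the zero-component case is refused before any per-component data is read, rather than being left to a debug-only assertion.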

Google assigned a CVSS 3.0 score of 6.3 to the flaw, and paid Aleksandar Nikolic of Cisco Talos $3,000 for reporting the vulnerability.

PDF documents are a fact of life for most users nowadays, so always think twice before opening them. Make sure documents are from reputable sources and exercise extreme caution before opening unsolicited ones. Some business functions, such as recruiting, are especially at risk, since the role requires opening unsolicited PDF files (such as resumes from potential job candidates).

While built-in readers in browsers have gone a long way toward making it safer to open PDF files from the Internet, this vulnerability report is a timely reminder that even built-in readers can be vulnerable. Stay current with regular software updates, whether by restarting the browser on a regular basis or installing the updates as soon as they are available.

Linux desktop apps take a cue from Docker packages

Posted by on 9 June, 2016

Flatpak, sponsored by the Freedesktop project and engineered mainly by Alex Larsson at Red Hat, is a single application package containing everything needed to run an app across multiple distributions. But it also comes with partial dependency on a Red Hat technology that’s inspired no small amount of ire.

Container technology has led to a major rethinking of how to deploy and manage services. But Docker and its cohorts don’t yet address the problems of distributing and managing end-user or desktop applications on different editions of Linux. If you want to distribute an end-user Linux app, you have to build separate editions for each distribution and deploy it via that distribution’s own packaging mechanism.

All in one

Flatpak — so named for furniture giant IKEA’s product-packaging method — is not a containerization system, but it draws on containers for some of its ideas. A Flatpak-delivered app runs in a “sandbox” (the developers deliberately avoided the word “container”), which is split into two parts. The application part contains the app itself, its data, and any libraries that are the same on every distribution. The runtime part contains only the dependencies that would be needed by different distributions.

So far, only a handful of desktop Linux applications have signed on to create Flatpaks of their apps, but they’re all big names. LibreOffice, for instance, now offers a Flatpak distribution, as do the GIMP image-editing application, the Inkscape vector-drawing program, Darktable (a photo workflow system), and a slew of apps for the Gnome desktop.

What are the downsides? For one, you have to install Flatpak’s runtime on any system where Flatpak-packaged apps will be run. Also, Flatpak’s sandboxing system inhibits certain behaviors that some apps need.

The notes that accompany LibreOffice’s Flatpak version state, “Flatpak-based apps are not yet able to reach out to other applications installed on the machine (like browsers) to e.g. open URLs through them. That means that e.g. clicking on a hyperlink in a LibreOffice Writer document might not work yet.” This apparently breaks the help system for LibreOffice, since it opens URLs.

Warning: Roadblocks ahead

Another potential issue is Flatpak’s partial dependency on systemd, the system-initialization process created by Red Hat. Flatpak uses systemd to help set up its sandboxing technology, but systemd’s mere existence has inspired no end of contention and polarization in the Linux world.

As a result, Flatpak will likely remain confined to distributions that already use systemd. Most of the big enterprise distributions (Red Hat, Ubuntu, and openSUSE) already employ it, but they aren’t really used as desktop systems. (Red Hat’s Fedora, with a desktop incarnation, is considered separate from its mainline RHEL offering.)

The smaller, more user-friendly distributions have largely resisted systemd. But they also serve the user base that benefits most from Flatpak. Some of those distributions, such as Linux Mint, already allow systemd as an option, so in time the technical side of the issue may become moot, but if Flatpak doesn’t gain ground because of systemd in the first place or due to its other quirks, it’ll be doubly moot.

Do it now! From SHA-1 to SHA-2 in 8 steps

Posted by on 9 June, 2016

As deadlines go, Jan. 1, 2017, isn’t far away, yet many organizations still haven’t switched their digital certificates and signing infrastructure to use SHA-2, the set of cryptographic hash functions succeeding the weaker SHA-1 algorithm. SHA-1 deprecation must happen; otherwise, organizations will find their sites blocked by browsers and their devices unable to access HTTPS sites or run applications.

All digital certificates, whether they guarantee that a website accepting payment card information is secure, that software is authentic, or that a message was sent by a person and not an impersonator, are signed using a hashing algorithm. The most common is currently SHA-1, despite significant cryptographic weaknesses that render the certificates vulnerable to collision attacks.
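As a small illustration of the difference, Python's standard hashlib module can compute both digests. The sketch below also includes a hypothetical helper that flags SHA-1 in a signature algorithm name; the names passed to it follow the style OpenSSL prints, but the helper is an assumption made for illustration, not part of any library, and the input message is a placeholder rather than a real certificate.

```python
import hashlib

message = b"example certificate contents"  # placeholder, not a real certificate

sha1_hex = hashlib.sha1(message).hexdigest()
sha256_hex = hashlib.sha256(message).hexdigest()  # SHA-256 is a member of the SHA-2 family

# Each hex character encodes four bits of the digest.
print("SHA-1 digest bits:  ", len(sha1_hex) * 4)
print("SHA-256 digest bits:", len(sha256_hex) * 4)


def uses_sha1(signature_algorithm):
    """Hypothetical check: does this signature algorithm name refer to SHA-1?"""
    normalized = signature_algorithm.lower().replace("-", "").replace("_", "")
    return "sha1" in normalized


for name in ("sha1WithRSAEncryption", "sha256WithRSAEncryption", "ecdsa-with-SHA1"):
    verdict = "migrate to SHA-2" if uses_sha1(name) else "ok"
    print(f"{name}: {verdict}")
```

A real audit would read the signature algorithm from each certificate in the inventory; the string check here only shows what is being looked for.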

We have the big data tools — let’s learn to use them

Posted by on 9 June, 2016

Recently, at the Apache Spark Maker Community event in San Francisco, I was on a panel and feeling a bit salty. It seems many people have prematurely declared victory in the data game. A few people have achieved self-service, and even more have claimed to.

In truth, this is a tiny minority — and most of those people have achieved cargo-cult datacentricity. They use Hadoop and/or Spark and pull data into Excel, manipulate it, and paste it into PowerPoint. Maybe they’ve added Tableau and are able to make prettier charts, but what really has changed? Jack, that’s what.

Self-service is only step one on this trip to data-driven decision-making. Companies need to know their data before they can consider their choices — but this is still very much data at the edges with a meat cloud in the center.

So far, we use computer-aided decision-making and computer-driven processes where we have to: advanced fraud detection, algorithmic trading, and rigorously regulated processes (such as Obamacare). Generally, we don’t use them elsewhere.

Hundreds of millions of people are sitting in cubicles with a grid on their screen manually typing numbers into a spreadsheet. This manual data labor is the bane of corporate existence. As Peter Gibbons put it, “Human beings were not meant to sit in little cubicles staring at computer screens all day, filling out useless forms.”

We already have the technologies necessary to eliminate this and free humans for the intuitive leaps and creative endeavors they excel at. Yet as a recent New York Times article noted, we mostly use new technology to do the same old thing and do not reap the productivity rewards.

Though we need better tools, the wisdom of the day is that everyone will code, because that’s what the tools require. Truly, that only seems reasonable because Spark still sucks so much (more fairly, it’s a relatively low-level distributed computing framework). It only looks brilliant compared to what we had.

At the same time, Spark isn’t actually a framework for managing and gaining insights from our data. Now, the rabble will start chanting “applications!” Yet having 100 closed-loop applications will quickly lead to more Excelitis.

Instead, it’s time to employ a strategy. As I once said in a discussion about groupware, in a mature business, every email is a little failure, as is every hand-generated report or spreadsheet.

I’ll go further and say every time you have to stare at your phone, it’s a microfailure. In any city, look around and you’ll see hundreds of people missing everything around them while they hold their phones in their hands and stare at a tiny screen. Part of the problem is we’re still polling and pulling for data. A machine-driven process (designed by people) would instead prompt us: You would know you’re not missing anything and do your job, or better yet, live your life.
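The contrast between polling for data and being prompted by it can be sketched in a few lines of Python. Everything here is hypothetical: the Feed class and its poll, subscribe, and publish methods are invented for illustration and do not correspond to any particular product or library.

```python
from typing import Callable, List, Optional


class Feed:
    """Hypothetical data source that supports both pull and push access."""

    def __init__(self) -> None:
        self._latest: Optional[str] = None
        self._subscribers: List[Callable[[str], None]] = []
        self.poll_count = 0  # how many times someone had to go and look

    def poll(self) -> Optional[str]:
        """Pull model: the consumer asks, whether or not anything changed."""
        self.poll_count += 1
        return self._latest

    def subscribe(self, callback: Callable[[str], None]) -> None:
        """Push model: register interest once, then get on with other work."""
        self._subscribers.append(callback)

    def publish(self, item: str) -> None:
        """New data arrives; every subscriber is prompted immediately."""
        self._latest = item
        for callback in self._subscribers:
            callback(item)


if __name__ == "__main__":
    feed = Feed()
    alerts: List[str] = []
    feed.subscribe(alerts.append)

    # Pull: repeated checks that find nothing new.
    for _ in range(5):
        feed.poll()

    # Push: one event, one alert, no checking required.
    feed.publish("quarterly numbers are in")
    print(f"polls made: {feed.poll_count}, alerts received: {alerts}")
```

In the pull model the cost scales with how often people check; in the push model it scales with how often something actually happens.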

Success isn’t more visualizations. Success is the abolition of the PC and the smartphone as we know them. Success is when we’re alerted to data as needed and spend most of our time making creative and intuitive leaps. Self-service, in other words, still indentures us to data labor. The next huge leaps are when we design real systems and go back to living something that looks a lot more like the future envisioned in the 19th century.

To do so, we must use data and the scientific method to make decisions and, more important, create processes and systems to make decisions rather than making them ourselves. We need to create methodologies around doing this rather than hoping the next tool of the day will free us from thinking about how to do this.

We already have the tools we need to get there. It’s time to start using them correctly.

The next steps for Spark in the cloud

Posted by on 8 June, 2016

This post was originally published on this site

Over the course of the last couple of years, Apache Spark has enjoyed explosive growth in both usage and mind share. These days, any self-respecting big data offering is obliged to either connect to or make use of it.

Now comes the hard part: Turning Spark into a commodity. More than that, it has to live up to its promise of being the most convenient, versatile, and fast-moving data processing framework around.

There are two obvious ways to do that in this cloud-centric world: Host Spark as a service or build connectivity to Spark into an existing service. Several such approaches were unveiled this week at Spark Summit 2016, and they say as much about the companies offering them as they do about Spark’s meteoric ascent.


Microsoft has pinned a growing share of its future on the success of Azure, and in turn on the success of Azure’s roster of big data tools. Therefore, Spark has been made a first-class citizen in Power BI, Azure HDInsight, and the Azure-hosted R Server.

Power BI is Microsoft’s attempt (emphasis on “attempt”) at creating a Tableau-like data visualization service, while Azure HDInsight is an Azure-hosted Hadoop/R/HBase/Storm-as-a-service offering. A tool like those without Spark support is like a bike without pedals.

Microsoft is also rolling the dice on a bleeding-edge Spark feature, the recently revamped Structured Streaming component that allows its data to stream directly into Power BI. Structured Streaming is not only a significant upgrade to Spark’s streaming framework, it is a competitor to other data streaming technologies (such as Apache Storm). So far it’s relatively unproven in production, and already faces competition from the likes of Project Apex.

This is more a reflection of Microsoft’s confidence in Spark generally than in Structured Streaming specifically. The sheer amount of momentum around Spark ought to ensure that any issues with Structured Streaming are ironed out in time — whether or not Microsoft contributes any direct work to such a project.


IBM’s bet on Spark has been nothing short of massive. Not only has Big Blue re-engineered some of its existing data apps with Spark as the engine, it’s made Spark a first-class citizen on its Bluemix PaaS and will be adding its SystemML machine learning algorithms to Spark as open source. This is all part of IBM’s strategy to shed its mainframe-to-PC era legacy and become a cloud, analytics, and cognitive services giant.

Until now, IBM has leveraged Spark by making it a component of already established services such as Bluemix. IBM’s next step, though, will be to provide Spark and a slew of related tools in an environment that is more free-form and interactive: the IBM Data Science Experience. It’s essentially an online data IDE, where a user can interactively manipulate data and code (Spark for analytics, Python/Scala/R for programming), add in data sources from Bluemix, and publish the results for others to examine.

If this sounds a lot like Jupyter for Python, that is one of the metaphors IBM had in mind — and in fact, Jupyter notebooks are a supported format. What’s new is that IBM is trying to expose Spark (and the rest of its service mix) in a way that complements Spark’s vaunted qualities — its overall ease of use and lowering of the threshold of entry for prospective data scientists.


Cloud data warehouse startup Snowflake is making Spark a standard-issue component as well. Its original mission was to provide analytics and data warehousing that spared the user from the hassle of micromanaging setup and management. Now, it’s giving Spark the same treatment: Skip the setup hassles and enjoy a self-managing data repository that can serve as a source for, or recipient of, Spark processing. Data can be streamed into Snowflake by way of Spark or extracted from Snowflake and processed by Spark.

Spark lets Snowflake users interact with their data in the form of a software library rather than a specification like SQL. This plays to Snowflake’s biggest selling point — automated management of scaling data infrastructure — rather than merely providing another black-box SQL engine.


With Databricks, the commercial outfit that spearheads Spark development and offers its own hosted platform, the question has always been how it can distinguish itself from other platforms where Spark is a standard-issue element. The current strategy: Hook ’em with convenience, then sell ’em on sophistication.

Thus, Databricks recently rolled out the Community Edition, a free tier for those who want to get to know Spark but don’t want to monkey around with provisioning clusters or tracking down a practice data set. Community Edition provides a 6GB microcluster (it times out after a certain period of inactivity), a notebook-style interface, and several sample data sets.

Once people feel like they have a leg up on Spark’s workings, they can graduate to the paid version and continue using whatever data they’ve already migrated into it. In that sense, Databricks is attempting to capture an entry-level audience — a pool of users likely to grow with Spark’s popularity. But the hard part, again, is fending off competition. And as Spark is open source, it’s inherently easier for someone with far more scale and a far greater existing customer base to take all that away.

If there’s one consistent theme among these moves, especially as Spark 2.0 looms, it’s that convenience matters. Spark caught on because it made working with gobs of data far less ornery than the MapReduce systems of yore. The platforms that offer Spark as a service all have to assume their mission is twofold: Realize Spark’s promise of convenience in new ways — and assume someone else is also trying to do the same, only better.


Copyright 2015 - InnovatePC - All Rights Reserved
