Cascade is the latest project from the NYTimes R&D department, and it allows precise analysis of the structures that underlie sharing activity on the web. The project was initiated by Mark Hansen, who worked with Jer Thorp and Jake Porway (Data Scientist at the Times). The team spent the last six months building the tool to understand how information propagates through the social media space. While initially applied to New York Times stories and information, the tool and its underlying logic may be applied to any publisher or brand interested in understanding how its messages are shared.
The app is primarily an exploratory tool, Jer explains. NYTimes publishes more than 6,000 pieces of content every month, and the team can now analyse every sharing event involving this content using Cascade. Jer describes the basic app workflow:
– A ‘Story Mode’ which shows a set of stories, and their associated event cascades. These stories can be requested via keyword search, section search, or a variety of ‘interestingness’ metrics. This view has some low-level visualizations of activity over time which allow us to focus in on event cascades which might be particularly interesting.
– A ‘Cascade Mode’ which allows us to view the event cascades. The cascades build over time, and one of the things we’ve been most interested in with this tool has been the time-based analysis. Rather than seeing static views of the social graph, we can actually see the sharing networks unfold over time. This mode has three distinct views in which each cascade can be examined:
1) A ‘side view’ which shows all of the events over time, and uses the Y axis to indicate degrees of separation from the originating event
2) A ‘radar view’ which views the system from overhead and lets users identify ‘threads’ of conversation
3) A 3D ‘tree view’ which combines views 1 and 2
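To make the ‘side view’ concrete, here is a minimal Python sketch of how degrees of separation and the time-based unfolding might be computed. It is an illustration only, not the team's code: the record layout `(event_id, parent_id, timestamp)`, the sample data, and the function names are all assumptions.

```python
from collections import defaultdict, deque

# Hypothetical event records: (event_id, parent_id, timestamp_seconds).
# parent_id is None for the originating event. This layout is an
# assumption for illustration, not Cascade's actual schema.
EVENTS = [
    ("a", None, 0),
    ("b", "a", 60),
    ("c", "a", 90),
    ("d", "b", 200),
    ("e", "d", 450),
]


def degrees_of_separation(events):
    """Return {event_id: depth}, where depth counts hops from the origin."""
    children = defaultdict(list)
    roots = []
    for event_id, parent_id, _ in events:
        if parent_id is None:
            roots.append(event_id)
        else:
            children[parent_id].append(event_id)

    depth = {}
    queue = deque((root, 0) for root in roots)
    while queue:  # breadth-first walk outward from the originating event
        event_id, d = queue.popleft()
        depth[event_id] = d
        for child in children[event_id]:
            queue.append((child, d + 1))
    return depth


def side_view_points(events, until):
    """Return sorted (timestamp, depth) pairs for events up to time `until`."""
    depth = degrees_of_separation(events)
    return sorted(
        (ts, depth[event_id])
        for event_id, _, ts in events
        if ts <= until and event_id in depth
    )
```

Calling `side_view_points(EVENTS, until=200)` returns `[(0, 0), (60, 1), (90, 1), (200, 2)]`: time on one axis, degrees of separation on the other. Raising `until` step by step gives the effect of a cascade building over time.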
The tool is built in Processing, with a lot of help from Andres Colubri’s GLGraphics library and toxiclibs. It runs on any machine, but is staged on a 5-screen video wall. This ‘exhibition’ app runs in an automatic mode, in which it explores the terrain of available data and wanders through the various presentation modes. The wall can also be controlled by a custom iPhone app, which is fairly simple and sends OSC commands to the display system. The team considered using touch or gestural input to control the display, but in the end the iPhone app gave them the control they wanted while letting them use the interface at some distance from the screens.
All of the data is stored in a Mongo database, which they access through a Python API. They also used R quite a lot during the exploratory phases. The largest cascades they are currently loading have about 25,000 events. These are all rendered in 3D at full framerate (60fps) across 5 screens (6400×720) by a single machine. Jer suspects the system could handle trees of up to 50,000 events (all thanks to Andres & GLGraphics). The data that the team are currently using is a 2-week sample from July/August, but Jer says they will be moving to a near real-time data feed very soon.
The implementation used right now looks at the sharing of NYTimes content over Twitter, but Jer explains that Cascade is in fact a system that could be used to model any kind of sharing activity. The team is already looking at implementing it for other Times properties (boston.com, etc.) and will be testing it out on other sharing systems over the coming months.