[V] Cascade: A Tool for In-Depth Analysis of People's Online Activity and Social Relationships

Source: Creative Applications. Translator/editor: Wayne Tai @ Damndigital (please credit the source when reposting)

Cascade is the latest project from the NYTimes R&D department. It allows precise analysis of the structures that underlie sharing activity on the web. The project was initiated by Mark Hansen, working with Jer Thorp and Jake Porway (Data Scientist at the Times), and the team spent the last 6 months building the tool to understand how information propagates through the social media space. While initially applied to New York Times stories and information, the tool and its underlying logic may be applied to any publisher or brand interested in understanding how its messages are shared.

The app is primarily an exploratory tool, Jer explains. NYTimes publishes more than 6,000 pieces of content every month, and the team can now analyse every sharing event involving this content using Cascade. Jer describes the basic app workflow:

– A ‘Story Mode’ which shows a set of stories, and their associated event cascades. These stories can be requested via keyword search, section search, or a variety of ‘interestingness’ metrics. This view has some low-level visualizations of activity over time which allow us to focus in on event cascades which might be particularly interesting.

– A ‘Cascade Mode’ which allows us to view the event cascades. The cascades build over time – one of the things we’ve been most interested in with this tool has been the time-based analysis. Rather than seeing static views of the social graph, we can actually see the sharing networks unfold over time. This mode has three distinct views in which each cascade can be examined:

1) A ‘side view’ which shows all of the events over time, and uses the Y axis to indicate degrees of separation from the originating event
2) A ‘radar view’ which views the system from overhead and lets users identify ‘threads’ of conversation
3) A 3D ‘tree view’ which combines views 1 and 2
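
The side view's Y axis, degrees of separation from the originating event, can be illustrated with a breadth-first walk over the sharing tree. This is a minimal sketch in Python, not Cascade's actual code: the function name, the `(parent, child)` pair format, and the sample event IDs are all assumptions made for illustration.

```python
from collections import deque

def degrees_of_separation(root, shares):
    """Return {event_id: depth}, where depth counts hops from the root event.

    `shares` is a list of (parent_id, child_id) pairs, meaning the child
    event re-shared the parent event. This input format is hypothetical.
    """
    children = {}
    for parent, child in shares:
        children.setdefault(parent, []).append(child)

    depth = {root: 0}
    queue = deque([root])
    while queue:
        event = queue.popleft()
        for child in children.get(event, []):
            if child not in depth:  # skip events already reached
                depth[child] = depth[event] + 1
                queue.append(child)
    return depth

# Hypothetical cascade: one story, two direct shares, one share of a share.
shares = [("story", "a"), ("story", "b"), ("a", "c")]
depths = degrees_of_separation("story", shares)
```

In a side view of this kind, each event would then be plotted with its timestamp on the X axis and its `depths` value on the Y axis.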



The tool is built in Processing, with a lot of help from Andres Colubri’s GLGraphics library and toxiclibs. It runs on any machine, but is staged on a 5-screen video wall. This ‘exhibition’ app runs in an automatic mode, in which it explores the terrain of available data and wanders through the various presentation modes. The wall can also be controlled by a custom iPhone app, which is fairly simple and sends OSC commands to the display system. The team considered using touch or gestural input to control the display, but in the end the iPhone app gave them the control they wanted while letting them use the interface at some distance from the screens.
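
To give a sense of what "sending OSC commands" involves, here is a rough Python sketch that encodes an OSC 1.0 message by hand and sends it over UDP. The address `/cascade/view`, its `"radar"` argument, and the host and port in the usage note are invented for illustration; the article does not describe the commands the real controller app sends.

```python
import socket
import struct

def osc_string(text):
    """Encode an OSC string: ASCII, null-terminated, padded to a multiple of 4 bytes."""
    data = text.encode("ascii") + b"\x00"
    return data + b"\x00" * (-len(data) % 4)

def osc_message(address, *args):
    """Build an OSC 1.0 message supporting int32, float32, and string arguments."""
    type_tags = ","
    payload = b""
    for arg in args:
        if isinstance(arg, int):
            type_tags += "i"
            payload += struct.pack(">i", arg)  # big-endian 32-bit integer
        elif isinstance(arg, float):
            type_tags += "f"
            payload += struct.pack(">f", arg)  # big-endian 32-bit float
        elif isinstance(arg, str):
            type_tags += "s"
            payload += osc_string(arg)
        else:
            raise TypeError(f"unsupported OSC argument: {arg!r}")
    return osc_string(address) + osc_string(type_tags) + payload

def send_osc(host, port, address, *args):
    """Send one OSC message in a single UDP datagram."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(osc_message(address, *args), (host, port))

# Hypothetical command asking the wall to switch to the radar view.
message = osc_message("/cascade/view", "radar")
```

A controller could then call something like `send_osc("192.168.0.10", 9000, "/cascade/view", "radar")`. Because OSC usually travels over UDP, a phone on the same network can drive the display from across the room, which fits the at-a-distance control described above.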

All of the data is stored in a Mongo database, which they access through a Python API. They also used R quite a lot during the exploratory phases. The largest cascades they are currently loading have about 25,000 events. These are all rendered in 3D at full framerate (60fps) across 5 screens (6400×720) by a single machine. Jer suspects the system could handle trees of up to 50,000 events (all thanks to Andres & GLGraphics). The data that the team are currently using is a 2-week sample from July/August, but Jer says they will be moving to a near real-time data feed very soon.
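
The display figures above can be unpacked with a little arithmetic. The sketch below assumes the 6400×720 canvas is split into five equal, side-by-side screens, which the article does not state outright; the function name is hypothetical.

```python
def screen_viewports(total_width, height, screens):
    """Split a wide canvas into equal side-by-side viewports as (x, y, w, h)."""
    if total_width % screens != 0:
        raise ValueError("canvas width must divide evenly across screens")
    width = total_width // screens
    return [(i * width, 0, width, height) for i in range(screens)]

# Assumed layout: five equal screens sharing one 6400x720 canvas.
viewports = screen_viewports(6400, 720, 5)

# Time available to draw one frame at 60fps, in milliseconds.
frame_budget_ms = 1000 / 60
```

Under that assumption each screen is 1280×720, and the single machine has roughly 16.7 ms to draw all of a cascade's events for each frame.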

The implementation used right now looks at the sharing of NYTimes content over Twitter, but Jer explains that Cascade is in fact a system that could be used to model any kind of sharing activity. They’re already looking at implementing it for other Times properties (boston.com, etc.) and will be testing it out on other sharing systems over the coming months.





