In this paper, a cloud-based software architecture for a multimedia collaboration platform is introduced. The platform is accessible from a typical web browser and allows users to collaborate over webcam chat. It allows users to view videos, photos, maps, documents, and listen to music, all in real-time.
Cristian Gadea, Bogdan Solomon, Bogdan Ionescu, Dan Ionescu
NCCT Lab, University of Ottawa, Ottawa, Canada
{cgadea, bsolomon, bogdan, dan}@ncct.uottawa.ca

Abstract: The amount of multimedia content on the internet has been growing at a remarkable rate, and users are increasingly looking to share online media with colleagues and friends on social networks. Several commercial and academic solutions have attempted to make it easier to share this large variety of online content with others, but they are generally limited to sending links. Existing products have not been able to provide a scalable cloud-based system that synchronizes disparate web content among many users in real-time. Additionally, they have lacked a platform with a modular architecture that can be extended by developers to support new sources of online media. In this paper, a cloud-based software architecture for a multimedia collaboration platform is introduced. The platform is accessible from a typical web browser and allows users to collaborate over webcam chat while viewing videos, photos, maps, documents, and listening to music, all in real-time. As examples, it is shown how a distributed system called Watch Together was deployed to real users within Facebook and an e-learning environment. Usage data is provided from both deployments and observations are made on how users share and consume real-time multimedia content.

Index Terms: multimedia in social networking environments, cloud-based digital content delivery, real-time web collaboration, internet human-computer interaction, online multimedia sharing

I. INTRODUCTION

Web 2.0 has dramatically transformed the way in which information is collected and presented to online users, and the enormous popularity of social networking has created a growing appetite for online multimedia content.
Rather than just viewing simple HTML pages, users now expect the ability to share and collaborate with other people, such as friends or colleagues, online and in real-time. This can be seen with services like Google Docs [1], where multiple users can work on the same document at the same time, and the document is stored on Google's remote servers in the cloud. Other popular websites that offer online content sharing between users include Facebook and Twitter, yet in both cases, a message posted by a user containing links to photos or videos is later viewed by one or more users separately. In many ways, this is no different from sending an e-mail with either links or attachments.

The system presented in this paper aims to achieve real-time collaboration between users who share a collaborative session as a group. Although several commercial, open source and academic web-based collaboration solutions have existed for some time [2], the solution presented in this paper requires that all users in the session see the exact same state of the system, be it the same video at the same moment in time, the same image, or the same page in a document or slide. Actions performed by a user (such as changing the image, fast forwarding in the video, or changing the page) are replicated across all the users in order to ensure that the same state is maintained. Additionally, text and video/audio chat are integrated in the system, allowing users to see and hear each other as they collaborate over the online multimedia content.

As is increasingly important for online services, the system must make use of a cloud-based architecture so that it continues to perform reliably as the popularity of the service grows. To support a large variety of online multimedia sources, the system has to be a platform on top of which additional synchronized applications can be developed and deployed.
As such, the platform can be extended to support the latest popular online services and make their content collaborative with relative ease. The system must therefore provide easy-to-use APIs for developers. In addition, the synchronization between users must be handled in a transparent way such that developers do not have to worry about the necessary synchronization messages reaching all users within a session.

Finally, the system must easily integrate with existing social networking environments. As such, it must make use of the latest web-based technologies and APIs. In order to be as accessible as possible for users, the system must not require the download and installation of additional proprietary browser plugins. This is achieved by using the Flex 4 framework for Adobe Flash [3] on the client side, along with a Real Time Messaging Protocol (RTMP) based [4] server. While Adobe Flash is itself a browser plugin, it is currently more widely adopted than HTML5, which is still at an experimental stage in all major web browsers.

Yahoo! Zync was one of the first products with the ability to view synchronized videos [5]. As an add-on to the Yahoo! Messenger client, Zync would detect when a participant of a two-user IM conversation pasted a link to a YouTube video, which would cause a synchronized video player window to appear. The creators found that the synchronicity and social co-presence promoted online conversation and engagement with shared media, with 31% of users returning to reuse the service after their first session. To use Zync, users would need to download and install the Yahoo! Messenger application, as well as the Zync add-on, and the system was limited to Microsoft Windows.

978-1-4577-0638-7/11/$26.00 © 2011 IEEE

In the system proposed in this paper, users are able to collaborate on multimedia content while having the content rendered on each user's local machine within their own instance of that application.
Unlike common remote access solutions [6], content in the system presented in this paper is not re-encoded as part of a remote screen update. Videos therefore run at full speed for all users and synchronization is ensured through event-based signals.

The remainder of this paper is organized as follows: Section II discusses the requirements and architecture of the proposed collaboration system. Section III describes the implementation details of the system and its API. Results from the implemented system and usage data from two separate live deployments are presented in Section IV. Section V then reflects on this paper's contributions and proposes topics for future research.

II. REQUIREMENTS AND ARCHITECTURE

This section offers a look at the overall design considerations of the architecture. The system presented in this paper must achieve real-time collaboration between the different clients within the same session, while at the same time being extendable and scalable across multiple servers. The requirements for this architecture are first established, and the architecture is then developed.

A. Requirements

A number of functional software requirements result from the real-time browser-based collaborative nature of the system presented in this paper:
1) Clients in the same session must see the exact same thing in the collaborative part of the application.
2) Clients must be able to communicate with each other through text, audio and video.
3) Clients must be able to search through numerous data sources for multimedia content.
4) Clients must be able to invite one or more of their contacts to a session.
5) Clients must be able to accept or reject a session invitation received from another user.
6) Clients which join a session after it has started must be synchronized to the state of the session.
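As an illustration of requirements 1 and 6, the following minimal Python sketch shows one way event-based synchronization could bring a late joiner up to date. It is not the Watch Together implementation: the `Client` and `Session` classes, the message dictionaries, and the state keys are hypothetical names introduced only for this example.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class Client:
    """Hypothetical stand-in for a connected browser client."""
    name: str
    received: List[Dict[str, Any]] = field(default_factory=list)

    def deliver(self, message: Dict[str, Any]) -> None:
        # A real client would update its viewer; here the message is only recorded.
        self.received.append(message)


class Session:
    """Hypothetical session that keeps all members on one shared state."""

    def __init__(self) -> None:
        self.members: List[Client] = []
        self.state: Dict[str, Any] = {}

    def join(self, client: Client) -> None:
        # Requirement 6: a joining client first receives the current state.
        self.members.append(client)
        client.deliver({"type": "sync", "state": dict(self.state)})

    def apply(self, sender: Client, key: str, value: Any) -> None:
        # Requirement 1: each action is recorded and replicated to the others.
        self.state[key] = value
        for member in self.members:
            if member is not sender:
                member.deliver({"type": "event", "key": key, "value": value})


alice, bob = Client("alice"), Client("bob")
session = Session()
session.join(alice)
session.apply(alice, "video", "abc123")
session.apply(alice, "position_seconds", 42)
session.join(bob)  # bob joins late and receives the accumulated state
```

In this sketch only small event messages travel between clients, which mirrors the point that media is rendered locally rather than re-encoded and streamed as screen updates.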
In addition, the following non-functional requirements were determined:
1) Clients must be able to access the system via a web browser without the need for downloading extra proprietary add-ons or plugins.
2) The system must scale by supporting the deployment of new media servers in different locations.
3) New application types and data sources must be loaded on demand by clients.
4) The system must be deployable within different social networking environments and make use of the environment's APIs to retrieve user information.
5) Depending on the deployment type, the system must allow different data sources and application types to be loaded.

[Fig. 1. High-Level Server-Client Architecture. Steps shown: 1) login, 2) authenticate user, 3) getSWF, 4) get user info, 5) connect, 6) communicate with Client 1.]

[Fig. 2. Server Side Services. The MediaServer provides the UserStateService, SessionService, and WebcamVideoStreamService.]

B. Architecture

In order to achieve the functional and non-functional requirements of the system, the architecture was developed as a client-server application as shown in Figure 1. A web server component is required to provide the HTML page and the embedded ShockWave Flash (SWF) files necessary for the client web browser to run the application. Once the client is authorized and the SWF files are loaded on the client side, the client connects via RTMP to the Media Server. After a connection to the Media Server is established, the client can start collaborating with other users.

1) Server Architecture: Figure 2 shows how the server side consists of a Media Server that provides three key services:
• UserStateService - This service is used by the client to indicate changes in user presence (coming online, going offline, currently busy) or to retrieve the list of connected users.
• WebcamVideoStreamService - This service is called by the client to indicate that the user has enabled (or disabled) their webcam.
The service also allows users to send their video/audio data to the Media Server, as well as to connect to existing streams from other users.
• SessionService - This service is used to invite others to a collaborative session, signal an acceptance/rejection of an invitation, obtain a list of other users in an existing session, as well as to send all application-specific synchronization messages.

In order to provide a scalable cloud-based architecture, a JGroups Gossip Server can be used to perform the role of a group manager, as shown in Figure 3. As new Media Servers come online, they signal their availability to the JGroups Gossip Server, thus adding themselves to the group. The JGroups Gossip Server then notifies the Access Control Server of the presence of the new Media Server. When a second Media Server comes online, the Access Control Server, as well as the first Media Server, are notified. The client first contacts the Access Control Server to obtain the location of the Media Server closest to the client. The client then connects to this Media Server, and the Media Server informs the JGroups Gossip Server of the new client. The JGroups Gossip Server then informs the other known Media Servers of this update.

[Fig. 3. Cloud-Based Server-Client Architecture. Components: Client, Access Control Server, Media Server 1, Media Server 2, JGroups Gossip Server. Server registration: 1.1) broadcast new server, 1.2) to 1.4) new server. Client connection: 2.1) getServer, 2.2) server location, 2.3) connect, 2.4) and 2.5) broadcast new client.]

[Fig. 4. Cloud-Based Server Side Features. Elements: MediaServer, GroupManager, GroupClient, GroupMessageReceiver, ServerPeers.]

A Media Server must identify a list of users who are relevant; that is, users who are currently connected to the Media Server or are contacts of the connected users. Contacts are important to track since any updates to their status (online, offline, busy, etc.)
must reach the connected users, no matter which Media Server they were assigned to. These relevant users should be identified as GroupClients. A GroupManager should be used to track users on external servers (ServerPeers) and transmit messages to them through the JGroups Gossip Server. A GroupMessageReceiver can be used to receive incoming messages. This is summarized in Figure 4.

2) Client Architecture: The client side has to be easily extendable and capable of loading its components on demand based on the social network into which it is integrated. This requires the architecture of the client to be modular. The client side modules are split into Domain Modules and Media Modules.

Domain Modules are used for dealing with domain-specific data. A domain represents a deployment instance of the system and each domain has a separate login approach, which varies for each social networking environment. There are two Domain Modules developed for each domain:
• A Login Module performs the login logic and retrieves the local user's information and contacts.
• A User List Module displays the user's contacts within the user interface.

The reason for using different User List Modules is that different domains can have different contact categorizations. For example, Facebook contacts are called friends; a user has a number of friends and there are no subcategories. If the system is deployed within an organization's social network, however, the organization may define groups of users for various tasks. In order to maintain a consistent look and feel while making allowance for such functionality differences, all User List Modules extend an existing component.

Media Modules are used for developing collaborative applications and dealing with their data sources. There are four Media Module types: a Search Module, a Viewer Module, a User Control Module, and an Information Module:
• The Search Module allows the user to search for a specific media item to share with other people.
An external API of the media or data source is typically used to retrieve the search results. For example, for a YouTube application, the search component mimics the standard YouTube search options (search videos from Today, This Week, etc.) and displays a list of thumbnails for the videos.
• The Viewer Module displays the actual media content selected from the list provided by the Search Module. For YouTube, this is the actual streaming video. The viewer supports a maximized full-screen mode and is responsible for resizing the media content.
• The User Control Module displays the controls that the user requires to interact with the media content. Actions triggered via this component are typically synchronized with other users, although this is not always the case. For YouTube videos, the User Control Module contains play/pause, volume and video timeline seeking controls. The volume is not synchronized across the session, as various users might prefer different volume settings.
• The Information Module displays useful information related to the currently selected media content. In the case of YouTube, this is simply the title of the video.

[Fig. 5. Client Activity Diagram. Activities shown: loadSWF, MainApp Loaded / Determine Domain, loadConfig, Config Loaded, Load Login Module, Load User List Module, Connect to Media Server, user loads media, Load Media Module.]

Figure 5 shows the activity diagram of the client. Initially, the client's browser loads the SWF file from the server. Once the main application is loaded, it determines the domain under which it is deployed. Based on the domain, it loads a configuration file from the server. This XML configuration file describes the Domain Modules to load, as well as the Media Modules which are available to the application. The client code then loads the Login Module and the User List Module, and performs a login for the user. As part of the login process, the user's contacts are loaded. Next, the client
opens the connection to the Media Server and the Search Modules become available. Finally, if the user selects a specific media item to view, the corresponding Viewer Module and User Control Module are loaded from the server.

[Fig. 6. Connection Message Sequence Chart. Participants: client1:Client, Server:MediaServer, client2:Client. Messages shown: connect, onConnect, onConnectAccept, connect.success, notifyIsOnline, userIsOnline (client1), notifyIsOnline (client1), sessionSetup, disconnect, onDisconnect, userIsOffline (client2).]

3) Server Client Communication Architecture: Due to the modular and extendable nature of the client, the communication between the client and the server must be able to support messages that were not considered at design time. Figure 6 shows the sequence diagram for establishing the connection between the client and the server, and then disconnecting. The diagram assumes that there is already a client connected to the server (namely, client2). The sequence is initiated when a different client, client1, connects to the server. Upon connection, the Media Server performs access control and determines if the user should be allowed to connect. If the user is allowed to connect, a connect.success message is sent back to the user's client (client1). The client then sends a notifyIsOnline message, which contains a list of the unique IDs of the client's contacts. Assuming client2 is a contact of client1, the server posts messages to both client1 and client2 that the other client is online. Following a collaborative session, client1 disconnects from the Media Server. The server then notifies client2 that client1 has gone offline.

[Fig. 7. Server Class Structure. Classes shown: WTServer, GroupClient, GroupManager, JChannel, GroupReceiver, RemoteServer, ServerSession.]

III. IMPLEMENTATION

In order to implement the described architecture, Adobe Flex was used for the client side and Red5 [7] was used as the Media Server. Red5 was chosen for the server because it is an open source implementation of the RTMP specification. In order to achieve the desired modularity of the system, Flex Modules are used. Flex Modules make it possible for new modules to be downloaded at run time as they are required.

Figure 7 shows the server-side implementation that was used. The WTServer class handles client connections, webcam streams and session messages by using other classes like ServerSession to provide the services previously introduced in Figure 2. The server has zero or more clients connected at any time and has zero or more sessions running at any time. Clients can connect to the server and send messages to other clients. WTServer tracks clients using the GroupClient class. In order for a Media Server to track updates for only the relevant users, WTServer creates GroupClient objects for all connected users and their contacts. Exactly one of the localClient or remoteClient properties is set to true to identify the type of GroupClient object. If the object is a remoteClient, a RemoteServer object is associated with it to identify the external Media Server on which the client can be found. To transmit messages to other users, the sendMessage() method is used, regardless of which server the other users are on. For messages that must reach users on remote servers, the GroupManager is used, which contains a list of serverPeers and a link to a JChannel object so that the JGroups Gossip Server can ensure the message gets delivered to the correct server. A GroupReceiver is used for managing responses received through the JChannel.

On the client side, each Media Module implements the Model-View-Controller (MVC) software pattern. In order to allow for the extensibility of the platform, each of the controllers for the modules must implement one of the provided interfaces.
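To make the role of these interfaces concrete, the sketch below renders the ViewerController interface of Fig. 8 as a Python abstract base class. Only the five method names come from the figure; the parameters, return types, the meaning given to each method, and the `VideoViewerController` class are assumptions made for illustration, and the actual platform is written in Flex rather than Python.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict


class ViewerController(ABC):
    """Interface for a Viewer Module controller (method names from Fig. 8)."""

    @abstractmethod
    def initComplete(self) -> None:
        """Signal that the module has finished loading (assumed meaning)."""

    @abstractmethod
    def command(self, name: str, data: Any = None) -> None:
        """Apply one synchronized command to the viewer."""

    @abstractmethod
    def setSize(self, width: int, height: int) -> None:
        """Resize the media content, for example for full-screen mode."""

    @abstractmethod
    def sync(self, state: Dict[str, Any]) -> None:
        """Replace the local state with a state received from the session."""

    @abstractmethod
    def getSyncState(self) -> Dict[str, Any]:
        """Return the state another client needs in order to catch up."""


class VideoViewerController(ViewerController):
    """Hypothetical video viewer that only tracks playback state."""

    def __init__(self) -> None:
        self.ready = False
        self.playing = False
        self.position = 0.0
        self.size = (0, 0)

    def initComplete(self) -> None:
        self.ready = True

    def command(self, name: str, data: Any = None) -> None:
        if name == "play":
            self.playing = True
        elif name == "pause":
            self.playing = False
        elif name == "seek":
            self.position = float(data)
        else:
            raise ValueError(f"unsupported command: {name}")

    def setSize(self, width: int, height: int) -> None:
        self.size = (width, height)

    def sync(self, state: Dict[str, Any]) -> None:
        self.playing = state["playing"]
        self.position = state["position"]

    def getSyncState(self) -> Dict[str, Any]:
        return {"playing": self.playing, "position": self.position}


host = VideoViewerController()
host.initComplete()
host.command("play")
host.command("seek", 30)

late_joiner = VideoViewerController()
late_joiner.initComplete()
late_joiner.sync(host.getSyncState())  # late joiner adopts the host's state
```

Code that drives a viewer through `ViewerController` alone does not need to know which media type is behind it, so a module for another media source could be substituted without changing the caller.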
Implementing these interfaces ensures that the modules, which are loaded at runtime, can communicate with each other. The model, which is common to all four modules, is defined for each media type by the developer, as it is media-dependent. The view for each of the modules is also media-dependent, and as such is defined by the developer.

[Fig. 8. Client Interfaces. ViewerController (interface): initComplete(), command(), setSize(), sync(), getSyncState(). SearchController (interface): search(), loadMedia(). UserCommandController (interface): sendCommand(), maximizeMinimize(). InformationController (interface): setDescription(). MediaCommandQueue: private constructor MediaCommandQueue(), getInstance(), addCommandToQueue(), playbackCommands(), field queue:MediaCommandQueue. MediaCommand: getCommand(), setCommand(), getData(), setData(), setDescription(), fields command, data, description.]

As can be seen in Figure 8, the ViewerController, UserCommandController, InformationController and SearchController make use of the MediaCommandQueue to perform their necessary functions. MediaCommandQueue is a singleton that stores commands until the Viewer Module is loaded and then plays the commands in the order received. Without this queue, if a new user joined a session, received a synchronization command, and then received new commands while still loading the correct Viewer Module, the newer commands would be lost, which would lead to the desynchronization of the session. A second role for the MediaCommandQueue is that of ensuring that the correct modules are loaded. When a new command is received, the MediaCommandQueue determines if the currently loaded modules can perform the command. If they cannot, then the correct modules are loaded, the queue is emptied (since commands affecting the previously loaded modules no longer apply while the new modules are loaded) and the command is added to the queue.
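The queueing behaviour described above can be sketched in Python as follows. This is a simplified illustration rather than the platform's Flex code: module loading is simulated, the `module` argument and the `moduleLoaded()` callback are hypothetical (the paper does not state how the queue decides which module a command needs), and executed commands are simply recorded in a list.

```python
from dataclasses import dataclass
from typing import Any, List, Optional


@dataclass
class MediaCommand:
    command: str           # command ID, e.g. "loadVideo", "play", "seek"
    data: Any = None       # payload, e.g. the new position for a seek
    description: str = ""  # text for the Information Module


class MediaCommandQueue:
    """Buffers commands until the module that can perform them is loaded."""

    _instance: Optional["MediaCommandQueue"] = None

    def __init__(self) -> None:
        self.queue: List[MediaCommand] = []
        self.loaded_module: Optional[str] = None   # ready to execute commands
        self.loading_module: Optional[str] = None  # still being downloaded
        self.executed: List[MediaCommand] = []     # stands in for the viewer

    @classmethod
    def getInstance(cls) -> "MediaCommandQueue":
        # Singleton access, as in Fig. 8.
        if cls._instance is None:
            cls._instance = cls()
        return cls._instance

    def addCommandToQueue(self, cmd: MediaCommand, module: str) -> None:
        if module == self.loaded_module:
            # The loaded module can perform the command: run it immediately.
            self.executed.append(cmd)
        elif module == self.loading_module:
            # The right module is on its way: buffer in arrival order.
            self.queue.append(cmd)
        else:
            # A different module is needed: drop stale commands, start loading.
            self.queue.clear()
            self.loaded_module = None
            self.loading_module = module
            self.queue.append(cmd)

    def moduleLoaded(self) -> None:
        # Hypothetical callback fired when the requested module has loaded.
        self.loaded_module = self.loading_module
        self.loading_module = None
        self.playbackCommands()

    def playbackCommands(self) -> None:
        # Replay buffered commands in the order they were received.
        while self.queue:
            self.executed.append(self.queue.pop(0))


queue = MediaCommandQueue.getInstance()
queue.addCommandToQueue(MediaCommand("loadVideo", "abc123"), "youtube")
queue.addCommandToQueue(MediaCommand("seek", 30), "youtube")
queue.moduleLoaded()  # buffered commands are now replayed in order
queue.addCommandToQueue(MediaCommand("play"), "youtube")
queue.addCommandToQueue(MediaCommand("loadImage", "img1"), "flickr")
```

After this sequence the three video commands have been executed in arrival order, while the image command waits in the queue until its module finishes loading.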
If, while modules are being loaded, a new command arrives that requires different modules than the ones being loaded, then the queue is similarly emptied and the command is added to the queue. The MediaCommandQueue is also responsible for broadcasting messages to other members of the session through the connection to the server.

To store a command, the MediaCommandQueue uses a MediaCommand data structure which has three fields: command, which represents a unique ID for the command (for example loadVideo, play, pause, seek for video media content); data, which stores the data for the command (for example the new time position for the video seek command); and description, which holds the description of the media type used by the Information Module.

Through the use of these MediaAPI interfaces, developers can easily add new applications and data sources to the Collaborative Web Client Platform without needing to worry about the synchronization mechanism. The modular approach also allows different Media Modules to be distributed across different servers, and the system, through a simple configuration file, can find and load them. Applications have already been developed to support YouTube videos, Flickr and Facebook images, Twitter text messages, local documents (where users can upload documents to the system and share them), and live videos from UStream.

IV. RESULTS

The implemented system was called Watch Together to highlight its collaborative nature. Two variations of Watch Together were deployed to different groups of live users and their usage of the deployments was observed.

A. Facebook Deployment

The system was deployed to the public as a Facebook Application [8]. Users can access the system by logging in with their Facebook account and adding Watch Together to their application bookmarks list. The Facebook Developer API [9] was used to retrieve the information about the user who is currently logged in (such as their name, list of friends, profile image, etc.)
and to populate the user interface.

Figure 9 shows a collaborative session containing six users, where three of the users have enabled their webcam. The viewer module described in the architecture section appears in the top half of Watch Together and is always synchronized between the users in the session. The users currently in the session are shown along the bottom of the interface using either their profile images (retrieved via the Facebook API) or a live video stream from the user's webcam. Near the middle of the interface is a menu bar with clickable icons. The left side of the menu bar contains the list of applications developed for the Watch Together platform (by using the API described in Section III) to support various sources of popular online multimedia content. Clicking one of these icons brings up a search module containing thumbnails of the search results. The thumbnails can be clicked on to change what is displayed within the viewer module. The right side of the menu contains icons for the contacts module (which shows a list of available online users that can be invited to a session), the text chat module, and the settings module, all of which appear on top of the viewer module but do not affect the synchronized content.

[Fig. 9. Six Facebook Users Collaboratively Watching a YouTube Video.]

TABLE I
WATCH TOGETHER USERS BY AGE GROUP

Age Group    Percentage of Users (%)
< 18         2
18-21        11
22-25        54
26-29        28
30-33        4
> 33         1

Based on over a thousand users who opted in to share anonymized usage data, 65% of users of Watch Together were found to be male and 35% were female. Surprisingly, 24% of users enabled their webcam when using Watch Together, which is a strong indicator that users enjoy sharing and discussing online content in this collaborative fashion. Users were particularly drawn to the YouTube application for sharing their favourite video clips, with an average of 11.4 videos viewed per user.
The distribution of users by age can be seen in Table I, with most in the 22-25 age group. In this deployment, the cloud-based architecture ensured that the users were automatically distributed among two different Watch Together Media Servers, and the webcam streams, as well as all synchronization messages, performed very well during testing. A common request from the test users was to implement the ability to modify the volume of the audio coming from each user's video chat stream. A volume slider and mute button were therefore added over each user's video stream, appearing whenever the user moves their mouse over the video chat area.

B. E-Learning Deployment

A second version of Watch Together was customized as a module for the Moodle e-learning software platform [10]. Moodle is an open source course management system used by Prof. Ionescu for his classes at the University of Ottawa. Each student is provided with an account in the system that allows them to upload assignments, check their grades, etc. The flexibility of the design allowed Watch Together to be integrated using the Moodle module API such that students can collaborate with each other and with the professor over course material. While the system is mostly used for its document sharing feature during the professor's online office hours, YouTube videos and other content related to the class are also made available for collaboration. The use of Watch Together in this way reveals its potential beyond entertainment and more towards enterprise-oriented social networking scenarios.

[Fig. 10. Students Collaborating Over Slides Within the Moodle Platform.]

V. CONCLUSION

This paper presented the design and implementation of a cloud-based collaboration platform for experiencing synchronized online media from a web browser. It was shown how users can collaborate over video chat while viewing videos, photos, maps, documents and more in real-time.
This differs from existing collaboration solutions which may require cumbersome installations and lack a scalable design. Because the system is a platform, developers can easily add new media sources so that all popular digital media can be made available for instant sharing. Additionally, the system's ability to be integrated into social networking environments such as Facebook and Moodle was demonstrated. It was observed that users enjoy collaborating on online media in real-time with one another. This feedback allows for further improvements to be made to all aspects of the system and for the addition of features such as real-time document editing and games. Future papers will also focus on more experimental results for the scalability of the cloud-based backend.

REFERENCES

[1] (2011) Google Docs - Online Documents, Spreadsheets, Presentations. Google Inc. [Accessed: March 2011]. [Online]. Available: http://docs.google.com/
[2] W. Wang, "Powermeeting: GWT-Based Synchronous Groupware," in HT '08: Proc. of 19th ACM Conf. on Hypertext and Hypermedia. New York, NY, USA: ACM, 2008, pp. 251-252.
[3] (2011) Flex Open-Source Framework. Adobe Systems Inc. [Accessed: March 2011]. [Online]. Available: http://www.adobe.com/products/flex/
[4] (2011) Real-Time Messaging Protocol (RTMP) Specification. Adobe Systems Inc. [Accessed: March 2011]. [Online]. Available: http://www.adobe.com/devnet/rtmp.html
[5] Y. Liu, P. Shafton, D. A. Shamma, and J. Yang, "Zync: The Design of Synchronized Video Sharing," in DUX '07: Proc. of 2007 Conf. on Designing for User eXperiences. New York, NY, USA: ACM, 2007, pp. 1-8.
[6] M. R. Thissen, J. M. Page, M. C. Bharathi, and T. L. Austin, "Communication Tools for Distributed Software Development Teams," in SIGMIS-CPR '07: Proc. of ACM SIGMIS CPR Conf. on Computer Personnel Research. New York, NY, USA: ACM, 2007, pp. 28-35.
[7] (2011) Red5. The Red5 Project. [Accessed: March 2011]. [Online]. Available: http://red5.org/
[8] (2009, June) Watch Together. [Accessed: March 2011]. [Online]. Available: http://www.watch-together.com/
[9] (2011) Facebook Developers. Facebook Inc. [Accessed: March 2011]. [Online]. Available: http://developers.facebook.com/
[10] (2011) Moodle.org: Open-Source Community-Based Tools for Learning. Moodle Trust. [Accessed: March 2011]. [Online]. Available: http://www.moodle.org/