
Web Engineering Lecture One

Topics covered:
- Web engineering
- Software engineering vs. Web engineering
- Web technologies: hypertext, hypermedia, client/server, etc.
- Search engines: searching, indexing, crawlers, etc.
- Search engine optimization
- Web metrics and quality

Web Engineering
Web engineering is a systematic, scientific, engineering and management approach to developing, deploying and maintaining high-quality Web applications. It focuses on sound methodologies, techniques, and tools for developing Web apps. Web engineering is defined as "...the use of scientific, engineering, and management principles and systematic approaches with the aim of successfully developing, deploying and maintaining high quality Web-based systems and applications..." Web development also has an important artistic side.

How do Web apps differ from traditional software, information system, or computer application development?

Characteristics of Web Apps
Web apps constantly evolve. Unlike conventional software, which goes through planned and discrete revisions at specific times in its lifecycle, Web applications continuously evolve in terms of their requirements and functionality (instability of requirements). Managing the change and evolution of a Web application is a major technical, organizational and management challenge, much more demanding than in traditional software development.

Web apps are inherently different from conventional software. Their content, which may include text, graphics, images, audio, and/or video, is integrated with procedural processing. The way in which the content is presented and organized also affects the performance and response time of the system.

Web applications are meant to be used by a vast, variable user community: a large number of anonymous users with varying requirements, expectations, and skill sets.
Therefore, the user interface and usability features have to meet the needs of a diverse, anonymous user community to whom we cannot offer training sessions, which complicates human-Web interaction (HWI), user interface design, and information presentation. In general, many Web-based systems demand a good look and feel, so more emphasis is placed on visual creativity and on the incorporation of multimedia in presentation and interface.

Technology instability: developers must cope with a constant stream of new tools, technologies, languages and standards. Web app development uses cutting-edge, diverse technologies and standards, and integrates numerous varied components, including traditional and non-traditional software, interpreted scripting languages, HTML files, databases, images, other multimedia components such as video and audio, and complex user interfaces.

The delivery medium is different from that of traditional software, and the security and privacy needs of Web-based systems are more demanding than those of traditional software.

Web Apps vs Conventional Software
Web apps differ from conventional software with respect to their development process, technologies, quality factors, and measures.

Web Hypermedia, Web Software, or Web Application?
Hypermedia is an extension of hypertext, and the Web is the best-known example of a hypermedia system. The Web has been used as the delivery platform for three types of applications: Web hypermedia applications, Web software applications, and Web applications.

Web hypermedia application: a non-conventional application characterized by the authoring of information using nodes (chunks of information), links (relations between nodes), anchors, and access structures (for navigation), and by its delivery over the Web. Technologies: HTML, XML, JavaScript, and multimedia.

Web software application: a conventional software application that relies on the Web or uses the Web's infrastructure for execution. Typical applications include legacy information systems such as databases, booking systems, e-commerce apps, etc. They employ development technologies (e.g., DCOM, ActiveX), database systems, and development solutions (e.g., J2EE).

Web application: an application delivered over the Web that combines characteristics of both Web hypermedia and Web software applications.

Web Development vs. Software Development
Web development and maintenance differ from conventional software development in the people involved, the intrinsic characteristics of Web apps, and their audience. The differences can be divided into 12 areas:
- application characteristics
- primary technologies used
- approach to quality delivered
- development process drivers
- availability of the application
- customers (users/stakeholders)
- update rate/maintenance cycles
- people involved in development
- architecture and network
- disciplines involved
- legal, ethical and social issues
- information structuring and design

Application Characteristics

Primary Technologies Used
Web apps use technologies such as Java solutions (JavaBeans, JSP, etc.), HTML, XML, JavaScript, and databases. Conventional software development uses technologies such as object-oriented or procedural languages, databases, code generators, and CASE tools.

Approach to Quality Delivered
Web apps are expected to be of high quality so that customers return to do repeat business. Usability, accessibility and graphic design become very important, because competition for users on the Web is high and popularity matters.

Development Process Drivers
The dominant development process drivers for Web companies are three quality criteria: reliability, usability, and security. For conventional software development, the dominant process driver is time to market rather than quality criteria.

Disciplines Involved
A wide range of skills and expertise is required for Web apps, drawing on distinct disciplines such as software engineering (development methodologies, project management, tools), hypermedia engineering (linking, navigation), requirements engineering, usability engineering, information engineering, graphic design, and network management (performance measurement and tuning). Conventional software requires a smaller set of disciplines, such as software engineering, requirements engineering, and usability engineering.

Information Structuring and Design
Web applications present structured and unstructured content, which may be distributed over multiple sites and use different systems (e.g., database systems, file systems, multimedia storage devices). Unlike that of conventional software applications, the design of a Web application includes organizing the content into suitable navigational structures by means of hyperlinks.
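The node/link/anchor model of a Web hypermedia application can be sketched as a small data structure. This is a minimal illustration assuming pages are held in memory; the class and method names (Node, Hypermedia, follow, reachable) are hypothetical and do not come from any framework.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A chunk of information, e.g., one page of a Web hypermedia application."""
    node_id: str
    content: str
    # anchor text -> id of the node the link points to
    links: dict = field(default_factory=dict)


class Hypermedia:
    def __init__(self) -> None:
        self.nodes = {}

    def add_node(self, node_id: str, content: str) -> None:
        self.nodes[node_id] = Node(node_id, content)

    def add_link(self, source: str, anchor: str, target: str) -> None:
        # a link is a relation between two nodes, attached to an anchor in the source
        self.nodes[source].links[anchor] = target

    def follow(self, node_id: str, anchor: str) -> Node:
        # navigation: activate an anchor and arrive at the target node
        return self.nodes[self.nodes[node_id].links[anchor]]

    def reachable(self, start: str) -> set:
        # a simple access structure: every node a user can navigate to from start
        seen, stack = set(), [start]
        while stack:
            current = stack.pop()
            if current not in seen:
                seen.add(current)
                stack.extend(self.nodes[current].links.values())
        return seen


site = Hypermedia()
for node_id, text in [("home", "Welcome"), ("about", "About us"), ("contact", "Contact")]:
    site.add_node(node_id, text)
site.add_link("home", "About", "about")
site.add_link("about", "Contact us", "contact")
print(site.follow("home", "About").content)  # About us
print(sorted(site.reachable("home")))  # ['about', 'contact', 'home']
```

Keeping the links inside each node mirrors how HTML embeds anchors in the page itself; a real site would add separate access structures such as menus or indexes.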

Technologies for Web Apps
The choice of appropriate technologies is an important success factor in the development of Web applications. Relevant concepts include markup, hypertext/hypermedia, client/server communication, and sockets.
- Defining the WHAT of a system: define the requirements of the Web app, identify the architecture, develop a design, etc.
- Defining the HOW (implementation phase): choose appropriate technologies.
Separation of content and presentation is a central requirement for using these technologies appropriately.

The specifics of implementation technologies for Web applications, as opposed to conventional software systems, stem from the use of Web standards. This concerns in particular the implementation within three views: the request (client), the response (server), and the rules for the communication between the two (protocol).
- Protocols: HTTP, SMTP, FTP
- Client technologies: HTML, plug-ins, Java applets, ActiveX controls
- Server technologies: see Server Side Technologies below

Markup
Markup consists of instructions for document formatting: text inserted into a document to add information about how its characters and contents should be represented. For example, in a simple markup scheme we could write *Hello* to output Hello in bold, or /Hello/ to output it in italics. Examples: SGML and the languages derived from it, HTML and XML.

Hypertext and Hypermedia
Hypertext is understood as the organization of the interconnection of individual information units. Relationships between these units can be expressed by links. Hypermedia is commonly seen as a way to extend the hypertext principle to arbitrary multimedia objects, e.g., images or video.

Client/Server Communication on the Web
The client/server paradigm underlying all Web applications forms the backbone between a user (client or user agent) and the actual application (server) in a 2-layer architecture. Protocols covered here: SMTP, RTSP, HTTP.

SMTP (Simple Mail Transfer Protocol)
SMTP, combined with POP3 and IMAP, allows us to send and receive e-mails. In addition, SMTP is increasingly used as a transport protocol for asynchronous message exchange based on SOAP.
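The markup example with *Hello* and /Hello/ can be made concrete by translating that hypothetical scheme into HTML. This sketch assumes * marks bold and / marks italics, and uses HTML's <b> and <i> tags as the target; it only illustrates how markup instructions describe formatting separately from the content.

```python
import re

# Hypothetical markup from the example: *text* marks bold, /text/ marks italics.
# One combined pattern scans the input once, so the generated HTML (which
# contains "/") is never re-scanned by the italic rule.
MARKUP = re.compile(r"\*([^*]+)\*|/([^/]+)/")


def to_html(text):
    def replace(match):
        if match.group(1) is not None:
            return f"<b>{match.group(1)}</b>"
        return f"<i>{match.group(2)}</i>"

    return MARKUP.sub(replace, text)


print(to_html("*Hello* and /Hello/"))  # <b>Hello</b> and <i>Hello</i>
```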

RTSP (Real Time Streaming Protocol)
RTSP is a standard designed to support the delivery of multimedia data under real-time conditions. In contrast to HTTP, RTSP allows resources to be transmitted to the client in a timely context rather than delivered in their entirety (at once). This form of transmission is commonly called streaming. Streaming allows us to manually shift the audiovisual time window by requesting the stream at a specific time, i.e., it lets us control the playback of continuous media.
From Wikipedia: the transmission of the streaming data itself is not a task of the RTSP protocol; most RTSP servers use the Real-time Transport Protocol (RTP) for media stream delivery. While similar in some ways to HTTP, RTSP defines control sequences useful for controlling multimedia playback.
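Because RTSP only carries control messages while the media itself travels over RTP, a control request is a short text message, similar in shape to an HTTP request. The sketch below assumes RTSP 1.0 (RFC 2326) and a session already set up via SETUP; it builds a PLAY request asking the server to start the stream 30 seconds in. The server URL and session id are hypothetical, and no network connection is opened.

```python
def rtsp_request(method, url, cseq, headers=None):
    """Build an RTSP/1.0 request: request line, CSeq header, optional headers."""
    lines = [f"{method} {url} RTSP/1.0", f"CSeq: {cseq}"]
    for name, value in (headers or {}).items():
        lines.append(f"{name}: {value}")
    # as in HTTP, lines end with CRLF and an empty line ends the header block
    return "\r\n".join(lines) + "\r\n\r\n"


def play_from(url, cseq, session, start_seconds):
    # Range in Normal Play Time (npt): start playback at start_seconds
    return rtsp_request("PLAY", url, cseq,
                        {"Session": session, "Range": f"npt={start_seconds}-"})


def pause(url, cseq, session):
    return rtsp_request("PAUSE", url, cseq, {"Session": session})


# hypothetical server URL and session id
print(play_from("rtsp://media.example.com/lecture", 4, "12345678", 30))
```

Shifting the time window, as described above, amounts to sending another PLAY with a different Range value.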

HTTP (HyperText Transfer Protocol)
A text-based, stateless protocol controlling how resources, e.g., HTML documents or images, are accessed.

Session Tracking
Interactive Web applications must be able to distinguish requests from multiple simultaneous users and to identify related requests coming from the same user. A session is a sequence of related HTTP requests between a specific user and a server within a specific time window. Since HTTP is a stateless protocol, the Web server cannot automatically allocate incoming requests to a session. Two principal methods allow a Web server to allocate an incoming request to a session:
- In each of its requests to the server, the client identifies itself with a unique identification, so that all data sent to the server can be allocated to the respective session.
- All data exchanged between the client and the server is included in each request the client sends, so that the server logic can be developed even though the communication is stateless.
Session tracking is normally implemented by URL rewriting or cookies.

Client Technologies
- Helpers and plug-ins (e.g., Adobe Reader, WinZip)
- Java applets
- ActiveX controls

Document-Specific Technologies
HTML, XML, XSL/XSLT, SVG, and SMIL.

SVG (Scalable Vector Graphics)
- Allows describing two-dimensional graphics in XML.
- SVG recognizes three types of graphics objects: vector graphics consisting of straight lines and curves, images, and text.
- Supports event-based interaction, e.g., responses to buttons or mouse movements.
- This format is suitable for all types of interactive and animated vector graphics.
- Application examples include the representation of CAD drawings, maps, and routes.

SMIL (Synchronized Multimedia Integration Language)
- Used to represent synchronized multimedia presentations.

Server Side Technologies
- URI handlers to process HTTP requests: Server Side Includes (SSI), CGI, server-side scripting, servlets, JSP, ASP.NET
- Web services
- Middleware technologies: application servers, messaging systems/brokers
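Both session-tracking mechanisms, cookies and URL rewriting, deliver the same unique identifier with every request. The sketch below shows the idea under stated assumptions: sessions live in an in-memory dictionary, and the parameter name SESSIONID and the URL http://shop.example/cart are hypothetical. It uses only the standard library (secrets, http.cookies, urllib.parse) and does not run a real server.

```python
import secrets
from http.cookies import SimpleCookie
from urllib.parse import parse_qs, urlencode, urlparse

SESSION_KEY = "SESSIONID"  # hypothetical cookie / URL parameter name


class SessionStore:
    """Server-side session data, keyed by a random session identifier."""

    def __init__(self):
        self._sessions = {}

    def new_session(self):
        session_id = secrets.token_hex(16)  # unguessable unique identification
        self._sessions[session_id] = {}
        return session_id

    def get(self, session_id):
        return self._sessions.get(session_id)


# Variant 1: cookies. The server sends Set-Cookie once; the browser returns
# the identifier in the Cookie header of every later request.
def set_cookie_header(session_id):
    cookie = SimpleCookie()
    cookie[SESSION_KEY] = session_id
    cookie[SESSION_KEY]["httponly"] = True
    return cookie.output(header="Set-Cookie:")


def session_id_from_cookie(cookie_header):
    cookie = SimpleCookie()
    cookie.load(cookie_header)
    morsel = cookie.get(SESSION_KEY)
    return morsel.value if morsel else None


# Variant 2: URL rewriting. Every link in a generated page carries the identifier.
def rewrite_url(url, session_id):
    separator = "&" if urlparse(url).query else "?"
    return f"{url}{separator}{urlencode({SESSION_KEY: session_id})}"


def session_id_from_url(url):
    values = parse_qs(urlparse(url).query).get(SESSION_KEY)
    return values[0] if values else None


store = SessionStore()
sid = store.new_session()
store.get(sid)["cart"] = ["lecture notes"]
# a later, otherwise unrelated request is allocated to the same session
assert store.get(session_id_from_cookie(f"{SESSION_KEY}={sid}"))["cart"] == ["lecture notes"]
assert session_id_from_url(rewrite_url("http://shop.example/cart?item=3", sid)) == sid
```

Cookies keep URLs clean but depend on the browser accepting them; URL rewriting works without cookies but exposes the identifier in links and logs.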

Web Application Architectures
The quality of a Web application is considerably influenced by its underlying architecture.

Components of a Generic Web Application Architecture
The components are based on the request-response paradigm:
- Client: a browser or user agent.
- Firewall: a piece of software regulating the communication between insecure networks (e.g., the Internet) and secure networks (e.g., corporate LANs). This communication is filtered by access rules.
- Proxy: an intermediate server that forwards client requests for URLs to the actual server. A proxy is typically used to temporarily store Web pages in a cache, but it can also assume other functions, e.g., adapting and formatting links and contents for users (customization), or user tracking.
- Web server: a piece of software that supports various Web protocols, such as HTTP and HTTPS, to process client requests.
- Database server: normally supplies an organization's production data in structured form, e.g., in tables.
- Media server: primarily used for streaming non-structured bulk data (e.g., audio or video).
- Content management server: similar to a database server, it holds contents to serve an application. These contents are normally available as semi-structured data, e.g., XML documents.
- Application server: holds the functionality required by several applications, e.g., workflow or customization.
- Legacy application: an older system that should be integrated as an internal or external component.

Data Aspect Architectures
Data can be grouped into one of three architectural categories: (1) structured data of the kind held in databases; (2) documents of the kind used in document management systems; and (3) multimedia data of the kind held in media servers.

Architectures for Multimedia Data
The ability to handle large data volumes plays a decisive role when designing systems that use multimedia contents. Basically, multimedia data, i.e., audio and video, can be transmitted over standard Internet protocols like HTTP or FTP, just like any other data used in Web applications. Many current Web applications use this approach, because its major benefit is that no additional components are needed on the server. Its downside, however, is often felt by users: media downloads are very slow.

Streaming technologies minimize these waiting times. Streaming in this context means that a client can begin playout of the audio and/or video a few seconds after it begins receiving the file from a server, which avoids having to download the entire file (incurring a potentially long delay) before playout begins. Two protocols are generally used for streaming multimedia contents: one handles the transmission of multimedia data on the network level, and the other controls the presentation flow (e.g., starting and stopping a video) and the transmission of metadata. Examples are RTP (Real-time Transport Protocol) as the network protocol, RTSP (Real Time Streaming Protocol) as the control protocol, and MMS (Microsoft Media Server).
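The waiting-time argument for streaming is simple arithmetic: a download must transfer the whole file before playout, while streaming only has to fill a short buffer. The sketch below uses hypothetical numbers (a 100 MB video encoded at 2 Mbit/s, an 8 Mbit/s connection, a 4-second player buffer) and treats 1 MB as 8 megabits; it ignores protocol overhead and network jitter.

```python
def download_delay(file_megabytes, bandwidth_mbit_s):
    """Seconds before playout when the whole file must arrive first (e.g., via HTTP/FTP)."""
    return file_megabytes * 8 / bandwidth_mbit_s  # 1 MB taken as 8 megabits


def streaming_delay(buffer_seconds, media_bitrate_mbit_s, bandwidth_mbit_s):
    """Seconds before playout when only a short buffer must arrive first."""
    if bandwidth_mbit_s < media_bitrate_mbit_s:
        # the link cannot keep up with playback, so playout would stall
        raise ValueError("bandwidth below media bitrate")
    return buffer_seconds * media_bitrate_mbit_s / bandwidth_mbit_s


print(download_delay(100, 8))    # 100.0 seconds
print(streaming_delay(4, 2, 8))  # 1.0 second
```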

Fig 2: Streaming media architecture using point-to-point connections.

Search Engines
Originally, the term search engine referred to some kind of search index: a huge database containing information from individual Web sites. Search engines help people find information on the Internet (on other sites). Large search-index companies own thousands of computers that use software known as spiders or robots (or just plain bots) to grab Web pages and read the information stored in them. These systems don't always grab all the information on each page or all the pages in a Web site, but they grab a significant amount of information and use complex algorithms (calculations based on complicated formulae) to index that information.

General operations of search engines (crawling, indexing, searching):
- Search (crawl) the Internet.
- Keep an index of the words they find and where they find them, e.g., words occurring in the title, subtitles, meta tags, and other relevant positions.
- Allow users to look for words or combinations of words found in that index.

Search/Crawl the Internet
A search engine employs special software robots, called spiders, to build lists of the words found on Web sites. When a spider is building its lists, the process is called Web crawling. The early Google system had a server dedicated to providing URLs to the spiders. Rather than depending on an Internet service provider for the domain name server (DNS) that translates a server's name into an address, Google had its own DNS, in order to keep delays to a minimum.

How does any spider start its travels over the Web? The usual starting points are lists of heavily used servers and very popular pages. The spider begins with a popular site, indexing the words on its pages and following every link found within the site. The Google spider was built to index every significant word on a page, leaving out the articles "a," "an" and "the." Other spiders take different approaches.

Robot exclusion protocol: used when a site's owner doesn't want a spider to crawl its pages or links. The owner lists the excluded paths in a robots.txt file at the root of the site.
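The crawl described above (start from popular pages, follow every link, skip the articles, honor the robot exclusion protocol) can be sketched as a breadth-first traversal. To stay runnable offline, this sketch crawls a hypothetical in-memory "Web" instead of fetching pages over HTTP; the host names, robots.txt rules and the bot name LectureBot are made up, while urllib.robotparser.RobotFileParser is Python's standard robots.txt parser. Real crawlers add politeness delays, DNS caching and much more.

```python
from collections import deque
from urllib.parse import urlparse
from urllib.robotparser import RobotFileParser

# Hypothetical in-memory Web: URL -> (page text, links found on the page).
WEB = {
    "http://popular.example/": ("Welcome to the popular site",
                                ["http://popular.example/news", "http://other.example/"]),
    "http://popular.example/news": ("Daily news", ["http://popular.example/private/admin"]),
    "http://popular.example/private/admin": ("Admin area", []),
    "http://other.example/": ("Another site", ["http://popular.example/"]),
}
# Hypothetical robots.txt contents per host (robot exclusion protocol).
ROBOTS = {"popular.example": ["User-agent: *", "Disallow: /private/"]}
STOP_WORDS = {"a", "an", "the"}  # articles left out, as described above


def allowed(url, user_agent):
    rules = ROBOTS.get(urlparse(url).netloc)
    if rules is None:
        return True  # no robots.txt for this host: nothing is excluded
    parser = RobotFileParser()
    parser.parse(rules)
    return parser.can_fetch(user_agent, url)


def crawl(seeds, user_agent="LectureBot", max_pages=100):
    word_lists = {}        # URL -> significant words on that page
    queue = deque(seeds)   # start from popular pages, breadth-first
    while queue and len(word_lists) < max_pages:
        url = queue.popleft()
        if url in word_lists or url not in WEB or not allowed(url, user_agent):
            continue
        text, links = WEB[url]
        word_lists[url] = [w for w in text.lower().split() if w not in STOP_WORDS]
        queue.extend(links)  # follow every link found on the page
    return word_lists


for page, words in crawl(["http://popular.example/"]).items():
    print(page, words)
```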

Search Directory
A search directory is a categorized collection of information about Web sites, rather than a store of information taken from the Web pages themselves. The most significant search directories are owned by Yahoo! (dir.yahoo.com) and the Open Directory Project (www.dmoz.org). Directory companies don't use spiders or bots to download and index the pages of the Web sites in the directory; instead, for each Web site, the directory contains information, such as a title and description, submitted by the site owner. Directories are human-edited: people check your Web site and index it. Google also has a directory, but its information comes from another source, the Open Directory Project.

Building the Index
Once the spiders have completed the task of finding information on Web pages, the search engine must store it in a way that makes it useful. Two key components are involved in making the gathered data accessible to users: the information stored with the data, and the method by which the information is indexed. In the simplest case, a search engine could just store the word and the URL where it was found.
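In the simplest case described above (store each word with the URL where it was found), the structure is an inverted index: a mapping from each word to the set of pages that contain it. A minimal sketch, assuming pages are already downloaded as plain text, words are lowercase alphanumeric runs, and the URLs are hypothetical:

```python
import re
from collections import defaultdict


def build_index(pages):
    """pages: URL -> page text. Returns word -> set of URLs containing it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(url)
    return dict(index)


pages = {  # hypothetical pages
    "http://a.example/intro": "Web engineering: an introduction",
    "http://b.example/search": "How a Web search engine builds its index",
}
index = build_index(pages)
print(sorted(index["web"]))  # ['http://a.example/intro', 'http://b.example/search']
```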

Page Rank/Ranking Organic and Paid Search Results
Search engines store more information than simple word/URL combinations. An engine might store the number of times that the word appears on a page. The engine might also assign a weight to each entry, with increasing values assigned to words as they appear near the top of the document, in subheadings, in links, in the meta tags or in the title of the page. The ranked list tries to present the most useful pages at the top.

A search engine's organic ranking algorithm is one of the trickiest parts of designing a search engine, so let's start by examining the simplest kind of ranking algorithm. Ranking is just another word for sorting: the act of collating results into a certain order. Shopping search engines typically use simple ranking algorithms that the searcher can choose. When the searcher is looking for a product to buy, the shopping search engine might start by ordering the results by price (lowest to highest), but the searcher can decide to sort the list by other columns, such as availability (in stock, within one week, and so on), or any other features of the product.

Typical ranking factors include term frequency, term placement, and link popularity (link analysis). Regardless of the precise combination of additional pieces of information stored by a search engine, the data will be encoded to save storage space.
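Term frequency weighted by term placement can be sketched as a scoring function, with shopping-style ranking as a plain sort. The weights below are hypothetical (a term in the title counts three times as much as one in the body), the example pages are made up, and link popularity is not modeled; real engines combine many more signals.

```python
import re

# Hypothetical weights for where a term appears on the page.
WEIGHTS = {"title": 3.0, "headings": 2.0, "meta": 1.5, "body": 1.0}


def tokens(text):
    return re.findall(r"[a-z0-9]+", text.lower())


def score(page, term):
    """Term frequency in each part of the page, weighted by placement."""
    term = term.lower()
    return sum(weight * tokens(page.get(part, "")).count(term)
               for part, weight in WEIGHTS.items())


def rank(pages, term):
    """URLs of matching pages, highest score first (ties broken by URL)."""
    scored = [(score(parts, term), url) for url, parts in pages.items()]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [url for value, url in scored if value > 0]


def sort_products(products, column="price"):
    """Shopping-style ranking: a plain sort on a column the searcher picks."""
    return sorted(products, key=lambda product: product[column])


pages = {  # hypothetical pages split into their parts
    "http://a.example/": {"title": "Streaming basics", "body": "RTSP controls streaming"},
    "http://b.example/": {"title": "Protocols", "body": "streaming streaming streaming"},
    "http://c.example/": {"title": "Cooking", "body": "pasta recipes"},
}
print(rank(pages, "streaming"))  # ['http://a.example/', 'http://b.example/']
```

Page a wins with one title hit (3.0) plus one body hit (1.0), even though page b repeats the term three times in its body.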

After the information is compacted, it's ready for indexing. An index has a single purpose: it allows information to be found as quickly as possible. There are quite a few ways to build an index, but one of the most effective is to build a hash table. In hashing, a formula is applied to attach a numerical value to each word. The formula is designed to evenly distribute the entries across a predetermined number of divisions. This numerical distribution is different from the distribution of words across the alphabet, and that is the key to a hash table's effectiveness: many more words begin with some letters (e.g., "m") than with others (e.g., "x"), so alphabetical divisions would be uneven, while hashing evens out the difference and reduces the average time needed to find an entry. The hash table contains the hashed number along with a pointer to the actual data, which can be sorted in whichever way allows it to be stored most efficiently. The combination of efficient indexing and effective storage makes it possible to get results quickly, even when the user creates a complicated search.
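The hashing idea above can be sketched as a small hash table with chaining: a formula maps each word to one of a fixed number of buckets, and each bucket entry stores the word together with pointers to the data (here, hypothetical document ids). The polynomial formula (multiply by 31, add the character code) is a common textbook choice, not the one any particular search engine uses; Python's built-in hash() is avoided because it is salted per process for strings.

```python
NUM_BUCKETS = 8  # predetermined number of divisions (kept small for illustration)


def word_hash(word, buckets=NUM_BUCKETS):
    """Deterministic polynomial string hash mapped onto a bucket number."""
    h = 0
    for ch in word:
        h = (h * 31 + ord(ch)) % 2**32
    return h % buckets


class HashIndex:
    """Hash table with chaining: each bucket holds (word, pointers) entries."""

    def __init__(self, buckets=NUM_BUCKETS):
        self.buckets = [[] for _ in range(buckets)]

    def _bucket(self, word):
        return self.buckets[word_hash(word, len(self.buckets))]

    def add(self, word, pointer):
        bucket = self._bucket(word)
        for stored_word, pointers in bucket:
            if stored_word == word:
                pointers.append(pointer)
                return
        bucket.append((word, [pointer]))

    def lookup(self, word):
        # only one bucket is searched, however many words the index holds
        for stored_word, pointers in self._bucket(word):
            if stored_word == word:
                return pointers
        return []


index = HashIndex()
for word, doc in [("web", "doc1"), ("search", "doc1"), ("web", "doc2")]:
    index.add(word, doc)
print(index.lookup("web"))      # ['doc1', 'doc2']
print(index.lookup("crawler"))  # []
```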

Search and Display Results
Searching through an index involves a user building a query and submitting it through the search engine. Displaying the results is a lot simpler than some other parts of the process. The display can contain organic or paid results. Organic results all use the title of the page followed by a snippet: a summary of the text from that page that contains the search terms. Paid results use similar methods to display the pages.
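Searching and displaying results can be sketched in a few lines, assuming an index that maps each word to the set of URLs containing it (the simplest case described under Building the Index) and a hypothetical page store holding each page's title and text. The query is treated as an AND of its words, and the snippet is simply a fixed-width window around the first matching term; real engines choose snippets far more carefully.

```python
def search(index, query):
    """AND query: URLs that contain every word of the query."""
    words = query.lower().split()
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results


def snippet(text, terms, width=40):
    """A window of the page text around the first occurrence of a search term."""
    lower = text.lower()
    hits = [lower.find(t.lower()) for t in terms if t.lower() in lower]
    start = max(min(hits) - width // 2, 0) if hits else 0
    end = min(start + width, len(text))
    prefix = "..." if start > 0 else ""
    suffix = "..." if end < len(text) else ""
    return prefix + text[start:end] + suffix


def display(results, pages, query):
    """Organic result layout: page title, URL, then a snippet containing the terms."""
    terms = query.split()
    return "\n\n".join(f"{pages[url]['title']}\n{url}\n{snippet(pages[url]['text'], terms)}"
                       for url in sorted(results))


index = {"session": {"http://a.example/"},
         "cookies": {"http://a.example/", "http://b.example/"}}
pages = {  # hypothetical page store
    "http://a.example/": {"title": "Session tracking",
                          "text": "HTTP is stateless, so session tracking uses cookies or URL rewriting."},
    "http://b.example/": {"title": "Baking", "text": "Cookies need butter and sugar."},
}
print(display(search(index, "session cookies"), pages, "session cookies"))
```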

Search Relationships
Search engines compete with each other, but they also collaborate: many search engines use technology from their competitors to present results. Understanding how each engine delivers its results helps you target your search marketing efforts most effectively.

"Spiders" take a Web page's content and create key search words that enable online users to find pages they're looking for.

Search Engine Optimization


SEO is the process of improving the visibility of a website or a web page in search engines via
the "natural" or un-paid ("organic" or "algorithmic") search results.

This contrasts with search engine marketing through paid listings.


In general, the earlier (or higher on the page) and the more frequently a site appears in the search results list, the more visitors it will receive from the search engine. SEO can also be described as the process of editing a web site's content and code in order to improve its visibility within one or more search engines, or as the act of altering a web site so that it does well in the organic, crawler-based listings of search engines.
White hat vs Black hat SEO

SEO techniques are classified by some into two broad categories: techniques that search engines recommend as part of good design, and techniques that search engines do not approve of and attempt to minimize the effect of, referred to as spamdexing. White hats are website designers who play nice and try to follow all of the search engine guidelines to optimize their sites. An SEO tactic, technique or method is considered white hat if it conforms to the search engines' guidelines and involves no deception. White hat SEO is not just about following guidelines; it is about ensuring that the content a search engine indexes and subsequently ranks is the same content a user will see. White hat advice is generally summed up as creating content for users, not for search engines, and then making that content easily accessible to the spiders, rather than attempting to game the algorithm.

Black hats are website designers who use backdoors, cloaking/hiding, and other tricks to optimize sites (e.g., keyword stuffing, hidden, invisible or unrelated text, and meta tag stuffing). Black hat SEO attempts to improve rankings in ways that are disapproved of by the search engines, or that involve deception. One black hat technique uses hidden text: text colored similar to the background, placed in an invisible div, or positioned off screen. Search engines may penalize sites they discover using black hat methods, either by reducing their rankings or by eliminating their listings from their databases altogether.
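Keyword stuffing, one of the black hat tricks listed above, shows up as an unnaturally high share of a page's words being the same keyword. The sketch below computes that share (keyword density) and flags a page when any word exceeds a threshold. The 20% threshold and the sample texts are hypothetical, and this crude heuristic is not how any real search engine detects spam; their methods are far more sophisticated and unpublished.

```python
import re
from collections import Counter


def keyword_density(text):
    """Share of the page's words taken up by each distinct word."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    if not words:
        return {}
    counts = Counter(words)
    return {word: count / len(words) for word, count in counts.items()}


def looks_stuffed(text, threshold=0.2):
    """Crude heuristic: flag text in which any single word exceeds the threshold."""
    return any(share > threshold for share in keyword_density(text).values())


stuffed = "cheap flights cheap flights cheap flights book cheap flights now cheap flights"
normal = ("This lecture introduces web engineering, search engines "
          "and search engine optimization for students")
print(looks_stuffed(stuffed), looks_stuffed(normal))  # True False
```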
