Differences between:
o The early forms of the web
o Web 2.0
o The semantic web
o Later developments
Possibilities and limitations associated with the evolution of the web.
http://nostalgiacafe.proboards.com/thread/133/1990s-internet-world-wide-web
http://www.businessinsider.com/big-brands-90s-websites-look-terrible-2013-4?op=1
WEB 1.0
= simple websites that distribute information
NETSCAPE NAVIGATOR was a proprietary web browser for the WEB 1.0 era.
WEB 1.0 = a library (you can use it as a source of information, but you can't
contribute to or change the information in any way).
WEB 2.0
The Web 2.0 philosophy is to create web pages that visitors can affect or
change.
E.g. 1 : AMAZON allows visitors to post product reviews. Future visitors will
have a chance to read them, which might influence their decision to buy the
product.
BUT, a RESTAURANT might have a webpage that shows the current menu. While
the menu might evolve over time, the webmaster wouldn't want visitors to be
able to make changes.
E.g. 2: WIKIPEDIA = online encyclopaedia resource that allows visitors to make
changes to most articles. Ideally, with enough people contributing to Wikipedia
entries, the most accurate and relevant information about every subject will
eventually be part of each article.
Unfortunately, because anyone can change entries, it's possible for someone to
post false or misleading information.
WEB 3.0 (SEMANTIC WEB)
E.g:
It will be your personal assistant: it will learn what you are interested in
(the more you use the web, the more it will record, and the less specific you
will need to be with your questions).
Question: Where should I go for lunch?
The browser would:
- consult its records of what you like and dislike
- take into account your current location
- suggest a list of restaurants.
WEB 3.0 search engines could not only find keywords in your search, but also
interpret the context of your request.
SEMANTIC WEB
= proposes to help computers read and use the WEB. METADATA added to web
pages can make the existing WWW machine readable.
METADATA = simply machine-readable data that describes other data.
On the semantic web, metadata is invisible to people reading the page, but it
is visible to computers.
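A minimal sketch of such metadata, using schema.org JSON-LD (one common
semantic web format) injected from TypeScript; the restaurant data is purely
illustrative:

  // Embed machine-readable metadata (schema.org JSON-LD) into a page.
  // The restaurant details below are hypothetical.
  const metadata = {
    "@context": "https://schema.org",
    "@type": "Restaurant",
    name: "Example Bistro",
    servesCuisine: "Italian",
    address: "1 Example Street",
  };

  // People reading the page never see this; crawlers and other software do.
  const script = document.createElement("script");
  script.type = "application/ld+json";
  script.textContent = JSON.stringify(metadata);
  document.head.appendChild(script);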
HTTP:
Characteristics:
- It functions as a request-response protocol in the client-server computing
model. A web browser, for example, may be the client and an application running
on the computer hosting a website may be the server.
- The client submits a request message to the server; the server returns a
response message containing status information and, possibly, the requested
content.
HTTPS:
It is the result of layering HTTP on top of the SSL (SECURE SOCKETS
LAYER)/TLS (TRANSPORT LAYER SECURITY) protocol, thus adding the
security capabilities of SSL/TLS to standard HTTP communications.
The security of HTTPS is therefore that of the underlying TLS, which uses
long-term public and secret keys to exchange a short-term session key to
encrypt the data flow between client and server.
HTTPS provides AUTHENTICATION of the website and the associated
web server that one is communicating with, to avoid man-in-the-middle
attacks (an attacker who has the ability to both monitor and alter
or inject messages into a communication channel).
HTTPS provides BIDIRECTIONAL ENCRYPTION of communications between
a client and a server.
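A minimal sketch of making an HTTPS request, assuming Node's built-in https
module; the URL is just a placeholder:

  import * as https from "node:https";
  import type { TLSSocket } from "node:tls";

  // The https: scheme tells the client to wrap HTTP in TLS. Certificate
  // checking and the session-key exchange described above happen
  // automatically during the TLS handshake.
  https.get("https://example.com/", (res) => {
    const socket = res.socket as TLSSocket;
    console.log("Status:", res.statusCode);
    console.log("Negotiated cipher:", socket.getCipher().name);
    res.resume(); // discard the body for this sketch
  });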
HTML
A web browser can read HTML files and compose them into visible or
audible web pages.
The browser determines how the page should be displayed based on the tags.
(The tags guide the browser and a page can look different on different browsers).
XML
XSLT
= the format and relationships among XML tags are defined in a DOCUMENT
TYPE DEFINITION (DTD) document. A set of XSLT rules defines the way the
content of an XML document is transformed into another format suitable for the
current needs of the user.
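A minimal browser sketch of such a transformation in TypeScript, using the
standard DOMParser and XSLTProcessor APIs; the menu XML and stylesheet are
invented for illustration:

  // Turn an XML document into an HTML list with XSLT, in the browser.
  const xml = `<menu><dish>Pasta</dish><dish>Salad</dish></menu>`;
  const xsl = `<xsl:stylesheet version="1.0"
      xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:template match="/menu">
      <ul>
        <xsl:for-each select="dish">
          <li><xsl:value-of select="."/></li>
        </xsl:for-each>
      </ul>
    </xsl:template>
  </xsl:stylesheet>`;

  const parser = new DOMParser();
  const processor = new XSLTProcessor();
  processor.importStylesheet(parser.parseFromString(xsl, "application/xml"));
  const fragment = processor.transformToFragment(
    parser.parseFromString(xml, "application/xml"), document);
  document.body.appendChild(fragment); // renders as an HTML list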
JAVASCRIPT
= a scripting language interpreted by the browser, used to make web pages
interactive (see client-side scripting below).
CSS
= a style sheet language used for describing the look and formatting of a
document written in a markup language.
URI/URL:
= defines all types of names and addresses that refer to objects on the World
Wide Web.
DNS:
The ISP sends information to the computer about which DNS server to use.
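A minimal sketch of a DNS lookup, assuming Node's built-in dns module; the
hostname is just an example:

  import { promises as dns } from "node:dns";

  // Resolve a hostname to IPv4 addresses using the configured DNS server
  // (typically the one supplied by the ISP or the operating system).
  async function lookup(host: string): Promise<void> {
    const addresses = await dns.resolve4(host);
    console.log(host, "->", addresses);
  }

  lookup("example.com");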
IP:
= specifies the format of packets (also called datagrams) and the addressing
scheme.
TCP:
= enables two hosts to establish a connection and exchange streams of data.
FEATURES OF TCP:
- reliable: delivery of data is guaranteed, and packets arrive in the order in
which they were sent
- connection-oriented: a connection is established before any data is exchanged
- provides error checking and flow control
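A minimal sketch of TCP's byte-stream view, assuming Node's built-in net
module; it sends a raw HTTP request over a TCP socket:

  import * as net from "node:net";

  // TCP presents the exchange as a reliable, ordered byte stream;
  // lost or reordered packets are handled below this level.
  const socket = net.createConnection({ host: "example.com", port: 80 }, () => {
    socket.write("GET / HTTP/1.1\r\nHost: example.com\r\nConnection: close\r\n\r\n");
  });

  socket.on("data", (chunk) => process.stdout.write(chunk));
  socket.on("end", () => console.log("\n[connection closed]"));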
Metatags
Title
The banner (an area at the top of the page that is often the same on all
the pages)
The menu
The content area
Footer
Corner
Images
Headlines/ titles
Body content
Navigation
Credits
STATIC WEB PAGES:
- Display exactly the same information whenever anyone visits the site.
- They can include text, video and images.
DYNAMIC WEB PAGES:
- Are capable of producing different content for different visitors from the
same source code file, based on the visitor's operating system, whether they
are using a PC or a mobile device, and the source that referred the visitor.
When the user activates a link, the browser navigates to the resource indicated
by the link's target URI, and the process of bringing content to the user
begins again.
C.1.13 Evaluate the use of client-side scripting and server-side scripting in web pages
There are two main ways to customise Web pages and make them more
interactive. The two are often used together because they do very different
things.
Scripts
A script is a set of instructions. For Web pages they are instructions either to the
Web browser (client-side scripting) or to the server (server-side scripting). These
are explained more below.
- Scripts provide change to a Web page. Any page which changes each time you
visit it (or during a visit) probably uses scripting.
All log-on systems, some menus, almost all photograph slideshows and many
other pages use scripts. Google uses scripts to fill in your search term for you, to
place advertisements, to find the thing you are searching for and so on. Amazon
uses scripting to list products and record what you have bought.
Client-side
The client is the system on which the Web browser is running. JavaScript is the
main client-side scripting language for the Web. Client-side scripts are
interpreted by the browser. The process with client-side scripting is:
1. The user requests a Web page.
2. The server sends the page, including any client-side scripts, to the browser.
3. The browser interprets the scripts and runs them as the page is displayed.
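A minimal client-side sketch in TypeScript (which compiles to the JavaScript
the browser runs); the element IDs are hypothetical:

  // Runs entirely in the browser: no request goes back to the server.
  const button = document.getElementById("greet-btn") as HTMLButtonElement;
  const output = document.getElementById("greeting") as HTMLElement;

  button.addEventListener("click", () => {
    const name = prompt("What is your name?") ?? "visitor";
    output.textContent = `Hello, ${name}!`; // page changes without reloading
  });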
Server-side
The server is where the Web page and other content is kept. The server sends
pages to the user/client on request. The process is:
1. The user requests a Web page.
2. The server runs the server-side scripts for that page.
3. The server generates the finished page and sends it to the browser.
The use of HTML forms or clever links allows data to be sent to the server and
processed. The results may come back as a second Web page.
Server-side scripting tends to be used for allowing users to have individual
accounts and providing data from databases. It allows a level of privacy,
personalisation and provision of information that is very powerful. E-commerce,
MMORPGs and social networking sites all rely heavily on server-side scripting.
PHP and ASP.net are the two main technologies for server-side scripting.
- The script is interpreted by the server, meaning that it will always work the
same way. Server-side scripts are never seen by the user (so they can't copy
your code). They run on the server and generate results which are sent to the
user. Running all these scripts puts a lot of load onto the server but none on
the user's system.
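The notes name PHP and ASP.net as the main server-side technologies; as a
sketch of the same idea, here is a minimal TypeScript server using Node's
built-in http module (an assumption of this example):

  import * as http from "node:http";

  // Each request is handled on the server; the user only ever sees the
  // generated HTML, never this code.
  const server = http.createServer((req, res) => {
    const now = new Date().toISOString(); // produced fresh for every visitor
    res.writeHead(200, { "Content-Type": "text/html" });
    res.end(`<html><body><p>Server time: ${now}</p></body></html>`);
  });

  server.listen(8080, () => console.log("Listening on http://localhost:8080"));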
The combination
A site such as Google, Amazon or Facebook will use both types of scripting:
- client-side scripting to make pages interactive and responsive in the browser
- server-side scripting to handle accounts, databases and personalised content
The common gateway interface (CGI) is a standard way for a Web server to pass
a Web user's request to an application program and to receive data back to
forward to the user.
1. The Web surfer fills out a form and clicks Submit. The information in the
form is sent over the Internet to the Web server.
2. The Web server grabs the information from the form and passes it to the
CGI software. The CGI software performs whatever validation of this
information is required. For instance, it might check to see if an e-mail
address is valid. If this is a database program, the CGI software prepares a
database statement to add, edit, or delete information from the database.
3. The CGI software then executes the prepared database statement, which
is passed to the database driver.
4. The database driver acts as a middleman and performs the requested
action on the database itself.
5. The results of the database action are then passed back to the database
driver.
6. The database driver sends the information from the database to the CGI
software.
7. The CGI software takes the information from the database and
manipulates it into the format that is desired.
8. If any static HTML pages need to be created, the CGI program accesses
the Web server computer's file system and reads, writes, and/or edits files.
9. The CGI software then sends the result it wants the Web surfer's browser
to see back to the Web server.
10. The Web server sends the result it got from the CGI software back to the
Web surfer's browser.
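A minimal sketch of a CGI program in TypeScript, run by the web server rather
than the browser; the email form field is hypothetical:

  // The server passes form data via environment variables and stdin, and
  // forwards whatever this program writes to stdout back to the browser.
  const query = process.env.QUERY_STRING ?? "";
  const params = new URLSearchParams(query);
  const email = params.get("email") ?? "";

  // Trivial "validation", as in step 2 above (illustrative only).
  const valid = /^[^@\s]+@[^@\s]+$/.test(email);

  // A CGI response starts with headers, then a blank line, then the body.
  process.stdout.write("Content-Type: text/html\r\n\r\n");
  process.stdout.write(
    `<html><body><p>Email ${valid ? "accepted" : "rejected"}: ${email}</p></body></html>\n`
  );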
THE DEEP WEB
= that portion of World Wide Web content that is not indexed by standard search
engines.
The deep Web consists of data that you won't locate with a simple Google
search. No one really knows how big the deep Web really is, but it's hundreds
(or perhaps even thousands) of times bigger than the surface Web. This data
isn't necessarily hidden on purpose; it's just hard for current search engine
technology to find and make sense of it.
The surface Web consists of data that search engines can find and then offer up
in response to your queries.
But in the same way that only the tip of an iceberg is visible to observers, a
traditional search engine sees only a small amount of the information that's
available: a measly 0.03 percent.
PAGERANK
This is where it gets tricky. The PR of each page depends on the PR of the pages
pointing to it. But we won't know what PR those pages have until the pages
pointing to them have their PR calculated, and so on. And when you consider
that page links can form circles, it seems impossible to do this calculation!
But actually it's not that bad. Remember this bit of the Google paper:
PageRank or PR(A) can be calculated using a simple iterative algorithm,
and corresponds to the principal eigenvector of the normalized link matrix
of the web.
What that means to us is that we can just go ahead and calculate a page's PR
without knowing the final value of the PR of the other pages. That seems
strange but, basically, each time we run the calculation we're getting a closer
estimate of the final value. So all we need to do is remember each value we
calculate and repeat the calculations lots of times until the numbers stop
changing much.
http://www.cs.princeton.edu/~chazelle/courses/BIB/pagerank.htm
http://www.math.cornell.edu/~mec/Winter2009/RalucaRemus/Lecture4/lecture4.html
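A minimal sketch of the iterative calculation described above, in TypeScript;
the three-page link graph and damping factor are invented for illustration:

  // links[i] lists the pages that page i points to.
  const links: number[][] = [[1, 2], [2], [0]];
  const n = links.length;
  const d = 0.85; // the usual damping factor

  let pr: number[] = new Array(n).fill(1 / n);
  for (let iter = 0; iter < 50; iter++) {
    const next: number[] = new Array(n).fill((1 - d) / n);
    links.forEach((outgoing, i) => {
      for (const target of outgoing) {
        next[target] += (d * pr[i]) / outgoing.length; // share PR over out-links
      }
    });
    pr = next; // each pass gives a closer estimate, as described above
  }

  console.log(pr); // the values stop changing much after enough iterations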
HITS (HYPERLINK-INDUCED TOPIC SEARCH)
= an algorithm that made use of the link structure of the web in order to
discover and rank pages relevant for a particular topic.
Suppose you want to search for the "best automobile makers in the last 4 years".
When you ask a search engine this question, it will count all occurrences of the
given words in a given set of documents. The results might be different from
what you expected: think of a dictionary, or a web page that repeats the phrase
"automobile maker = car manufacturer" one billion times; this web page would be
the first displayed by the query engine.
So a different ranking system is needed in order to find those pages that are
authoritative for a given query. Page i is called an authority for the query
"automobile makers" if it contains valuable information on the subject (these
are the pages truly relevant to the given query). To support the system,
another category of web pages relevant to the process of finding the
authoritative pages is defined, called HUBS. The hubs' role is to advertise the
authoritative pages. They contain useful links towards the authoritative pages.
They point the search engine in the right direction.
For better understanding: the authoritative pages could be the official sites of
car manufacturers, such as www.bmw.com, and a hub could be a blog where people
discuss the cars they purchased, or a page that contains rankings of cars
(recommending the official manufacturers' sites).
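A minimal sketch of the hub/authority iteration in TypeScript; the link graph
is invented for illustration:

  // adj[i] lists the pages that page i links to.
  const adj: number[][] = [[2], [2], [0]];
  const nPages = adj.length;

  let hub: number[] = new Array(nPages).fill(1);
  let auth: number[] = new Array(nPages).fill(1);

  for (let iter = 0; iter < 20; iter++) {
    // A page is a good authority if good hubs point to it...
    const newAuth: number[] = new Array(nPages).fill(0);
    adj.forEach((out, i) => out.forEach((j) => (newAuth[j] += hub[i])));
    // ...and a good hub if it points to good authorities.
    const newHub = adj.map((out) => out.reduce((s, j) => s + newAuth[j], 0));
    // Normalise so the scores converge instead of growing without bound.
    const normA = Math.hypot(...newAuth);
    const normH = Math.hypot(...newHub);
    auth = newAuth.map((v) => v / normA);
    hub = newHub.map((v) => v / normH);
  }

  console.log({ auth, hub });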
WEB CRAWLER (SPIDER)
- creates a copy of every web page that it visits (for later indexing by the
search engine)
- usually starts at a popular site
- searches a page for links to other pages
- follows these links and repeats the process
- initially looks for the file robots.txt for instructions on pages to ignore
(duplicate content, irrelevant pages)
- is used to retrieve email addresses (for spam)
- is used by webmasters for checking the integrity of a site
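A much-simplified crawler sketch in TypeScript, assuming the global fetch of
modern Node; a real crawler would also honour robots.txt, throttle its
requests, and store the copied pages for the indexer:

  async function crawl(startUrl: string, limit = 10): Promise<void> {
    const queue: string[] = [startUrl];
    const visited = new Set<string>();

    while (queue.length > 0 && visited.size < limit) {
      const url = queue.shift()!;
      if (visited.has(url)) continue;
      visited.add(url);

      const html = await (await fetch(url)).text(); // the "copy" of the page
      // Crude link extraction; a real crawler would parse the HTML properly.
      for (const match of html.matchAll(/href="(https?:\/\/[^"]+)"/g)) {
        queue.push(match[1]); // follow these links and repeat the process
      }
    }
    console.log("Visited:", [...visited]);
  }

  crawl("https://example.com/");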
META TAGS
- inserted by the web designer/owner
- contain keywords and concepts (helps to clarify meaning)
- the description/title can be shown in the search results
- noindex, nofollow in the robots meta tag can instruct crawlers not to index
pages
C.2.8. Suggest how web developers can create pages that appear more
prominently in search engine results.
- examine time taken, number of hits, quality of results.
- ensure the site has high-quality information
-get other indexed sites to link to your site
- encourage others to link to you
- identify the keywords for which you would like to be found
-place keywords in prime locations (headlines and section titles, link text, page
title metadata, page description metadata, page text, page URL)
- ensure a search-friendly web site architecture (ensure there's a simple link to
every page on your site, include content early in each HTML page, use standard
header tags, be careful of duplicate pages)
- keep your site fresh (updated)
http://www.idealware.org/articles/found_on_search_engines.php
Students will be expected to test specific data in a range of search engines,
for example examining time taken, number of hits and quality of returns.
Today, the major search engines use many metrics to determine the value of external
links. Some of these metrics include:
- The trustworthiness of the linking domain.
- The popularity of the linking page.
- The relevancy of the content between the source and target pages.
- The anchor text used in the link.
- The amount of links to the same page on the source page.
- The amount of root domains linking to the target page.
- The amount of variations used as anchor text in links to the target page.
- The ownership relationship between the source and target domains.
In addition to these metrics, external links are important for two main reasons:
1. Popularity
Whereas traffic is a "messy" metric and difficult for search engines to measure
accurately (according to Yahoo! search engineers), external links are both a more
stable metric and an easier metric to measure. This is because traffic numbers
are buried in private server logs while external links are publicly visible and
easily stored. For this reason and others, external links are a great metric for
determining the popularity of a given web page. This metric (which is roughly
similar to toolbar PageRank) is combined with relevancy metrics to determine the
best results for a given search query.
2. Relevancy
Links provide relevancy clues that are tremendously valuable for search engines.
The anchor text used in links is usually written by humans (who can interpret
web pages better than computers) and is usually highly reflective of the content
of the page being linked to. Many times this will be a short phrase (e.g. "best
aircraft article") or the URL of the target page (e.g.
http://www.best-aircraft-articles.com).
The target and source pages and domains cited in a link also provide valuable relevancy
metrics for search engines. Links tend to point to related content. This helps search
engines establish knowledge hubs on the Internet that they can then use to validate the
importance of a given web document.
C.2.11. Discuss the use of white hat and black hat search
engine optimization.
White hat (links from C.2.8)
new sites send an XML sitemap to Google
include a robots.txt file
add the site to Google's Webmaster Tools to warn you if the site is uncrawlable
make sure the H1 tag contains your main keyword
page titles contain keywords
relevant keywords with each image
site has a suitable keyword density (but no keyword stuffing)
Students should be able to explain the effect of the above techniques.
Black-hat
hidden content
keyword stuffing
link farms
etc
Black hat SEO refers to attempts to improve rankings in ways that are not
approved by search engines and involve deception. They go against current
search engine guidelines. White hat SEO refers to use of good practice methods
to achieve high search engine rankings. They comply with search engine
guidelines.
Black hat SEO is more frequently used by those who are looking for a quick
financial return on their Web site, rather than a long-term investment on their
Web site. Black hat SEO can possibly result in your Web site being banned from a
search engine, however since the focus is usually on quick high return business
models, most experts who use Black Hat SEO tactics consider being banned from
search engines a somewhat irrelevant risk.
In search engine optimization (SEO) terminology, white hat SEO refers to the
usage of optimization strategies, techniques and tactics that focus on a human
audience as opposed to search engines, and that completely follow search engine
rules and policies.
For example, a website that is optimized for search engines, yet focuses on
relevancy and organic ranking is considered to be optimized using White Hat SEO
practices. Some examples of White Hat SEO techniques include using keywords
and keyword analysis, backlinking, link building to improve link popularity, and
writing content for human readers.
White Hat SEO is more frequently used by those who intend to make a long-term
investment on their website. Also called Ethical SEO.
Students should be able to assess both how the above function and their degree
of success.
Concept-based searching
Natural language queries (e.g. Ask.Jeeves.com)
Future challenges:
- Error management
- Lack of quality assurance of information uploaded
- The search engines will need to evolve to remain effective as the web grows.
UBIQUITOUS COMPUTING:
= computing made to appear everywhere, embedded in everyday objects and
available at any time.
PEER 2 PEER:
= a network in which each computer can act as both client and server, sharing
resources directly without a central server.
GRID COMPUTING:
= many networked computers coordinated to work together on a common task,
acting as a single virtual system.
OPEN STANDARDS (e.g. those of the W3C) are:
- Publicly available
- Royalty-free
INTEROPERABILITY:
= the ability of different systems and organisations to work together and
exchange data.
Web 2.0 tools include:
- Social networking
- Blogs
- Wikis
- Skype/Google Hangouts
The key features of Web 2.0 are:
- Interaction
- Collaboration
- User-generated content
- Virtual communities
Technologies behind these tools include (see the sketch after this list):
- Ajax, which allows JavaScript to upload/download new data from the server
(without reloading the page)
- XML formatting
- the Document Object Model (DOM)
- Flash, for playing video and audio
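A minimal Ajax sketch in TypeScript using fetch; the URL /api/latest-news and
the element ID "news" are hypothetical:

  // The page asks the server for new data and updates itself, no reload.
  async function refreshNews(): Promise<void> {
    const response = await fetch("/api/latest-news"); // background request
    const items: string[] = await response.json();
    const list = document.getElementById("news")!;
    list.innerHTML = items.map((item) => `<li>${item}</li>`).join("");
  }

  setInterval(refreshNews, 30_000); // update every 30 seconds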
Copyright is a legal term used to describe the rights that creators have
over their literary and artistic works. Works covered by copyright range from
books, music, paintings, sculpture and films, to computer programs, databases,
advertisements, maps and technical drawings.
A patent is an exclusive right granted for an invention. Generally
speaking, a patent provides the patent owner with the right to decide how - or
whether - the invention can be used by others. In exchange for this right, the
patent owner makes technical information about the invention publicly available
in the published patent document.
A trademark is a sign capable of distinguishing the goods or services of
one enterprise from those of other enterprises. Trademarks date back to ancient
times when craftsmen used to put their signature or "mark" on their products.
http://www.digitalenterprise.org/ip/ip.html
http://www.wipo.int/about-ip/en/
SSL uses a cryptographic system that uses two keys to encrypt data: a
public key known to everyone and a private or secret key known only to the
recipient of the message. Both Netscape Navigator and Internet Explorer support
SSL, and many Web sites use the protocol to obtain confidential user information,
such as credit card numbers. By convention, URLs that require an SSL connection
start with https: instead of http:.
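A sketch of the two-key idea using Node's crypto module (an assumption of this
example); real SSL/TLS is far more involved:

  import * as crypto from "node:crypto";

  // Long-term key pair (in TLS this belongs to the server's certificate).
  const { publicKey, privateKey } = crypto.generateKeyPairSync("rsa", {
    modulusLength: 2048,
  });

  // Client: invent a short-term session key and send it encrypted.
  const sessionKey = crypto.randomBytes(32);
  const wrappedKey = crypto.publicEncrypt(publicKey, sessionKey);

  // Server: recover the session key with the private key.
  const recoveredKey = crypto.privateDecrypt(privateKey, wrappedKey);

  // Both sides now share a key for fast symmetric encryption of the data flow.
  const iv = crypto.randomBytes(16);
  const cipher = crypto.createCipheriv("aes-256-gcm", recoveredKey, iv);
  const encrypted = Buffer.concat([
    cipher.update("card number: 1234"),
    cipher.final(),
  ]);
  console.log("Encrypted bytes:", encrypted.toString("hex"));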
Another protocol for transmitting data securely over the World Wide
Web is Secure HTTP (S-HTTP). Whereas SSL creates a secure connection between
a client and a server, over which any amount of data can be sent securely,
S-HTTP is designed to transmit individual messages securely.
http://www.theguardian.com/technology/2010/jan/24/internet-revolution-changing-world
http://en.wikipedia.org/wiki/Decentralization