Wednesday, March 02, 2005

Firefox Adblock infringement (Part 2)

Stephen Middlebrook, in cyberia-l, suggests: "What bugs me about a lot of this © analysis is that people dig down a little bit into the technology and then pronounce a conclusion. But they don't fully address the intricacies of the technology -- the analysis is often overly simplistic: ‘The data is temporarily stored in RAM so you've made a copy for © purposes.’ Or ‘viewing a web page with an ad blocker is a derivative work since you didn't view the whole thing.’"

He has a very good point. The more I dig into the subject, the less it looks as though a product like Adblock infringes the copyright of a Web page developer.

So, as Mr. Middlebrook suggests, let's dig a little deeper.

The basis of Web pages is HyperText Markup Language (HTML). This is almost exclusively the language of the primary documents that Web servers send to your browser to be rendered for display on your screen. It is a text layout language that is supposed to guide the rendering engine in displaying the text embedded within it. I say "supposed to" because, as we shall see, there is a dynamic between your browser and HTML about exactly how the content is displayed.

HTML, as noted, is a layout language. It is composed of elements and text. An element is composed of a start tag, content, and an end tag. Tags are identified by being enclosed in "<" and ">" symbols. Most start tags are matched by end tags, which have a slash ("/") right after the "<". The tag name, such as "EM" for emphasis, comes right after the "<" in a start tag and right after the slash in the corresponding end tag. Thus, for example, the EM element has a start tag, <EM>, and an end tag, </EM>. The start and end tags surround the content of the EM element, and that content is emphasized when displayed. Thus the HTML element:
"<EM>This is emphasized text</EM>"

is rendered as:
This is emphasized text.
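
To make the start tag / content / end tag structure concrete, here is a minimal sketch using Python's standard html.parser module. Python is only my choice for illustration (browsers use their own parsers), and TagTracer is a hypothetical class that simply records what the parser reports.

```python
from html.parser import HTMLParser

class TagTracer(HTMLParser):
    """Records the start tags, text content, and end tags the parser sees."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        # html.parser reports tag names in lowercase, so <EM> arrives as "em"
        self.events.append(("start", tag))

    def handle_data(self, data):
        self.events.append(("content", data))

    def handle_endtag(self, tag):
        self.events.append(("end", tag))


tracer = TagTracer()
tracer.feed("<EM>This is emphasized text</EM>")
tracer.close()
print(tracer.events)
# [('start', 'em'), ('content', 'This is emphasized text'), ('end', 'em')]
```

Note that the parser only reports structure; deciding what "emphasized" looks like on screen is left entirely to whatever does the rendering.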

After the tag name, a start tag often contains one or more attributes that apply to the element, each of the form "attribute-name=attribute-value".

HTML supports dozens of different element types, identified by their different tag names. One of the most obvious is the Anchor ("A") element, which provides the hypertext links that are the basis of the World Wide Web (WWW or Web). Thus, if I wished to code a link to the Drudge Report, I could code:
"<A href='http://www.drudgereport.com/'>The Drudge Report</A>".
This would appear to you as
The Drudge Report.
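
Attributes are what carry the link's destination. As a small sketch (again in Python for illustration only, with a hypothetical LinkCollector class), a parser sees the HREF attribute as a name/value pair attached to the A start tag, separate from the link text that forms the element's content:

```python
from html.parser import HTMLParser

class LinkCollector(HTMLParser):
    """Collects (href, link text) pairs from A elements."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None  # href of the A element currently open, if any
        self._text = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs with lowercased names
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text)))
            self._href = None


collector = LinkCollector()
collector.feed("<A href='http://www.drudgereport.com/'>The Drudge Report</A>")
collector.close()
print(collector.links)
# [('http://www.drudgereport.com/', 'The Drudge Report')]
```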


Most of the dozens of HTML element types provide layout guidance. But a couple of them are especially relevant here. First, there are the "SCRIPT" elements. Scripts are segments of code in interpreted programming languages. By far the most common is Javascript, but others, such as VBScript (in Internet Explorer), are also supported. (Do not confuse Javascript with Java: Javascript has intentionally been crippled, for security reasons, in what it can do; in particular, with the exception of cookies, it cannot write to your disk.)

Scripts are the major way in which HTML is made dynamic, and they do this in two ways. First, scripts can generate HTML dynamically. Indeed, being a programmer, I use this feature routinely, for example to generate the HTML for large tables. Second, scripts interact with the Document Object Model (DOM), a set of dynamic objects that represent the HTML of the document(s) being displayed. Each element in the HTML has a corresponding DOM object, composed of a number of properties (i.e. variables), arrays, collections (like arrays), and methods (i.e. script functions or procedures). At the highest level are WINDOW objects representing the windows (and frames) currently active in your browser. Right under the WINDOW objects are DOCUMENT objects representing the actual HTML documents. Note that the properties and methods of the WINDOW and DOCUMENT objects make possible many of the features that so annoy Web readers, such as pop-up windows.
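
As an illustration of generating a large table's HTML with code rather than typing it by hand, here is a small sketch. It is written in Python rather than Javascript purely for illustration, and make_table is a hypothetical helper, not something built into any browser; in a page, a script would build the equivalent string and hand it to the document.

```python
from html import escape

def make_table(headers, rows):
    """Build the HTML for a table from data instead of writing each row by hand."""
    parts = ["<TABLE>"]
    # One header row; escape() keeps characters like "<" from being read as tags
    parts.append("<TR>" + "".join(f"<TH>{escape(str(h))}</TH>" for h in headers) + "</TR>")
    for row in rows:
        parts.append("<TR>" + "".join(f"<TD>{escape(str(cell))}</TD>" for cell in row) + "</TR>")
    parts.append("</TABLE>")
    return "\n".join(parts)


squares = [(n, n * n) for n in range(1, 4)]
print(make_table(["n", "n squared"], squares))
```

Three rows or three thousand, the code is the same size, which is why generating tables this way is so convenient.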

The second type of element relevant here comprises those that bring in remote images and the like. The most notable is the "IMG" element, which specifies an inline image. Its "SRC" attribute provides the URI (the superset of references that includes URLs) of the image to be loaded. Different types of images are supported, most notably .gif images. In any case, the IMG element causes the corresponding image (typically a graphic) to be displayed. Also relevant here, for similar reasons, are the "OBJECT" and "IFRAME" elements, which can also be used to display images, play movies, etc. In all these cases, though, the remote image, movie, or sound is identified by its URI/URL through the element's "SRC" (IMG and IFRAME) or "DATA" (OBJECT) attribute.
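
The key point for what follows is that every remote resource is named by a URI in a known attribute. The sketch below (Python, illustrative only; ResourceFinder and the example.com/.net/.org addresses are hypothetical) pulls out those URIs using exactly the element/attribute pairs just described:

```python
from html.parser import HTMLParser

# Attribute holding the remote resource's URI for each element type:
# SRC for IMG and IFRAME, DATA for OBJECT.
RESOURCE_ATTRS = {"img": "src", "iframe": "src", "object": "data"}

class ResourceFinder(HTMLParser):
    def __init__(self):
        super().__init__()
        self.resources = []  # (element name, URI) pairs, in document order

    def handle_starttag(self, tag, attrs):
        wanted = RESOURCE_ATTRS.get(tag)
        if wanted:
            value = dict(attrs).get(wanted)  # attribute names arrive lowercased
            if value:
                self.resources.append((tag, value))


page = """
<P>Today's story</P>
<IMG SRC="http://images.example.com/photo.gif">
<IFRAME SRC="http://ads.example.net/banner.html"></IFRAME>
<OBJECT DATA="http://media.example.org/clip.mov"></OBJECT>
"""
finder = ResourceFinder()
finder.feed(page)
finder.close()
for tag, uri in finder.resources:
    print(tag, uri)
```

Because the browser must fetch each of these URIs separately, it has a natural point at which to decide whether to fetch them at all.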

But what must be remembered here is that HTML only gives rendering agents, such as Internet Explorer or Gecko (Mozilla, Netscape, Firefox), guidance as to how to display the HTML received from a Web server (or generated dynamically by a script). This is because rendering agents (and their browsers) attempt to match the HTML being rendered to a user's circumstances. For example, different computers have different screens set at different resolutions. Also, some browsers are text-only and do not display inline images at all.

Also note that browsers give users a fair amount of control over how HTML is displayed, in particular through preferences. Interestingly, IE and the Mozilla (Gecko) family provide somewhat different controls. For example, in Mozilla, you can specify your base fonts for the Serif and Sans-Serif font families, as well as the base font size. Most of the rest of font display is then based on this. For example, BIG, SMALL, headings, etc. are typically a certain percentage bigger or smaller than the base font size. Thus, when you increase the base font size, you correspondingly increase all the other font sizes. Web developers can try to override this by specifying actual font sizes for various elements, but browsers can in turn be configured to ignore (or override) those sizes.
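
Here is a toy model of that relationship. The scale factors are hypothetical round numbers chosen only for illustration (real browsers use their own values), and font_size is not any browser's actual code; it just shows sizes following the user's base size, an author's absolute size taking precedence, and the user's override winning in the end.

```python
# Hypothetical scale factors relative to the user's base font size.
# Real browsers use their own values; these numbers are only illustrative.
SCALE = {"body": 1.0, "small": 0.83, "big": 1.2, "h2": 1.5, "h1": 2.0}

def font_size(element, base_px, author_px=None, ignore_author_sizes=False):
    """Return the pixel size an element would be drawn at under this toy model."""
    if author_px is not None and not ignore_author_sizes:
        return author_px  # the page's absolute size is used...
    return round(base_px * SCALE[element], 1)  # ...unless the user overrides it


print(font_size("h1", 16))                 # 32.0
print(font_size("h1", 20))                 # 40.0: a bigger base scales everything
print(font_size("h1", 20, author_px=12))   # 12: the author's absolute size
print(font_size("h1", 20, author_px=12, ignore_author_sizes=True))  # 40.0
```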

Similarly, browsers can (and do) control which types of scripts can run, and what they can do. For example, Gecko allows pop-ups to be turned off (the version of IE I am running does not, which is why I run Mozilla, et al.). Similarly, I have configured mine to run Javascript, but not Java. And you can configure whether or not Mozilla lets Javascript: move or resize windows; raise or lower windows; hide the status bar; change the status bar text; change images; or disable or replace context menus.
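
One way to picture this is as a gatekeeper. A page's script asks to do something, and the browser checks the user's settings before allowing it. The sketch below is only that picture in code: the preference names are hypothetical labels modeled on the options listed above, not Mozilla's actual preference keys.

```python
# Hypothetical preference names modeled on the options listed above;
# they are NOT Mozilla's actual preference keys.
user_prefs = {
    "javascript_enabled": True,
    "allow_popups": False,
    "allow_move_resize_windows": False,
    "allow_raise_lower_windows": False,
    "allow_hide_status_bar": False,
    "allow_change_status_text": False,
    "allow_change_images": True,
    "allow_replace_context_menus": False,
}

def script_may(action, prefs):
    """A page script requests an action; the user's preferences decide."""
    if not prefs.get("javascript_enabled", False):
        return False  # no scripts run at all
    return prefs.get("allow_" + action, False)  # unknown actions are refused


print(script_may("popups", user_prefs))         # False
print(script_may("change_images", user_prefs))  # True
```

The page author writes the script, but the decision about what it may do rests with the user's configuration.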

As for images, Mozilla allows the user to choose whether to accept no images, to accept images only from the originating server, or to accept all images. It also lets a user block all images from a particular site; in other words, it keeps a Black List of servers whose images will not be displayed. This functionality is provided for a number of reasons. One in particular is that images, and especially movies, take a lot of bandwidth to download, so over a dial-up connection the corresponding Web pages take that much longer to download and display. Thus, when I do use dial-up (for example, when I am traveling), I disable most images. The savings in download time can be significant.
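
Here is a rough sketch of the decision those settings amount to. The policy labels and the accept_image function are hypothetical and are not Mozilla's Image Manager code. The Black List here is an exact match on host names, which is the behavior the next paragraphs turn on.

```python
from urllib.parse import urlparse

def accept_image(image_url, page_url, policy, blocked_hosts=frozenset()):
    """Decide whether to load an image under one of three user choices.

    policy is one of the hypothetical labels "none", "originating-server", "all".
    blocked_hosts is an exact-match Black List of lowercase host names.
    """
    image_host = urlparse(image_url).hostname or ""  # .hostname is lowercased
    if image_host in blocked_hosts:
        return False
    if policy == "none":
        return False
    if policy == "originating-server":
        return image_host == (urlparse(page_url).hostname or "")
    return policy == "all"


page = "http://news.example.com/story.html"
print(accept_image("http://news.example.com/logo.gif", page, "originating-server"))    # True
print(accept_image("http://t1.ads.example.net/b.gif", page, "originating-server"))     # False
print(accept_image("http://t1.ads.example.net/b.gif", page, "all", {"t1.ads.example.net"}))  # False
print(accept_image("http://t2.ads.example.net/b.gif", page, "all", {"t1.ads.example.net"}))  # True
```

The last line shows the limitation of an exact-match list: blocking one ad server does nothing about its sibling.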

So, you can think of the interaction between Web developers and browser users this way: the Web developers suggest (sometimes quite strongly) how they want their Web pages to be displayed, and the browsers then use this as guidance in displaying the pages. But note, it is only guidance. As noted, HTML was designed this way for a reason: individual browser users are in a much better position than Web developers to decide what works best in their circumstances. So, HTML was designed to provide guidance.

Now we get to Firefox's Adblock. In view of the above, what it does is fairly simple. One of the problems with Mozilla's Image Manager is that its Black List does not support wildcard characters. Thus, blocking "T1.Ads.com" does not block "T2.Ads.com". Adblock expands on this by supporting wildcards, so you can block either "*.Ads.com" or "T*.ads.com". Second, instead of having to supply the URL of the site whose images are to be blocked, you can also use regular expressions. This is potentially a much more powerful tool, because it lets you, for example, block ads from any site whose host name includes the string "Ad". BUT regular expressions are fairly hard to use, especially for the non-computer geek. I think most of us who have used them have spent hours debugging one, only to find that a single character was misplaced. As a result, I suspect that their use will be de minimis.
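
The difference between the exact-match Black List and Adblock's two extensions can be sketched in a few lines. This is a simplification under my own assumptions (it matches host names only, and the function names are hypothetical). It is not Adblock's actual implementation, just the two matching techniques it offers, using Python's fnmatch for wildcards and re for regular expressions.

```python
import re
from fnmatch import fnmatchcase
from urllib.parse import urlparse

def blocked_by_wildcard(url, patterns):
    """True if the URL's host matches any wildcard pattern such as "*.ads.com"."""
    host = (urlparse(url).hostname or "").lower()  # host names are case-insensitive
    return any(fnmatchcase(host, p.lower()) for p in patterns)

def blocked_by_regex(url, expressions):
    """True if any regular expression matches somewhere in the host name."""
    host = urlparse(url).hostname or ""
    return any(re.search(e, host, re.IGNORECASE) for e in expressions)


print(blocked_by_wildcard("http://T1.Ads.com/b.gif", ["*.ads.com"]))   # True
print(blocked_by_wildcard("http://T2.Ads.com/b.gif", ["T*.ads.com"]))  # True
print(blocked_by_regex("http://adserver.example.net/b.gif", ["ad"]))   # True
print(blocked_by_regex("http://www.adobe.com/logo.gif", ["ad"]))       # True, probably not intended
```

The last line illustrates the point about regular expressions being easy to get wrong: the innocent-looking expression "ad" also catches any host that merely contains those two letters.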

The result, as is obvious I think, is that Adblock provides a very minor improvement to the control provided to browser users in the display of remote images.

This entire discussion started as a question as to whether utilizing Adblock resulted in a derivative work, and thus potentially infringed the copyright of the Web developers who depend, at least to some extent, on revenues from advertisements potentially being blocked from display.

In answering this question, first note that Adblock and the like do not in any way modify the HTML generated by Web developers and downloaded to browsers for rendering. So no derivative work is being created at that level.
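
To illustrate that point, here is a toy text-mode "browser" that renders the same HTML string under two different user choices about images. It is written in Python for illustration only, and TextRenderer and render are hypothetical names. The HTML is only read, never rewritten. What differs between the two outputs is the user-supplied decision about which images to show.

```python
from html.parser import HTMLParser

class TextRenderer(HTMLParser):
    """Toy renderer: prints text, shows images as [image: URI] placeholders,
    and skips any image the user's policy rejects."""

    def __init__(self, allow_image):
        super().__init__()
        self.allow_image = allow_image  # callable URI -> bool, chosen by the user
        self.output = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            src = dict(attrs).get("src", "")
            if self.allow_image(src):
                self.output.append(f"[image: {src}]")

    def handle_data(self, data):
        if data.strip():
            self.output.append(data.strip())


def render(html, allow_image):
    renderer = TextRenderer(allow_image)
    renderer.feed(html)  # the HTML string is only read, never modified
    renderer.close()
    return " ".join(renderer.output)


page = ('<P>Headline</P><IMG SRC="http://ads.example.net/banner.gif">'
        '<P>Story text</P>')
print(render(page, lambda uri: True))
# Headline [image: http://ads.example.net/banner.gif] Story text
print(render(page, lambda uri: "ads." not in uri))
# Headline Story text
```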

But our analysis cannot end at that point, because copyright extends to the non-literal aspects of a work, and in the case of software, for example, to what is displayed. Similarly, copyright extends to some extent to the display of HTML pages. Thus, it is possible to infringe the copyright of an HTML page by reproducing its output, even if the actual HTML being used is different. In other words, you cannot get around the copyright on the HTML simply by duplicating its output with other HTML.

But here, the copyright question is not one of different HTML generating the same display, but rather of the same HTML being rendered differently. And HTML, by its very design, is only guidance to browsers as to how to render a page. Thus, a rendering of an HTML document in which certain images are not displayed is not a derivative work, for the simple reason that HTML only provides guidance to a browser, and the browser (and thus the user) has ultimate responsibility for how the HTML is displayed on the corresponding computer.


9:18 AM

2 Comments:

Unknown said...

Nice blog. Have you seen your google rating? BlogFlux It's Free and you can add a Little Script to your site that will tell everyone your ranking. I think yours was a 3. I guess you'll have to check it out.

Computer News
Yahoo Boasts Size of Its Search Engine Index



Trying not to include any phallic analogies, Yahoo this week announced that its overall search engine index is much larger than Google’s and is the most in depth index of ‘web objects’ on the search market. On the Yahoo Search Blog, Yahoo disclosed that its index now includes 19.2 billion web documents, 1.6 billion images and more than 50 million audio and video files - over 20 billion items.

Yahoo is usually shy about disclosing the size of its search index, but the Yahoo Search Blog is celebrating its first year anniversary and Tim Mayer thought that somewhat of a retrospect was in order - since Yahoo has grown into its own as a search engine powerhouse over the past 365 days.

From the YSearchBlog : While we typically don’t disclose size (since we’ve always said that size is only one dimension of the quality of a search engine), for those who are curious this update includes just over 19.2 billion web documents, 1.6 billion images, and over 50 million audio and video files.

Note that as with all index updates we are still tuning things so you’ll continue to see some fluctuation in ranking over the next few weeks.

Greg Sterling of the Kelsey Group, however, makes the distinction of quality over quantity: "What I, Joseph User, care about is accuracy, quality and relevance. The available index does matter in terms of bringing me a sufficient quantity of results. (And if I'm looking for something really obscure, having that thing in the index is obviously important, which may go to size.)

But there's a major case of diminishing returns: there's already way too much information online for people to assimilate. Throwing more volume at me does nothing but make my eyes glaze over. What I want is enough relevant results."

Index schmindex, the moral of the story is what Yahoo has accomplished over the past year and what the next 12 months will bring with not only Yahoo Search, but the Yahoo Publishers Network, Yahoo LinkSpots, Yahoo Pay Per Call, and Site Explorer. What has Yahoo accomplished over the past year? Well, here’s Tim’s rundown :
Copyright © - 2005 Entireweb


8:49 AM  
Anonymous said...

Nicely researched and well thought out. Shame all these stupid bots were the only ones I saw in the comments section.

Anyway, very informative.

12:54 PM  
