Find the names of students not enrolled in any class. Answer 5. Write the following queries in SQL: 1. Find the pnames of parts for which there is some supplier. Find the snames of suppliers who supply every part. Find the snames of suppliers who supply every red part.
Find the pnames of parts supplied by Acme Widget Suppliers and no one else. Find the sids of suppliers who charge more for some part than the average cost of that part averaged over all the suppliers who supply that part. Find the sids of suppliers who supply only red parts. Find the sids of suppliers who supply a red part and a green part. Find the sids of suppliers who supply a red part or a green part.
For every supplier that only supplies green parts, print the name of the supplier and the total number of parts that she supplies. For every supplier that supplies a green part and a red part, print the name and price of the most expensive part that she supplies.
Write each of the following queries in SQL. Additional queries using the same schema are listed in the exercises for Chapter 4. Find the names of pilots whose salary is less than the price of the cheapest route from Los Angeles to Honolulu. Find the aids of all aircraft that can be used on routes from Los Angeles to Chicago.
List the choice of departure times from Madison if the customer wants to arrive in New York by 6 p. Print the name and salary of every nonpilot whose salary is more than the average salary for pilots. Emp eid: integer, ename: string, age: integer, salary: real Works eid: integer, did: integer, pct time: integer Dept did: integer, dname: string, budget: real, managerid: integer Write the following queries in SQL: 1.
Print the names and ages of each employee who works in both the Hardware department and the Software department. For each department with more than 20 full-time-equivalent employees i. Print the name of each employee whose salary exceeds the budget of all of the departments that he or she works in. Find the enames of managers who manage the departments with the largest bud- gets. If a manager manages more than one department, he or she controls the sum of all the budgets for those departments.
Find the managerids of managers who control the largest amounts. If you divide the sum just computed by the count, would the result be the same as the average? The following two SQL queries attempt to obtain the answer to this question.
Do they both compute the result? If not, explain why. Under what conditions would they compute the same result? Consider the instance of Sailors shown in Figure 5. Explain the term impedance mismatch in the context of embedding SQL com- mands in a host language such as C. How can the value of a host language variable be passed to an embedded SQL command? Explain the need for cursors. Give an example of a situation that calls for the use of embedded SQL; that is, in- teractive use of SQL commands is not enough, and some host language capabilities are needed.
Write a C program with embedded SQL commands to address your example in the previous answer. Explain how you would write a C program to compute the transitive closure of a graph, represented as an SQL relation Edges from, to , using embedded SQL commands.
You need not write the program, just explain the main points to be dealt with. Explain the following terms with respect to cursors: updatability, sensitivity, and scrollability.
Compare this assertion with the equivalent table constraint. Explain which is better. Write SQL statements to delete all information about employees whose salaries exceed that of the manager of one or more departments that they work in. The limitation of the latter approach is that the condition is checked only when the Dept relation is being updated.
However, since age is an attribute of the Emp relation, it is possible to update the age of a manager which violates the constraint. So the former approach is better since it checks for potential violation of the assertion whenever one of the relations is updated. Write the SQL statements required to create these relations, including appropriate versions of all primary and foreign key integrity constraints.
Express each of the following integrity constraints in SQL unless it is implied by the primary and foreign key constraint; if so, explain how it is implied. If the constraint cannot be expressed in SQL, say so. SQL: Queries, Constraints, Triggers 61 l The number of distinct courses in which CS majors are enrolled is greater than the number of distinct courses in which Math majors are enrolled.
Con- trast triggers with other integrity constraints supported by SQL. The advantages of the trigger mechanism include the ability to perform an action based on the result of a query condition. The set of actions that can be taken is a superset of the actions that integrity constraints can take i. Triggers can also be executed before or after a change is made to the database that is, use old or new data.
There are also disadvantages to triggers. Integrity constraints are often easier to understand than triggers. Emp eid: integer, ename: string, age: integer, salary: real Works eid: integer, did: integer, pct time: integer Dept did: integer, budget: real, managerid: integer Write SQL integrity constraints domain, key, foreign key, or CHECK constraints; or assertions or SQL triggers to ensure each of the following requirements, consid- ered independently. Every manager must be also be an employee.
A manager must always have a higher salary than any employee that he or she manages. Why do they both exist? Explain the term stored procedure, and give examples why stored procedures are useful. Answer 6. A cursor enables individual row access of a relation by positioning itself at a row and reading its contents.
A stored procedure is program that runs on the database server and can be called with a single SQL statement. These SQL statements are static in nature and thus are preprocessed and precompiled. For instance, syntax checking and schema checking are done at compile time. JDBC allows dynamic queries that are checked at runtime. For dynamic queries, JDBC must still be used.
Stored procedures are programs that run on the database server and can be called with a single SQL statement. They are useful in situations where the processing should be done on the server side rather than the client side. Stored procedures can also be used to reduce network communication; the results of a stored procedure can be analyzed and kept on the database server.
Connect to a data source. Start, commit, and abort transactions. Call a stored procedure. How are these steps performed in SQLJ? Each connection can specify how to handle transactions.
Exercise 6. If an error has occurred, program control is transferred to a separate statement. This is done during the precompilation step for static queries. The SQLWarning class is used for problems not as severe as errors.
They are not caught in the try Why do we not need a precompiler for JDBC? Whenever a page is requested, is it dynamically assembled from static data and data in a database, resulting in a database access. Connecting to the database is usually a time-consuming process, since resources need to be allocated, and the user needs to be authenticated.
Since servlets can keep information beyond single requests, we can create a connection pool, and allocate resources from it to new requests. Obtain an open connection from the pool. Release a connection to the pool. Destroy the pool and close all connections. What is CGI? Why was CGI introduced? What are the disadvantages of an architecture using CGI scripts? What funcionality do typical application servers provide? When is an XML document well-formed?
When is an XML document valid? Answer 7. Java is cross-platform interpreted programming language. Servlets are pieces of Java code that run on the middle tier or server layers and be used for any functionality that Java provides. Cookies are a simple way to store persistent data at the client level. Each page request will create a new process on the server, which is a performance issue when requests are scaled up.
Application servers are used to maintain a pool of processes for handling requests. Typically, they are the middleware tier between the web server and the data sources such as database systems. Application servers eliminate the problems with process- creation overload and can also provide extra functionality like abstracting away heterogeneous data sources and maintaining session state information.
An XML document is well-formed if it follows three guide- lines: 1 it starts with an XML declaration, 2 it contains a root element that contains all other elements and 3 all elements are properly nested. Exercise 7. What is a communication protocol? What is the structure of an HTTP request message? What is the structure of an HTTP response message? What is a stateless protocol?
Why was HTTP designed to be stateless? Show the HTTP response message that the server generates for that page. Write a set of JSP pages that displays a shopping basket of items and allows users to add, remove, and change the quantity of items. To do this, use a cookie storage scheme that stores the following information: The UserId of the user who owns the shopping basket.
The number of products stored in the shopping basket. Experiment with cookies using JSP and make sure you know how to retrieve, set values, and delete the cookie. It has a link that directs the user to the Products page so they can start shopping. Products Page products. Each listed product should have a button next to it, which adds it to the shopping basket. If the item is already in the shopping basket, it increments the quantity by one. There should also be a counter to show the total number of items currently in the shopping basket.
The page also contains a button that directs the user to the Cart page. Cart Page cart. The listing for each item should include the product name, price, a text box for the quantity the user can change the quantity of items here , and a button to remove the item from the shopping basket.
Confirm Page confirm. There are two buttons on this page. One button cancels the order and the other submits the completed order. The cancel button just deletes the cookie and returns the user to the Index page.
The submit button updates the database with the new order, deletes the cookie, and returns the user to the Index page. This page allows users to search products by name or description. There should be both a text box for the search text and radio buttons to allow the user to choose between search-by-name and search-by-description as well as a submit button to retrieve the results. The page that handles search results should be modeled after products.
It should retrieve all records where the search text is a substring of the name or description as chosen by the user. To integrate this with the previous exercise, simply replace all the links to products. Internet Applications 69 Exercise 7. We say a user is authenticated if she has pro- vided a valid username-password combination to the system; otherwise, we say the user is not authenticated.
Assume for simplicity that you have a database schema that stores only a customer id and a password: Passwords cid: integer, username: string, password: string 1. Design a page that allows a registered user to log on to the system. Design a page header that checks whether the user visiting this page is logged in. In a human interaction study, it found that modem users typically like to view 20 search results at a time, and it would like to program this logic into the system.
Queries that return batches of sorted results are called top N queries. See Section 23 for a discussion of database support for top N queries. For example, results are returned, then results , then , and so on. Infrastructure: Create a database with a table called Books and populate it with some books, using the format that follows. When the new query results are returned, you can iterate to the placeholders and return the previous or next 20 results.
There should be a link where appropriate to get the previous 20 results or the next 20 results. To do this, you can encode the placeholders in the Previous or Next Links as follows.
Assume that you are displaying records 21— Then the previous link is display. You should not display a previous link when there are no previous results; nor should you show a Next link if there are no more results. When your page is called again to get another batch of results, you can perform the same query to get all the records, iterate through the result set until you are at the proper starting point, then display 20 more results.
What are the advantages and disadvantages of this technique? Query Constraints Technique: A second technique for performing top N queries is to push boundary constraints into the query in the WHERE clause so that the query returns only results that have not yet been displayed.
Although this changes the query, fewer results are returned and it saves the cost of iterating up to the boundary. For example, consider the following table, sorted by title, primary key. Using the placeholder technique, all 15 results would be returned for each batch. Using the constraint technique, batch 1 displays results but returns results , batch 2 will display results but returns only results , and batch 3 will display results but return only results The constraint can be pushed into the query because of the sorting of this table.
Title, B. This returns record 6. The page should include Previous and Next buttons to show the previous or next record set if there is one. Use the constraint query to get the Previous and Next record sets.
Why does a DBMS store data on external storage? What is a record id? What is the role of the disk space manager? Answer 8. A DBMS stores data on external storage because the quantity of data is vast, and must persist across program executions.
An rid has the property that we can identify the disk address of the page containing the record by using the rid. The disk space manager manages the available physical storage space of data for the DBMS. Exercise 8. What is a search key for an index? Why do we need indexes? What alternatives are available for the data entries in an index? What is a duplicate data entry in an index? Can a primary index contain duplicates? Are all of them suitable for secondary indexes? Each page can store up to three data records; so the fourth tuple is on page 2.
Explain what the data entries in each of the following indexes contain. If such an index cannot be constructed, say so and explain why. Overview of Storage and Indexing 75 sid name login age gpa Madayan madayan music 11 1. An unclustered index on age using Alternative 1. An unclustered index on age using Alternative 2.
An unclustered index on age using Alternative 3. A clustered index on age using Alternative 1. A clustered index on age using Alternative 2. A clustered index on age using Alternative 3.
An unclustered index on gpa using Alternative 1. An unclustered index on gpa using Alternative 2. An unclustered index on gpa using Alternative 3.
A clustered index on gpa using Alternative 1. A clustered index on gpa using Alternative 2. A clustered index on gpa using Alternative 3. In particular, discuss how equality and range searches work, using an example. The buckets are often constructed such that there are more bucket locations than there are possible search key values, and the hashing function is chosen so that it is not often that two search key values hash to the same bucket. If two search values hash to the same bucket, called a collision, a linked list is formed connecting multiple records in a single bucket.
In the case that too many of these collisions occur, the number of buckets is increased. Hash indexes are especially good at equality searches because they allow a record look up very quickly with an average cost of 1. Assume we have the employee relation with primary key eid and 10, records total. For range queries, hash indexes perform terribly since they could conceivably read as many pages as there are records since the data is not sorted in any clear grouping or set.
Assume we have the employees example again with 10, records and 10 records per page. It is not clear with a hash index how we even go about searching for every possible number greater than 20, since decimals could be used. It helps to have the index clustered whenever possible. Discuss: 1. The choice of primary index.
Clustered versus unclustered indexes. Hash versus tree indexes. Choice of search key for the index. What is a composite search key, and what considerations are made in choosing composite search keys? The choice of the primary key is made based on the semantics of the data. If we need to retrieve records based on the value of the primary key, as is likely, we should build an index using this as the search key.
Further, a clustered index is typically more expensive to maintain than an unclus- tered index. Therefore, we should make an index be clustered only if range queries are important on its search key. A composite search key can support a broader range as well as increase the possibility for an index- only plan, but are more costly to maintain and store. An index-only plan is query evaluation plan where we only need to access the indexes for the data records, and not the data records themselves, in order to answer the query.
Obviously, index- only plans are much faster than regular plans since it does not require reading of the data records. What is the cost if the condition is not on a key? Perform inserts and scans, where the order of records does not matter. Overview of Storage and Indexing 79 2. How would you use the indexes to enforce the constraint that eid is a key? Can you give an example of an update that is neither speeded up nor slowed down by the indexes? You can assume uniform distributions of values.
For each of the following queries, which of the listed index choices would you choose to speed up the query? If your database system does not consider index-only plans i.
Query: Print ename, age, and sal for all employees. Note that this plan, which is the best for this query, is not an index-only plan must look up dids. Answer 9. We must essentially step through all pages in order. Disks support direct access to a desired page. Exercise 9. On average, main memory accesses are faster, of course. It depends on the location of the data. Accessing to some data might be much faster than to others. The time to access memory is uniform for most computer systems.
What is the capacity of a track in bytes? What is the capacity of each surface? What is the capacity of the disk? How many cylinders does the disk have? Give examples of valid block sizes. Is bytes a valid block size? If the disk platters rotate at rpm revolutions per minute , what is the maximum rotational delay? If one track of data can be transferred per revolution, what is the transfer rate? The number of cylinders is the same as the number of tracks on each platter, which is The block size should be a multiple of the sector size.
We can see that is not a valid block size while is. The average rotational delay is half of the rotation time, 0. The capacity of a track is 25K bytes. How many records of bytes each can be stored using this disk? Storing Data: Disks and Files 83 4. If pages are stored sequentially on disk, with page 1 on block 1 of track 1, what page is stored on block 1 of track 1 on the next disk surface? How would your answer change if the disk were capable of reading and writing from all heads in parallel?
To read a record, the block containing the record has to be fetched from disk. Assume that each block request incurs the average seek time and rotational delay. What happens if the requested page is in the pool but not pinned?
If the page is in the pool, skip to step 2. If the page is not in the pool, it is brought in as follows: a A frame is chosen for replacement, using the replacement policy.
The requested page is pinned the pin count of the chosen frame is incremented and its address is returned to the requester. Pinning a page prevents it from being removed from the pool. Who is responsible for pinning pages? Who is responsible for unpinning pages? Pinning a page means the pin count of its frame is incremented.
Pinning a page to prevent it from being replaced. Ability to explicitly force a single page to disk. Why is it important? Storing Data: Disks and Files 85 Exercise 9. The rationale for this technique is the em- pirical observation that, if a disk page is requested by some not necessarily database!
So the disk gambles by reading ahead. Give a nontechnical reason that a DBMS may not want to rely on prefetching controlled by the disk. Ex- plain. Does this technique help, with respect to the preceding problem? In the worst case, it cause the cache miss on every page request, even with disk pre-fetching. It handles the deletion by using bitmaps or linked lists. Because of this indi- rection, deletion is easy.
One approach to managing the slot directory is to use a maximum size i. Discuss the pros and cons of this approach with respect to the approach discussed in the text. Which organization would you choose if records are variable in length? The directory organization is therefore better, especially with variable length records. A page format with previous and next page pointers would help in both cases. Obviously, such a page format allows us to build the linked list organization; it is also useful for implementing the directory in the directory organization.
Storing Data: Disks and Files 87 Exercise 9. In contrast, the list-based organization discussed in the text maintains a list of full pages and a list of pages with free space. Is one of them clearly superior? For each of these organizations, describe a suitable page format. Since the rotation speed is constant, the sequential data transfer rate is also higher on the outer tracks. The seek time and rotational delay are unchanged. Sequential speed is most important and outer tracks maximize it.
Show the tree that would result from inserting a data entry with key 9 into this tree. How many page reads and page writes does the insertion require?
Answer The data entry with key 9 is inserted on the second leaf page. The lowest data entry of the new leaf is given up to the ancestor which also splits. The insertion will require 5 page writes, 4 page reads and allocation of 2 new pages. The data entry with key 8 is deleted, resulting in a leaf page N with less than two data entries. The left sibling L is checked for redistribution. As is part 3, the data entry with key 8 is deleted from the leaf page N.
Therefore the two siblings merge. The key in the ancestor which distin- guished between the newly merged leaves is deleted. The data entry with key 46 can be inserted without any structural changes in the tree. But the removal of the data entry with key 52 causes its leaf page L to merge with a sibling we chose the right sibling. This results in the removal of a key in the ancestor A of L and thereby lowering the number of keys on A below the minimum number of keys.
Since the left sibling B of A has more than the minimum number of keys, redistribution between A and B takes place. Deleting the data entry with key 91 causes a scenario similar to part 5. The data entry with key 59 can be inserted without any structural changes in the tree. Therefore deleting the data entry with key 91 changes the tree in a way very similar to part 6.
Exercise Answer the following questions. Name a search key value such that inserting it into the original tree would cause an increase in the height of the tree. Nonetheless, what can you infer about the contents and the shape of these trees? How would your answers to the preceding questions change if this were an ISAM index? Suppose that this is an ISAM index.
What is the minimum space utilization for an ISAM index? Therefore the space utilization of ISAM index pages is usually close to percent by design. Show your structure before and after the insertion. Show your structure before and after the deletion. Fur- ther, every bucket has space for exactly one more entry. Tree-Structured Indexing 95 b Inserting the entries in the order shown and then deleting them in the op- posite order e.
What is the minimum number of insertions of data entries with distinct keys that will cause the height of the original tree to change from its current value of 1 to 3? The answer to each part is given below. Inserting 17 and 18 will cause the tree to split and gain a level. Inserting 13, 15, and 25 does change the tree structure any further, so deleting them in reverse order causes no structure change. When 18 is deleted, redistribution will be possible from an adjacent node since one node will contain only the value 17, and its right neighbor will contain 19, 20, and Finally, when 17 is deleted, no redistribution will be possible so the tree will loose a level and will return to the original tree.
When 4 is inserted, the right most leave will split causing the tree to gain a level. When it is deleted, the tree will not shrink in size. Let us call the current tree depicted in Figure T has 16 data entries. S has 18 leaf pages with two data entries each and one leaf page with three data entries. The argument in part 2 does not assume anything about the data entries to be inserted; it is valid if duplicates can be inserted as well.
Therefore the solution does not change. What performance and storage utilization problems are there with this approach? Explain how the bulk-loading algorithm described in the text improves upon this scheme. This approach is likely to be quite expensive, since each entry requires us to start from the root and go down to the appropriate leaf page. Also, according to the insertion algorithm, each time a node splits, the data entries are redistributed evenly to both nodes.
The bulk loading algorithm has good performance and space utilization compared with the repeated inserts approach. This allows good performance for future inserts, and supports some desired space utilization. Pointers i. Chapter 11 - Solutions. Chapter 12 - Solutions. Chapter 13 - Solutions. Chapter 14 - Solutions. Chapter 15 - Solutions.
Chapter 16 - Solutions. Chapter 17 - Solutions. Chapter 18 - Solutions. Chapter 19 - Solutions. Chapter 2 - Solutions. Chapter 20 - Solutions. Chapter 21 - Solutions. Chapter 22 - Solutions. Chapter 23 - Solutions. Chapter 24 - Solutions. Reduced application development time. Since the DBMS provides several important functions required by applications, such as concurrency control and crash recovery, high level query facilities, etc. Even this is facilitated by suites of application development tools available from vendors for many database management systems.
Data integrity and security. The view mechanism and the authorization facilities of a DBMS provide a powerful access control mechanism. Further, updates to the data that violate the semantics of the data can be detected and rejected by the DBMS if users specify the appropriate integrity constraints.
Data administration. By providing a common umbrella for a large collection of data that is shared by several users, a DBMS facilitates maintenance and data administration tasks.
A good DBA can effectively shield end-users from the chores of fine-tuning the data representation, periodic back-ups etc. Concurrent access and crash recovery. Users can write transactions as if their programs were running in isolation against the database.
The DBMS executes the actions of transactions in an interleaved fashion to obtain good performance, but schedules them in such a way as to ensure that conflicting operations are not permitted to proceed concurrently. Further, the DBMS maintains a continuous log of the changes to the data, and if there is a system crash, it can restore the database to a transaction-consistent state. That is, the actions of incomplete transactions are undone, so that the database state reflects only the actions of completed transactions.
Thus, if each complete transaction, executing alone, maintains the consistency criteria, then the database state after recovery from a crash is consistent. Exercise 1. Answer Exercise 1. Answer Logical data independence means that users are shielded from changes in the logical structure of the data, while physical data independence insulates users from changes in the physical storage of the data.
This is what is meant by physical data independence Exercise 1. Answer The DBA is responsible for: Designing the logical and physical schemas, as well as widely-used portions of the external schema. Security and authorization. Data availability and recovery from failures. Database tuning: The DBA is responsible for evolving the database, in particular the conceptual and physical schemas, to ensure adequate performance as user requirements change.
A security facility. Concurrency control. Crash recovery. A view mechanism. A query language. The data definition language. The data manipulation language. The buffer manager. The data model. Answer Let us discuss the choices in turn.
0コメント