
Exercise: Given PROJECT(Pno, Pname, Budget, Location). Applications:
Solution:
Solution Strategy: Always start by identifying the primary key. For vertical, check that every attribute appears at least once. For horizontal, ensure predicates are complete and mutually exclusive.
Given:
Relation PROJECT(PID, Title, Budget, StartDate, ManagerID)
Access patterns:
Question: Propose a vertical fragmentation.
Solution:
Vertical fragmentation groups attributes into fragments with a key attribute repeated.
Fragment 1: PROJECT_Base (PID, Title, ManagerID)
Fragment 2: PROJECT_Fin (PID, Budget, StartDate)
Reconstruct via PROJECT_Base ⨝ PROJECT_Fin on PID.
Exercise: N=5, R=2, W=4. The network partitions into S1,S2 and S3,S4,S5. Can a write succeed in the left partition? A read?
Solution:
Alternate exercise: Find R and W such that R+W > N but both partitions can perform writes? Impossible because if N=5, one partition has 2 nodes, one has 3. To write, need 4. Neither partition has 4.
Solution Strategy: Always check quorum sizes against available nodes in the partition. The intersection property (R+W > N) guarantees consistency only in a non-partitioned system. During partitions, you must choose between availability and consistency (CAP theorem).
A compact, structured set of tips and worked-example strategies to help you solve exercises from a distributed database systems course/textbook.
Given:
R = 10,000 tuples, S = 50,000 tuples. Hash function partitions data into 10 buckets. Each site sends its bucket to a single join site. Network cost = 1 per tuple. Local join cost negligible.
Question: Compute total network cost.
Solution:
If hash partitioning already aligned with join site:
Each tuple of R and S sent exactly once to join site → cost = 10,000 + 50,000 = 60,000.
If data is initially distributed across sites and must be repartitioned:
Better is to perform parallel hash join: each site joins locally on its own bucket after exchanging only needed buckets (cost = same total data volume). So 60,000 is correct.
Problem:
Relation Orders(OrderID, CustID, Amount) at site X (10,000 tuples).
Relation Customers(CustID, Name, City) at site Y (5,000 tuples).
Query: find orders from customers in ‘Paris’. Write a semi-join to reduce transmission. Exercise : Given PROJECT(Pno, Pname, Budget, Location)
Solution:
Semi-join reduces the size of the left operand before full join.
Step 1 – Project relevant attributes from right relation:
π_CustID(σ_City=‘Paris’(Customers)). Size: if 10% of customers in Paris → 500 CustIDs.
Step 2 – Send projection to site X:
Transmit 500 CustIDs (approx. 500*4 bytes = small).
Step 3 – Compute semi-join at site X:
Orders ⋉ Customers’ = σ_CustID in (Paris CustIDs)(Orders). Assume each customer has 5 orders → 2500 orders remain.
Step 4 – Send reduced Orders to site Y for final join:
Transmit 2500 tuples instead of 10,000. Savings: 75% reduction.
Answer:
Semi-join reduces cost significantly. The semi-join expression:
Orders ⋉ (π_CustID(σ_City=‘Paris’(Customers)))
Problem: Three sites. Transactions $T_1, T_2, T_3$.
Detect the deadlock.
Solution: We construct the Local Wait-For Graphs (LWFG) and combine them into a Global Wait-For Graph (GWFG).
Global Construction: Combine the edges based on transaction identifiers.
Cycle Detection: Tracing the edges: $T_1 \rightarrow T_3 \rightarrow T_2 \rightarrow T_1$. The cycle is closed: $T_1 \rightarrow T_3 \rightarrow T_2 \rightarrow T_1$.
Resolution: The system detects the cycle. It must abort one transaction (victim) to break the lock. Typically, the youngest transaction or the one with the least work done is chosen (e.g., abort $T_3$).
Exercise:
Compute cost without semi-join: Ship entire R (1 MB) or S (0.4 MB). Better to ship S to R’s site: 0.4 MB.
Compute semi-join solution:
Solution Strategy: Always compare total cost of semi-join + reduced tuple transfer vs. naive transfer. Semi-join wins when join selectivity is low. Solution :