In looking at 5.1, I notice we're attempting a run time of O(log n), which implies a divide and conquer strategy, with recurrence on one half of the problem. At that same time we have two databases to consider, and we may well have to do this twice.
This was my initial assumption anyway. The problem of attempting to solve for the median between two databases of values, both sorted individually makes for an interesting problem. In thinking it over, I've come across some test cases that are going to make things a little interesting.
But first, let's lay down some facts. We know there are 2n numbers contained between both databases, and that each database contains n numbers. We also know that all of the numbers are unique. Based on the definition of the median, our goal is to find two numbers between the two databases and average them to determine the median.
Initially the thought is that there exists one number in both databases that you can narrow down to. So if you start with the n/2 value in both databases, you merely need to compare which is less, and which is more, and up or down in the problem accordingly. This helps to keep the problem balanced so you always have a ratio between the two databases that correspond to n number above the potential median solutions, and n above. This seems like a pretty good start, however there is a problem with this.
The problem does NOT guarantee that both numbers will be in separate databases. This means that the two numbers for the median, could very well be in one query. This could cause a potential problem, as it's a very dangerous assumption.
Here's an example: 1,2,3,4,5,6 0,7,8,9,10,11
In that case, the correct median values are 5, and 6, for an actual median of 5.5
Another tough case to guess will be the border cases, where the median values occur on the edge. However, at this point, because of the first example, I'm some-what stuck on what to do for an approach. Everything I keep coming up with seems to suffer, based on the same assumption. There has to be some way to consider both of the queries as one query, and to try and work from there, but because they are sorted individually I can't seem to see it.
Are there any thoughts on this problem so far, that might be slightly more insightful?
So, this is true, but luckily the problem takes care of this for us. The problem defines the median (in this case) to be " the nth smallest number" which would be the smaller of the two middle numbers, therefore, if we set up the algorithm correctly, even if both "medians" are in one database, all we have to worry about is that in the end we get the smaller of those two numbers. Not that we need to get the mean of two numbers in one database. Make sense?
considering how tricky it gets when your recursion has narrowed it down to 2 or 3 elements per list, I think it's reasonable to manually (naively) find the median of this list of 4 or 5, since this would be a constant-time addition to the algorithm. I believe it makes the base case of the recursion much simpler to deal with.
Since both arrays are sorted (as long as your algorithm does not mess up with this property), it is rather easy to find the "median" as defined in the question when both arrays have 2 elements left.
As I said in class, I don't think the base cases are the main thing to worry about. Any D&C algorithm is eventually going to work its way down to inputs of small enough size that brute force will be practical.
That being said, I helped one student to debug the code he wrote up for this algorithm, and the only base case needed was for 2 lists of length 1. So the base case is not tricky at all if you have the right algorithm.
The question states is there a set of more than n/2 of the bank cards that are all equivalent to one another?
Is the question asking if it is possible that more than n/2 cards are all from the same bank account???
Thinking out loud: It seems that if n = 20 than a bank account could have 11 or more cards...that seems strange to me OH! i.e. credit fraud, perhaps???
I think it is possible for all n of the cards to be from the same bank account. The question is only asking us to determine if at least n/2+1 of these cards come from the same account. The only "high tech" tool we have is the equivalence tester, and the question asks us to find an algorithm which optimizes use of this equivalence tester to O(nlogn).
If you're mainly commenting on the realism of the problem, then I agree. Some problems in this book are a bit of a stretch, but at least the backstory keeps it interesting.
It is exactly what the "bank" wants to find out in the question. If more half of the bank cards belong to the same account, it is considered a (possible) fraud and worth investigation. Your job as a programmer is to help the bank to find out if there is such a bank card.
In looking at 5.1, I notice we're attempting a run time of O(log n), which implies a divide and conquer strategy, with recurrence on one half of the problem. At that same time we have two databases to consider, and we may well have to do this twice.
ReplyDeleteThis was my initial assumption anyway. The problem of attempting to solve for the median between two databases of values, both sorted individually makes for an interesting problem. In thinking it over, I've come across some test cases that are going to make things a little interesting.
But first, let's lay down some facts. We know there are 2n numbers contained between both databases, and that each database contains n numbers. We also know that all of the numbers are unique. Based on the definition of the median, our goal is to find two numbers between the two databases and average them to determine the median.
Initially the thought is that there exists one number in both databases that you can narrow down to. So if you start with the n/2 value in both databases, you merely need to compare which is less, and which is more, and up or down in the problem accordingly. This helps to keep the problem balanced so you always have a ratio between the two databases that correspond to n number above the potential median solutions, and n above. This seems like a pretty good start, however there is a problem with this.
The problem does NOT guarantee that both numbers will be in separate databases. This means that the two numbers for the median, could very well be in one query. This could cause a potential problem, as it's a very dangerous assumption.
Here's an example:
1,2,3,4,5,6
0,7,8,9,10,11
In that case, the correct median values are 5, and 6, for an actual median of 5.5
Another tough case to guess will be the border cases, where the median values occur on the edge. However, at this point, because of the first example, I'm some-what stuck on what to do for an approach. Everything I keep coming up with seems to suffer, based on the same assumption. There has to be some way to consider both of the queries as one query, and to try and work from there, but because they are sorted individually I can't seem to see it.
Are there any thoughts on this problem so far, that might be slightly more insightful?
So, this is true, but luckily the problem takes care of this for us. The problem defines the median (in this case) to be " the nth smallest number" which would be the smaller of the two middle numbers, therefore, if we set up the algorithm correctly, even if both "medians" are in one database, all we have to worry about is that in the end we get the smaller of those two numbers. Not that we need to get the mean of two numbers in one database. Make sense?
Deleteconsidering how tricky it gets when your recursion has narrowed it down to 2 or 3 elements per list, I think it's reasonable to manually (naively) find the median of this list of 4 or 5, since this would be a constant-time addition to the algorithm. I believe it makes the base case of the recursion much simpler to deal with.
ReplyDeleteSince both arrays are sorted (as long as your algorithm does not mess up with this property), it is rather easy to find the "median" as defined in the question when both arrays have 2 elements left.
DeleteAs I said in class, I don't think the base cases are the main thing to worry about. Any D&C algorithm is eventually going to work its way down to inputs of small enough size that brute force will be practical.
DeleteThat being said, I helped one student to debug the code he wrote up for this algorithm, and the only base case needed was for 2 lists of length 1. So the base case is not tricky at all if you have the right algorithm.
I had a question pertaining to 5.3:
ReplyDeleteThe question states is there a set of more than n/2 of the bank cards that are all equivalent to one another?
Is the question asking if it is possible that more than n/2 cards are all from the same bank account???
Thinking out loud:
It seems that if n = 20 than a bank account could have 11 or more cards...that seems strange to me OH! i.e. credit fraud, perhaps???
Thanks for any feedback :)
I think it is possible for all n of the cards to be from the same bank account. The question is only asking us to determine if at least n/2+1 of these cards come from the same account. The only "high tech" tool we have is the equivalence tester, and the question asks us to find an algorithm which optimizes use of this equivalence tester to O(nlogn).
DeleteIf you're mainly commenting on the realism of the problem, then I agree. Some problems in this book are a bit of a stretch, but at least the backstory keeps it interesting.
It is exactly what the "bank" wants to find out in the question. If more half of the bank cards belong to the same account, it is considered a (possible) fraud and worth investigation. Your job as a programmer is to help the bank to find out if there is such a bank card.
DeleteThanks Mark & ollie for the feedback, I just wanted to be sure I was doing it right :)
Delete