Dissociated Press
Posted: November 26, 2007
There is a nice Emacs game where you take some text and use a random walk to generate a somewhat similar but completely random text out of it. It is called “Dissociated Press”.
Out of sheer randomness, I’ve implemented this in Oracle.
- I’m using VARCHAR to hold the text, which limits us to short and therefore uninteresting texts. A CLOB would be better.
- If there is any serious amount of text involved, you’ll want to create a non-unique index on the first two columns of the markov table after you populate the table, and gather statistics on that.
- Feel free to criticize my code. That is why I post it. But note that I’ve broken some lines so it will look nice in the blog.
- Also feel free to rewrite the entire thing with a single model statement.
- The algorithm should be pretty straightforward, but you can read about a similar implementation here.
-- Each row holds three consecutive words from the source text.
CREATE TABLE markov (c1 VARCHAR(30), c2 VARCHAR(30), c3 VARCHAR(30));

-- In SQL*Plus, run SET SERVEROUTPUT ON first to see the generated text.
DECLARE
  cnt          BINARY_INTEGER;
  my_str       VARCHAR(4000) :=
       'A bug that has been documented. To call something a feature '
    || 'sometimes means the author of the program did not consider the '
    || 'particular case, and that the program responded in a way that was '
    || 'unexpected but not strictly incorrect. A standard joke is that a '
    || 'bug can be turned into a feature simply by documenting it (then '
    || 'theoretically no one can complain about it because its in the '
    || 'manual), or even by simply declaring it to be good. Thats not a '
    || 'bug, thats a feature is a common catchphrase.';
  my_comma_str VARCHAR(4000);
  my_table     dbms_utility.uncl_array;
  num_itr      INTEGER := 50;
  tmp_c1       VARCHAR(30);
  tmp_c2       VARCHAR(30);
  tmp_c3       VARCHAR(30);
BEGIN
  -- Turn the text into a list of double-quoted, comma-separated words.
  -- The outer REPLACE drops the empty items produced by double spaces.
  SELECT REPLACE('"' || REPLACE(my_str, ' ', '","') || '"', ',"",', ',')
    INTO my_comma_str
    FROM dual;

  dbms_utility.comma_to_table(my_comma_str, cnt, my_table);

  -- Store every run of three consecutive words, with the quotes stripped.
  FOR i IN 3..cnt LOOP
    INSERT INTO markov (c1, c2, c3)
    VALUES (REPLACE(my_table(i-2), '"', ''),
            REPLACE(my_table(i-1), '"', ''),
            REPLACE(my_table(i), '"', ''));
  END LOOP;
  COMMIT;

  -- Start the walk from an arbitrary row.
  SELECT c1, c2, c3
    INTO tmp_c1, tmp_c2, tmp_c3
    FROM markov
   WHERE ROWNUM = 1;
  DBMS_OUTPUT.put_line(tmp_c1 || ' ' || tmp_c2 || ' ');

  FOR i IN 1..num_itr LOOP
    BEGIN
      -- Pick a random row whose first two words match the last two words seen.
      SELECT c1, c2, c3
        INTO tmp_c1, tmp_c2, tmp_c3
        FROM (SELECT *
                FROM markov
               WHERE c1 = tmp_c2 AND c2 = tmp_c3
               ORDER BY dbms_random.VALUE) t
       WHERE ROWNUM = 1;
    EXCEPTION
      WHEN NO_DATA_FOUND THEN
        -- The walk reached the last two words of the text: print the final word and stop.
        DBMS_OUTPUT.put_line(tmp_c3);
        EXIT;
    END;
    DBMS_OUTPUT.put_line(tmp_c2 || ' ');
  END LOOP;
END;
/
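For larger texts, the index and statistics mentioned in the notes above could look roughly like the sketch below. The index name markov_c1_c2_idx is made up for illustration, and the block assumes the markov table lives in the current schema. Run it after the table has been populated.

```sql
-- Non-unique index on the two columns used to look up the next word.
-- The index name is hypothetical; pick whatever fits your naming scheme.
CREATE INDEX markov_c1_c2_idx ON markov (c1, c2);

-- Gather optimizer statistics for the table; cascade => TRUE also covers its indexes.
BEGIN
  DBMS_STATS.GATHER_TABLE_STATS(
    ownname => USER,
    tabname => 'MARKOV',
    cascade => TRUE);
END;
/
```

With the index in place, the lookup inside the loop (WHERE c1 = tmp_c2 AND c2 = tmp_c3) no longer has to scan the whole table on every iteration.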
This post has been partially inspired by this one.