jmate 0.5 simplification, CountingSet

OK, I have made the jmate 0.5 release. In this release I decided to simplify some usages of SimpleFileReader. I still have a feeling that this class may need to be dedicated to text files only, so I may make a drastic name change later. But here are the changes I made.
I eliminated the "Filter" usage for building powerful text readers, because it was too complex for beginners to understand and clumsy for everyone else. I added new methods with reasonable names instead. Before, reading all lines from a text file while applying trim, ignoring empty lines and allowing only lines that match a regular expression looked like this:

List<String> list = new SimpleFileReader.Builder("test/multi_line_text_file.txt")
    .filters(StringFilters.PASS_ONLY_TEXT, StringFilters.newPrefixFilter("^[^#]"))
    .trim()
    .build()
    .asStringList();

Now it is like this:

List<String> list = new SimpleFileReader.Builder("test/multi_line_text_file.txt")
    .allowMatchingRegexp("^[^#]")
    .ignoreWhiteSpaceLines()
    .trim()
    .build()
    .asStringList();

To me this is cleaner, and users do not need to learn yet another class.
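
For readers who want to see what such a chain amounts to, here is a minimal sketch in plain Java that does not use jmate at all. The class and method names (LineFilterSketch, filterLines) are hypothetical. It also assumes that lines are trimmed first, that whitespace-only lines are dropped, and that the regular expression only has to be found in the line rather than match it entirely; jmate's actual order and matching rules may differ.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical illustration, not part of jmate.
class LineFilterSketch {

    // Trims each line, drops lines that are empty after trimming, and keeps
    // only lines in which the regular expression is found (assumed semantics).
    static List<String> filterLines(List<String> lines, String regexp) {
        Pattern pattern = Pattern.compile(regexp);
        List<String> result = new ArrayList<String>();
        for (String line : lines) {
            String trimmed = line.trim();
            if (trimmed.length() == 0) {
                continue; // whitespace-only line
            }
            if (pattern.matcher(trimmed).find()) {
                result.add(trimmed);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("  first ", "", "# comment", "   ", "second");
        // "^[^#]" keeps lines whose first character is not '#'
        System.out.println(filterLines(lines, "^[^#]")); // prints [first, second]
    }
}
```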

There is also a new concept called Template, which holds the properties of a SimpleFileReader so that it can be used to generate other SimpleFileReader objects. This is a slightly advanced use; I doubt it will be used frequently.

SimpleFileReader.Template template = new SimpleFileReader.Template()
    .allowMatchingRegexp("^[^#]")
    .ignoreWhiteSpaceLines()
    .trim();

List<File> files = Files.crawlDirectory(new File("blah"));
for (File file : files) {
    SimpleFileReader sr = template.generateReader(file);
    // .... read it, do something..
}
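
The idea behind such a template can be sketched roughly as follows. This is not the jmate source; TemplateSketch and its nested Reader and Template classes are hypothetical stand-ins that only show how one settings holder can stamp out many readers sharing the same configuration.

```java
import java.io.File;

// Hypothetical illustration of the template idea, not jmate's implementation.
class TemplateSketch {

    // Stand-in for a reader: it only remembers the settings it was built with.
    static final class Reader {
        final File file;
        final String regexp;
        final boolean ignoreWhiteSpaceLines;
        final boolean trim;

        Reader(File file, String regexp, boolean ignoreWhiteSpaceLines, boolean trim) {
            this.file = file;
            this.regexp = regexp;
            this.ignoreWhiteSpaceLines = ignoreWhiteSpaceLines;
            this.trim = trim;
        }
    }

    // Collects settings once, then creates a reader per file on demand.
    static final class Template {
        private String regexp;
        private boolean ignoreWhiteSpaceLines;
        private boolean trim;

        Template allowMatchingRegexp(String regexp) {
            this.regexp = regexp;
            return this;
        }

        Template ignoreWhiteSpaceLines() {
            this.ignoreWhiteSpaceLines = true;
            return this;
        }

        Template trim() {
            this.trim = true;
            return this;
        }

        // Every generated reader gets a copy of the current settings.
        Reader generateReader(File file) {
            return new Reader(file, regexp, ignoreWhiteSpaceLines, trim);
        }
    }
}
```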

I fixed a bug in the Strings.insertFrom... methods. Now you can add white space as well.

There is a new class called "CountingSet". It is not a real "Set", but what it does is something I need frequently: it counts the elements that you add. For example:

CountingSet<String> histogram = new CountingSet<String>();
histogram.add("Apple", "Pear", "Plum", "Apple", "Apple", "Grape", "Pear");
for (String s : histogram)
    out.println(s + " count:" + histogram.getCount(s));

This will give you:

Pear count:2
Apple count:3
Plum count:1
Grape count:1

You can also sort the items by frequency or with a Comparator.

for (String s : histogram.getSortedList())
    out.println(s + " count:" + histogram.getCount(s));

Apple count:3
Pear count:2
Plum count:1
Grape count:1

CountingSet is not remarkably fast or efficient, because it uses a Map internally. I may try optimizing it for speed and memory efficiency later. There is also a MultiFileReader class, but it is not public yet; I may have second thoughts about it.
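
To illustrate the Map-backed idea, here is a minimal sketch of a counting collection. It is not the jmate implementation: the class name CountingSetSketch is hypothetical, the method names simply mirror the ones used in the examples above, and a HashMap is assumed as the backing map. The order of items with equal counts is therefore not defined here.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of a Map-backed counter, not jmate's CountingSet.
class CountingSetSketch<T> implements Iterable<T> {

    private final Map<T, Integer> counts = new HashMap<T, Integer>();

    // Adds every item, incrementing its count by one per occurrence.
    @SafeVarargs
    public final void add(T... items) {
        for (T item : items) {
            Integer current = counts.get(item);
            counts.put(item, current == null ? 1 : current + 1);
        }
    }

    // Returns how many times the item was added, or 0 if it never was.
    public int getCount(T item) {
        Integer count = counts.get(item);
        return count == null ? 0 : count;
    }

    // Returns the distinct items, most frequent first.
    public List<T> getSortedList() {
        List<T> keys = new ArrayList<T>(counts.keySet());
        keys.sort((a, b) -> Integer.compare(counts.get(b), counts.get(a)));
        return keys;
    }

    // Iterates over the distinct items in the map's own (unspecified) order.
    @Override
    public Iterator<T> iterator() {
        return Collections.unmodifiableSet(counts.keySet()).iterator();
    }

    public static void main(String[] args) {
        CountingSetSketch<String> histogram = new CountingSetSketch<String>();
        histogram.add("Apple", "Pear", "Plum", "Apple", "Apple", "Grape", "Pear");
        for (String s : histogram.getSortedList()) {
            System.out.println(s + " count:" + histogram.getCount(s));
        }
    }
}
```

Each add and getCount is a single hash lookup, so the cost is mostly the boxed Integer values and the map entries themselves.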

You can download version 0.5 from http://code.google.com/p/jmate/downloads/list
The changes can be seen here: http://code.google.com/p/jmate/source/list
