WordsCount
From Wizardsforge
Revision as of 13:49, 15 November 2006
Note to Brian
I've moved the conversation to the discussion tab for this page, and I'll do some more work to clean up this area for future use. Please check there for the latest comments. We can move them somewhere else, or even return them here, if you prefer. I'd like to make it quick and easy to add notes, and the discussion tab seemed a good place. You can also get there with a direct link like this: http://editthis.info/wizardsforge/Talk:WordsCount
Interesting Resources
Not a lot yet, but I did find this site on AI where the guy running it has a bunch of interesting stuff including programs written in Python. I think I posted the link to his ftp downloads in IRC the other day but the main site is worth a look. http://zhar.net/
SCALE
Steve seems interested and will probably be a big help to me, both with our WordsCount project and with getting Psyche back in order for the next SCALE. He said he would like to go over it and fix up some things, and I might be able to learn more Python from him. We should probably devote a reasonable effort to SCALE soon too.
Here is a brief rundown on the current state of affairs as of early November 2006.
html2text
I don't know how this was overlooked; maybe it wasn't, and it was found lacking. Anyway, it might make a great tool for grabbing the text from websites. You may already have it installed; I did. It is html2text, and more info on it can be found at http://userpage.fu-berlin.de/~mbayer/tools/html2text.html Beware: there are other tools with the same name, including one in Python, http://userpage.fu-berlin.de/~mbayer/tools/html2text.html which sends a mixed message in that its license is GPL 2.0 but it has "Try" and "Buy" headings, suggesting either a sense of humor or a desire to make money. That is just the beginning, and about as far as I got, but there are more free and commercial versions of similar or identical products and services. I don't know how well they do commercially, but we may want to take a closer look at some of their business models and marketing strategies.
Briefly, the tool takes options and arguments, converts the HTML at a URL passed to it (or read from standard input) into text, and sends the result to standard output or to a file. It may be the quickest and easiest way to grab text from a page.
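For calling it from our own Python scripts, here is a minimal sketch of the standard in / standard out route described above. It assumes an html2text command is on the PATH, reads HTML from standard input when given no arguments, and prints plain text. The function name html_to_text and its command parameter are made up for illustration, and the code is written for a current Python 3 rather than the Python we have been using.

```python
import subprocess


def html_to_text(html, command=("html2text",)):
    """Pipe an HTML string into a converter and return what it prints.

    `command` is a hypothetical parameter: it defaults to the html2text
    program (assumed to be installed) but can name any filter that reads
    standard input and writes standard output.
    """
    result = subprocess.run(
        list(command),
        input=html,           # sent to the tool on standard input
        capture_output=True,  # collect standard output instead of printing it
        text=True,            # work with str rather than bytes
        check=True,           # raise CalledProcessError on a non-zero exit
    )
    return result.stdout
```

Something like html_to_text("<h1>Title</h1><p>Some words.</p>") should then hand back the page text ready for counting. If the program isn't installed, subprocess.run raises FileNotFoundError.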
pdftohtml and pdftotext
I also discovered these open source tools. I've only tested pdftotext so far. It did a fair job of converting but left out most traces of formatting. It turned a 6.3MB PDF into a 1.4MB text file in a relatively short amount of time. That means we don't have to leave PDF documents out of a survey. There are probably other cool tools for other jobs. All of this is for future reference, since we can obviously proceed for now without these capabilities, but it is good to note.
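If we do end up pulling PDFs into a survey, a wrapper along these lines might be enough. It is only a sketch: the helper names pdftotext_command and convert_pdf are hypothetical, it assumes pdftotext is on the PATH, and it is written for a current Python 3. The -layout option is pdftotext's switch for trying to keep the original physical layout, which may help with the lost formatting; I have not tested that.

```python
import subprocess
from pathlib import Path


def pdftotext_command(pdf_path, keep_layout=False):
    """Build the pdftotext command line for one PDF (hypothetical helper).

    The text file is written next to the PDF with a .txt extension.
    """
    pdf_path = Path(pdf_path)
    command = ["pdftotext"]
    if keep_layout:
        command.append("-layout")  # ask pdftotext to preserve the page layout
    command += [str(pdf_path), str(pdf_path.with_suffix(".txt"))]
    return command


def convert_pdf(pdf_path, keep_layout=False):
    """Run pdftotext on one file and return the path of the text output."""
    command = pdftotext_command(pdf_path, keep_layout)
    subprocess.run(command, check=True)  # raises if pdftotext reports an error
    return Path(command[-1])
```

Keeping the command-building separate from the running makes it easy to print and check the command before turning it loose on a whole directory.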
ForPractice Update
I posted to IRC, but that can be a bit too ephemeral. I have set up an FTP/POP account at: http://forpractice.com/kbsig/ I don't have anything there yet except the old Psyche files.
I also dusted off this site, which I created some time ago and which it seems you never used: http://forpractice.com/brian/ Contact me if you want the password.
We also still have this one, which I made a long time ago when our regular server was down. I'd forgotten about it but rediscovered it when setting up this other stuff. http://forpractice.com/sfvlug/ It also has a link to the Psyche files in its own subdirectory.
Here is info on getting our own URL to point to our wiki. There are probably other ways than this to get the job done.
Python Mutable Error
I don't know if you recall, but I mentioned a problem where Python gives an erroneous answer without complaining. Here is an example:
>>> s = [2, 4, 6, 1, 3, 5, 9, 8]
>>> for x in s:
...     if x % 2 != 0: s.remove(x)
...
>>> print s
[2, 4, 6, 3, 9, 8]
>>>
>>> s = [2, 4, 6, 1, 3, 5, 9, 8]
>>> for x in s[:]:
...     if x % 2 != 0: s.remove(x)
...
>>> print s
[2, 4, 6, 8]
What my text says is, "If the sequence is a list, don't modify it in place in the body of the loop; if you do, Python may skip or repeat sequence items. Iterate over a copy of the sequence instead. This problem happens only for mutable sequences (that is, lists)."
As you can see, the for loop over the list itself gave a wrong result without complaining (3 and 9 were never removed), while the for loop over the copy s[:] worked just fine.
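To see which items get skipped, here is a small sketch that records every item the loop actually looks at, followed by one common way to avoid the problem: build a new list of the items to keep instead of removing from the one being looped over. The function names are made up for illustration, and it is written for a current Python 3, so print is a function here.

```python
def remove_odds_in_place(values):
    """The buggy pattern: remove from the list while looping over it.

    Returns the items the loop actually visited, to show what was skipped.
    """
    visited = []
    for x in values:
        visited.append(x)
        if x % 2 != 0:
            # Everything after x shifts left one slot, so the loop's next
            # step jumps over the item that moved into x's old position.
            values.remove(x)
    return visited


def keep_evens(values):
    """Safe alternative: build a new list and leave the input untouched."""
    return [x for x in values if x % 2 == 0]


s = [2, 4, 6, 1, 3, 5, 9, 8]
print(remove_odds_in_place(s))  # items visited: [2, 4, 6, 1, 5, 8]
print(s)                        # 3 and 9 were skipped: [2, 4, 6, 3, 9, 8]

s = [2, 4, 6, 1, 3, 5, 9, 8]
s[:] = keep_evens(s)            # slice assignment updates the same list object
print(s)                        # [2, 4, 6, 8]
```

The loop keeps an internal position into the list. Removing 1 slides 3 into the slot just visited, and removing 5 does the same to 9, so neither one is ever tested.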
Return to Main Page
Dust Bin is just a place to put old stuff I'm not quite ready to toss out yet.