You are currently browsing the category archive for the ‘R’ category.

Sukūriau naują blogą: http://myliuduomenis.lt. Įrašai daromi su Rstudio naudojant knitr. Puslapio kodą galima rasti čia. Į kažką panašaus galbūt numigruosiu ir šitą blogą.

Bit of a major rewrite. Added formula interface, convergence testing and support for different optimisation algorithms. Documentation needs to be updated, but all the examples and demo work.

Downloads in the usual place.

Added empirical example. Run demo("okun","midasr") to see. Also changed function hAh.test. Now only restricted model must be supplied, unrestricted is calculated automatically. Downloads in the usual place.

I’ve decided to release a new version of midasr package. In this package the support is added for numerical gradient, exogenous variables and more complicated lag structure. The downloads are in the usual place.

I’ve created my first R package midasr. Phew, documentation took as much time as actual coding. On the other hand the code was provided by my collaborator Virmantas Kvedaras. My task was to clean up the code and prepare R package.

The package is for testing MIDAS regressions. This is a first release, so the functionality is still very basic.

So I’m getting my daily fix of r-bloggers.com and I encounter this post. Hey, I figure, if polish can draw his country flag using R, why can’t lithuanian do it? 🙂 It should be easy, as cut and paste, right? It turned out not so easy in the end, but not that hard either. Here is the final result. The R code with the description of the process is below.

The hardest part was to get the map of the Lithuania. It turns out that the maps package used to obtain the map of Poland has out-dated maps, circa 1980. There was no Lithuania at that time, curse you Soviets 🙂

After searching a bit, I did not find the R package with the updated map, but I found out how to get more recent map into R. First you need to download the following file from this site. The simplified map from this site is in R package maptools, but it seems the full version is not available as an R package.
You have to unzip the file and then you import it in the following way:

library(maptools)
world<-readShapeSpatial("~/Downloads/TM_WORLD_BORDERS-0.3/TM_WORLD_BORDERS-0.3.shp")

lithuania<-world[world$ISO2=="LT",]
@

To get into more readable format the following lines are necessary. I found them on github ggplot2 page. Thanks to Haddley Wickham:

library(gpclib)
gpclibPermit()
library(ggplot2)
lt<-fortify(lithuania,region="ISO2")

> head(lt)
      long      lat order  hole piece group id
1 25.00000 56.29555     1 FALSE     1  LT.1 LT
2 25.07250 56.21915     2 FALSE     1  LT.1 LT
3 25.08111 56.21055     3 FALSE     1  LT.1 LT
4 25.09694 56.20110     4 FALSE     1  LT.1 LT
5 25.10611 56.19721     5 FALSE     1  LT.1 LT
6 25.13388 56.18888     6 FALSE     1  LT.1 LT

As you see in the end we get the data.frame with longitude and lattitudes and additional info. The relevant additional info in our case is in pieces column. Lithuania consists of 2 separate pieces: the mainland and Curonian spit. Since only part of Curonian spit belongs to Lithuania, it is not connected to Lithuanian mainland. So in the map it looks like an island, when in fact it is not.

The points in the data.frame are ordered, so you can plot them immediately:

plot(lt1,axes=FALSE,xlab="",ylab="",type="l")
polygon(lt2)

Now we have the shape, we need only to fill in the colors. To do that we need to divide the 2 polygons into 3 pieces with equal heights.

It is easy to cut the polygon into 2 pieces along the horizontal line. Here are the functions I used to do that:

x.mid <- function(x1, x2, y.mid) {
      c(x1[1] + ((x2[1] - x1[1]) / (x2[2] - x1[2])) * (y.mid - x1[2]),y.mid)
}

cut.poly.in.half <- function(xy,cut.y.point) {
    cond <- xy[,2]>cut.y.point
    if(!is.matrix(xy))xy <- as.matrix(xy)
    if(sum(cond)==0 | sum(!cond)==0) {
        warning("Whole polygon is either below orabove the cut-line, original polygon is returned")
        return(xy)
    }
    
    dcond <- diff(cond)
    if(sum(dcond)==0) {
        
        tmp<- sort(c(which.min(diff(cond)),which.max(diff(cond))))
        start.y <- tmp[1]
        end.y <- tmp[2]


        start <- x.mid(xy[start.y,],xy[start.y+1,],cut.y.point)    
        
        end <- x.mid(xy[end.y,],xy[end.y+1,],cut.y.point)
  
        top <- rbind(xy[1:start.y,],start,end,xy[(end.y+1):nrow(xy),])
        bottom <- rbind(start,xy[(start.y+1):end.y,],end)
        if(top[1,2]<cut.y.point) {
            tmp <- top
            top <- bottom
            bottom <- tmp
        }
        list(top=top,bottom=bottom)
    }
    
}

I borrowed some code from Bogumił Kamiński. The function just adds two points to the array of points. It checks whether the polygon is cuttable, i.e. the horizontal line goes through the middle. It also assumes, that the line cuts the the polygon into 3 pieces, if we imagine the polygon as the line with the start and an end, which basically what polygon is in R. It is easy to add the code for the case of 2 pieces, but this case is rare, and I was lazy, so there you go:).

Now let us start the cutting:

##We need to cut the polygon of all Lithuania into three equal parts
##First cut out the top part
aa <- cut.poly.in.half(as.matrix(lt1),cut.points[2])
##Now divide the remaining part into halves
bb <- cut.poly.in.half(aa$bottom,cut.points[1])
##Curonian spit part only needs to be cut into two parts
cc <- cut.poly.in.half(lt2,cut.points[2])
@

Now get the colors of Lithuanian flag, and do the final step:

##Colors of Lithuanian flag, from top to bottom.
##Gzr stands for geltona, zalia, raudona: yellow, green, red in lithuanian.
gzr <- c("#fdb913","#006a44","#c1272d")
##Colour the top 
polygon(aa$top,col=gzr[1])
polygon(cc$top,col=gzr[1])
##Colour the middle
polygon(bb$top,col=gzr[2])
polygon(cc$bottom,col=gzr[2])
##Colour the bottom
polygon(bb$bottom,col=gzr[3])

Using this code it is more or less easy to replicate this exercise for other countries which have flags with where colours are horizontal and there are no additional structures. Flags with vertical colours should be easy too. But flags with more structure as flag of Finland, or United Kingdom would be naturally harder.

Suppose you have following kind of time series data in the data.frame:

> dtm[1:15, ]
   year month LPMVTUZ LPMVTVC LPMVTXC
1  1993     3    4829    4897  397808
2  1993     4    4496    4727  399425
3  1993     5    4997    4528  400922
4  1993     6    5444    4636  402512
5  1993     7    4899    4521  403940
6  1993     8    4636    4428  405716
7  1993     9    4548    4498  407348
8  1993    10    4687    4460  409435
9  1993    11    4419    4352  410844
10 1994     0    3410    4547  412513
11 1994     1    3787    4915  413913
12 1994     2    5180    4862  416117
13 1994     3    4729    5003  417992
14 1994     4    4871    4875  419575
15 1994     5    5323    4739  421933 

The variables are in the columns and you have two columns for year and
month. Suppose you need to aggregate this data, i.e. get annual values
for the variables. It is possible to do this using loops, but here is
one line solution:

> library(reshape)
> recast(dtm, year ~ variable, fun.aggregate = sum, id.var = c("year", 
+     "month"))
   year LPMVTUZ LPMVTVC  LPMVTXC
1  1993   42955   41047  3637950
2  1994   57885   57949  5075675
3  1995   57285   57349  5392784
4  1996   71659   71249  5692667
5  1997   77228   77074  6079579
6  1998   89375   88936  6517787
7  1999  114708  113565  7081522
8  2000  119794  120147  7748582
9  2001  160123  159500  8480626
10 2002  220737  219763  9517652
11 2003  277342  277167 10794708
12 2004  291258  289876 12256691
13 2005  288280  287910 13543842
14 2006  345355  345648 14822438
15 2007  362632  362596 16232798
16 2008  253198  255364 17327344
17 2009  117936  116994 14586582 

If you want to get average instead of sum, just use
fun.aggregate=mean.

This function actually combines two functions melt and cast. First one
melts your data.frame:

> mdtm <- melt(dtm, id.var = c("year", "month"))
> mdtm[1:15, ]
   year month variable value
1  1993     3  LPMVTUZ  4829
2  1993     4  LPMVTUZ  4496
3  1993     5  LPMVTUZ  4997
4  1993     6  LPMVTUZ  5444
5  1993     7  LPMVTUZ  4899
6  1993     8  LPMVTUZ  4636
7  1993     9  LPMVTUZ  4548
8  1993    10  LPMVTUZ  4687
9  1993    11  LPMVTUZ  4419
10 1994     0  LPMVTUZ  3410
11 1994     1  LPMVTUZ  3787
12 1994     2  LPMVTUZ  5180
13 1994     3  LPMVTUZ  4729
14 1994     4  LPMVTUZ  4871
15 1994     5  LPMVTUZ  5323 

Note that all the names of the variables are now in the column named
variable, and the corresponding values are in the column surprisingly
named value. You can control the names of these columns by the way.

After melting the data.frame now we need to cast it:

> cdtm <- cast(mdtm, year ~ variable, fun.aggregate = sum)
> cdtm
   year LPMVTUZ LPMVTVC  LPMVTXC
1  1993   42955   41047  3637950
2  1994   57885   57949  5075675
3  1995   57285   57349  5392784
4  1996   71659   71249  5692667
5  1997   77228   77074  6079579
6  1998   89375   88936  6517787
7  1999  114708  113565  7081522
8  2000  119794  120147  7748582
9  2001  160123  159500  8480626
10 2002  220737  219763  9517652
11 2003  277342  277167 10794708
12 2004  291258  289876 12256691
13 2005  288280  287910 13543842
14 2006  345355  345648 14822438
15 2007  362632  362596 16232798
16 2008  253198  255364 17327344
17 2009  117936  116994 14586582 

Here we use formula to tell cast function, how to pick data. You can
read more details in the help. What is in the right hand side of the
formula will become columns, and the left hand side will be rows. Try
to exchange them, this is what you will get:

> cast(mdtm, variable ~ year, fun.aggregate = sum)
  variable    1993    1994    1995    1996    1997    1998    1999    2000
1  LPMVTUZ   42955   57885   57285   71659   77228   89375  114708  119794
2  LPMVTVC   41047   57949   57349   71249   77074   88936  113565  120147
3  LPMVTXC 3637950 5075675 5392784 5692667 6079579 6517787 7081522 7748582
     2001    2002     2003     2004     2005     2006     2007     2008
1  160123  220737   277342   291258   288280   345355   362632   253198
2  159500  219763   277167   289876   287910   345648   362596   255364
3 8480626 9517652 10794708 12256691 13543842 14822438 16232798 17327344
      2009
1   117936
2   116994
3 14586582 

We got the same result, just in transposed form.

So here it is. For more info read the documentation of reshape
package. Also look into plyr package by the same author. And if you
ever meet him, please buy him a beer 🙂

In previous post I asked if somebody knew how to get highlighted R code in wordpress. Somebody (yay, I have at least one reader) took this question too seriously, and found out the hard way how to do that. So for those who want to get something like this:

fun <- function(l, nm) {
    l  <- x[[nm]
    ss <- summary(l)
    write.csv(ss,file=paste(nm,".csv",sep=""))
}

foreach(l=x,nm=names(x)) %do% fun(l,nm)

read on, I dare you!

The solution involves Emacs, which in itself is quite intimidating. But as they say, no pain, no gain. After you pass that “I want to poke my eye with red hot poker just for fun” phase, Emacs is quite nice. The drawback of course is, that this phase must be passed 🙂

So for a start, you must have Emacs installed. For Windows I recommend downloading Vincent Goulet’s prepared Emacs distribution. It comes with several useful packages preinstalled, such as ESS. For Ubuntu, your best bet would be to install emacs-snapshot, so you get nice looking Emacs. Then you will need to install ESS. For serious R programming, I think ESS is a must. I do not go into details, but is much more convenient than standard R window.

In addition you have to install htmlize and org-mode Emacs extensions. The org-mode may be included in your Emacs distribution. The instructions how to instal org-mode can be found in the website. To install htmlize, save the file htmlize.el to some directory, and byte-compile it from Emacs with command M-x byte-compile-file. Then include the directory into your Emacs load path, i.e. include this line at the top of your .emacs (or emacs) file:

(setq load-path (cons (expand-file-name "~/src/emacs-lisp/") load-path))

If you do custom org-mode install you may want include the following line also:

(require 'org-install)

To get highlighted R code, you will also need to add this code to your .emacs file:

(defun run-ascii-sweave-on-buffer ()
  (ess-command "library(ascii)\n")
  (ess-command (concat "Sweave(\"" (buffer-file-name)
                       "\", RweaveAsciidoc)\n")))

(defun htmlize-blog-entry ()
   (interactive)
   (save-excursion
      (let*
       (goto-char (point-min))
       (basic-save-buffer)
       (switch-to-buffer (org-export-region-as-html
                          (point-min) (point-max) t "*HTML*"))
       (while (re-search-forward "<pre " nil t)
         (replace-match
          "<pre style=\"background-color:#FFFFE5; font-size:8pt;overflow:auto\" "
          t nil))
       (kill-ring-save (point-min) (point-max))
      )))

(defun sweave-and-htmlize-blog-entry ()
   (interactive)
   (save-excursion
     (run-ascii-sweave-on-buffer)
     (let* ((name (buffer-file-name))
            (txt-filename (concat name ".txt"))
            (txt-buf (find-file txt-filename)))
       (switch-to-buffer txt-buf)
       (revert-buffer t t t)
       (goto-char (point-min))
       (while (re-search-forward "BEGIN_SRC Sweave" nil t)
         (replace-match "BEGIN_SRC R-transcript" t nil))
       (goto-char (point-min))
       (while (re-search-forward "^----" nil t)
         (replace-match "" nil nil))
       (basic-save-buffer)
       (switch-to-buffer (org-export-region-as-html
                          (point-min) (point-max) t "*HTML*"))
       (kill-buffer txt-buf)
       (while (re-search-forward "<pre " nil t)
         (replace-match
          "<pre style=\"background-color:#FFFFE5; font-size:8pt;overflow:auto\" "
          t nil))
       (kill-ring-save (point-min) (point-max)))))

(define-key global-map (kbd "<f11>") 'sweave-and-htmlize-blog-entry)
(define-key global-map (kbd "<f12>") 'htmlize-blog-entry)

I took those functions from the this blog, and made some modifications. You can copy paste it directly from here, or download all the code from here

To get the syntax highlighting you must write all the necessary material in the file with extension .org. Here is a sample file with some examples from this post. When you open this file in the Emacs, it should automatically open up org-mode (if it was installed properly). Now if you want some R code to syntax highlight, you write it inside these two statements:

#+BEGIN_SRC R
fun <- function(l, nm) {
    l  <- x[[nm]
    ss <- summary(l)
    write.csv(ss,file=paste(nm,".csv",sep=""))
}
foreach(l=x,nm=names(x)) %do% fun(l,nm)
#+END_SRC R

Then if you are finished, just press f12, and Emacs will open you a new window with shiny HTML code. Copy this code to your blog, and that is it.

If you also want to include R code which gets executed by R, like this:

> x <- rnorm(100)
> summary(x)
    Min.  1st Qu.   Median     Mean  3rd Qu.     Max.
-2.08300 -0.67620  0.03544  0.06616  0.81680  2.34000

first you must start R process with M-x R. Then include in your .org code the code snippet like this (it helps if you know Sweave syntax):

#+BEGIN_SRC Sweave
<>=
x <- rnorm(100)
summary(x)
@
#+END_SRC

and then press f11.

The keyboard shortcuts can be changed if you want, just modify the the emacs code, you’ve added to your .emacs file. The code is more or less self-explanatory. By pressing f11 you invoke function sweave-and-htmlize-blog-entry, which runs
Sweave function from R on your text. The resulting file is exported to html by org-mode. By pressing f12 you invoke function htmlize-blog-entry, which exports to html, without invoking R. Since we use org-mode export, you can syntax highlight not only R code, but other code, which has support in Emacs. This means C, emacs-lisp, latex and others.

So here it is. If something did not work, try asking here, maybe I can help. I do not promise anything, I am no Emacs expert.

Special thanks goes to Raimondas, who spent two days to get a working example, with no previous Emacs knowledge. Buy him a beer, if you find this post useful. Also do not forget the guy who wrote the initial code, and the guys who created Emacs, org-mode, htmlize, ESS, R. Buy them a beer also 🙂

Frequently I find myself having to write something like this in R:

for (nm in names(x)) {
    l <- x[[nm]]
    ss <- summary(l)  # produce nice table for writing
    write.csv(ss,file=paste(nm,".csv",sep=""))
}

Here x is some list, where each element contains some fitted model. This always bugs me, since I cannot use lapply, or other fancy functions like *lply (from package plyr). The problem is that I need to keep the name of the list element, and all of functions mentioned, do not do that. So if I have a generic function:

fun <- function(l, nm) {
    l  <- x[[nm]]
    ss <- summary(l)
    write.csv(ss,file=paste(nm,".csv",sep=""))
}

I cannot use it with lapply like this

 lapply(x,fun)

since then I need to pass different nm for each list element.

With foreach however it is very easy to do:

foreach(l=x,nm=names(x)) %do% {
    ss <-  summary(l)
    write.csv(ss,file=paste(nm,".csv",sep=""))
}

Or I can use my super-convenient function:

foreach(l=x,nm=names(x)) %do% fun(l,nm)

You can even do some interesting stuff with parallelization, since foreach supports it. So if your computer has processor with several cores you can use both for intensive calculations. BTW if anyone knows easy way of highlighting R code in wordpress, let me know.