Suppose you have following kind of time series data in the data.frame:
> dtm[1:15, ]
year month LPMVTUZ LPMVTVC LPMVTXC
1 1993 3 4829 4897 397808
2 1993 4 4496 4727 399425
3 1993 5 4997 4528 400922
4 1993 6 5444 4636 402512
5 1993 7 4899 4521 403940
6 1993 8 4636 4428 405716
7 1993 9 4548 4498 407348
8 1993 10 4687 4460 409435
9 1993 11 4419 4352 410844
10 1994 0 3410 4547 412513
11 1994 1 3787 4915 413913
12 1994 2 5180 4862 416117
13 1994 3 4729 5003 417992
14 1994 4 4871 4875 419575
15 1994 5 5323 4739 421933
The variables are in the columns and you have two columns for year and
month. Suppose you need to aggregate this data, i.e. get annual values
for the variables. It is possible to do this using loops, but here is
one line solution:
> library(reshape) > recast(dtm, year ~ variable, fun.aggregate = sum, id.var = c("year", + "month")) year LPMVTUZ LPMVTVC LPMVTXC 1 1993 42955 41047 3637950 2 1994 57885 57949 5075675 3 1995 57285 57349 5392784 4 1996 71659 71249 5692667 5 1997 77228 77074 6079579 6 1998 89375 88936 6517787 7 1999 114708 113565 7081522 8 2000 119794 120147 7748582 9 2001 160123 159500 8480626 10 2002 220737 219763 9517652 11 2003 277342 277167 10794708 12 2004 291258 289876 12256691 13 2005 288280 287910 13543842 14 2006 345355 345648 14822438 15 2007 362632 362596 16232798 16 2008 253198 255364 17327344 17 2009 117936 116994 14586582
If you want to get average instead of sum, just use
fun.aggregate=mean
.
This function actually combines two functions melt and cast. First one
melts your data.frame:
> mdtm <- melt(dtm, id.var = c("year", "month")) > mdtm[1:15, ] year month variable value 1 1993 3 LPMVTUZ 4829 2 1993 4 LPMVTUZ 4496 3 1993 5 LPMVTUZ 4997 4 1993 6 LPMVTUZ 5444 5 1993 7 LPMVTUZ 4899 6 1993 8 LPMVTUZ 4636 7 1993 9 LPMVTUZ 4548 8 1993 10 LPMVTUZ 4687 9 1993 11 LPMVTUZ 4419 10 1994 0 LPMVTUZ 3410 11 1994 1 LPMVTUZ 3787 12 1994 2 LPMVTUZ 5180 13 1994 3 LPMVTUZ 4729 14 1994 4 LPMVTUZ 4871 15 1994 5 LPMVTUZ 5323
Note that all the names of the variables are now in the column named
variable, and the corresponding values are in the column surprisingly
named value. You can control the names of these columns by the way.
After melting the data.frame now we need to cast it:
> cdtm <- cast(mdtm, year ~ variable, fun.aggregate = sum) > cdtm year LPMVTUZ LPMVTVC LPMVTXC 1 1993 42955 41047 3637950 2 1994 57885 57949 5075675 3 1995 57285 57349 5392784 4 1996 71659 71249 5692667 5 1997 77228 77074 6079579 6 1998 89375 88936 6517787 7 1999 114708 113565 7081522 8 2000 119794 120147 7748582 9 2001 160123 159500 8480626 10 2002 220737 219763 9517652 11 2003 277342 277167 10794708 12 2004 291258 289876 12256691 13 2005 288280 287910 13543842 14 2006 345355 345648 14822438 15 2007 362632 362596 16232798 16 2008 253198 255364 17327344 17 2009 117936 116994 14586582
Here we use formula to tell cast function, how to pick data. You can
read more details in the help. What is in the right hand side of the
formula will become columns, and the left hand side will be rows. Try
to exchange them, this is what you will get:
> cast(mdtm, variable ~ year, fun.aggregate = sum)
variable 1993 1994 1995 1996 1997 1998 1999 2000
1 LPMVTUZ 42955 57885 57285 71659 77228 89375 114708 119794
2 LPMVTVC 41047 57949 57349 71249 77074 88936 113565 120147
3 LPMVTXC 3637950 5075675 5392784 5692667 6079579 6517787 7081522 7748582
2001 2002 2003 2004 2005 2006 2007 2008
1 160123 220737 277342 291258 288280 345355 362632 253198
2 159500 219763 277167 289876 287910 345648 362596 255364
3 8480626 9517652 10794708 12256691 13543842 14822438 16232798 17327344
2009
1 117936
2 116994
3 14586582
We got the same result, just in transposed form.
So here it is. For more info read the documentation of reshape
package. Also look into plyr package by the same author. And if you
ever meet him, please buy him a beer 🙂
Komentarų: 2
Comments feed for this article
2011/08/16 6:25 am
Norm Albertson
Thanks for the most concise and easy to understand explanation of how to use cast and melt that I’ve found on the internet!
2012/04/10 12:52 am
Karl
I use reshape alot though like to mention that the built-in aggregate function can also be used. I point this out in case there are newcomers to R who think they must use reshape in cases like this.
aggregate(cbind(LPMVTUZ,LPMVTVC,LPMVTXC) ~ year,dtm,sum)