The Most Influential and Most Cited Works in Computer Science Project

When I was a researcher at a university, I co-authored several papers, and those papers were cited in other people's work. Today, I was surprised when I looked at my Google Scholar page: my citation count is growing! Wow, that's really amazing! It's not as many as other papers out there, but I'm really excited.

I asked myself, "What does it mean to be a highly cited paper?" I believe highly cited papers have a very big impact and influence in their fields. Impact and influence in this context include creating a new field or area of research, introducing a novel approach to a problem, or even changing the paradigm of the field completely. What about computer science? What kind of highly cited papers shape the field? It would be wonderful to know which papers and research have already shaped the field that I am passionate about!

Therefore, I decided to start this project: collect the highly cited papers in computer science, read them, and then write up the main idea and a summary of each paper as a blog post. Ideally, each blog post will contain:

  1. The field the paper belongs to.
  2. The problem it wants to solve.
  3. The solution it provides.
  4. The evaluation and directions for future work.
  5. Contribution to the field of work.
  6. Critical review (my own opinion of that paper).
I searched the Internet for the most influential and most cited works in computer science and found these sources:
  • https://en.wikipedia.org/wiki/List_of_important_publications_in_computer_science
  • http://citeseerx.ist.psu.edu/stats/articles
  • http://www.journals.elsevier.com/computer-science-review/most-cited-articles/
From those sources, here is the list of papers I want to read:
  • From Wikipedia (Source I)
    1. Computing Machinery and Intelligence (source full paper)
    2. A Proposal for the Dartmouth Summer Research Project on Artificial Intelligence (source full paper)
    3. Fuzzy sets (source full paper)
    4. An Inductive Inference Machine (source full paper)
    5. Language identification in the limit (source full paper)
    6. On the uniform convergence of relative frequencies of events to their probabilities (source full paper)
    7. A theory of the learnable (source full paper)
    8. Learning representations by back-propagating errors (source full paper)
    9. Induction of Decision Trees (source full paper)
    10. Learning Quickly When Irrelevant Attributes Abound: A New Linear-threshold Algorithm
    11. Learning to predict by the methods of temporal differences
    12. Learnability and the Vapnik–Chervonenkis dimension
    13. Cryptographic limitations on learning boolean formulae and finite automata
    14. The strength of weak learnability
    15. Learning in the presence of malicious errors
    16. A training algorithm for optimum margin classifiers
    17. A fast learning algorithm for deep belief nets
    18. Knowledge-based analysis of microarray gene expression data by using support vector machines
    19. Collaborative networks: A new scientific discipline
    20. Collaborative Networks: Reference Modeling
    21. On the translation of languages from left to right
    22. Semantics of Context-Free Languages
    23. A program data flow analysis procedure
    24. A Unified Approach to Global Program Optimization
    25. gprof: A Call Graph Execution Profiler
    26. Compilers: Principles, Techniques and Tools
    27. Colossus computer
    28. First Draft of a Report on the EDVAC
    29. Architecture of the IBM System/360
    30. The case for the reduced instruction set computer
    31. Comments on "the Case for the Reduced Instruction Set Computer"
    32. The CRAY-1 Computer System
    33. Validity of the Single Processor Approach to Achieving Large Scale Computing Capabilities
    34. A Case for Redundant Arrays of Inexpensive Disks (RAID)
    35. The case for a single-chip multiprocessor
    36. The Rendering Equation
    37. Elastically deformable models
    38. The Phase Correlation Image Alignment Method
    39. Determining Optical Flow
    40. An Iterative Image Registration Technique with an Application to Stereo Vision
    41. The Laplacian Pyramid as a compact image code
    42. Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images
    43. Snakes: Active contour models
    44. Condensation – conditional density propagation for visual tracking
    45. Object recognition from local scale-invariant features
    46. A relational model for large shared data banks
    47. Binary B-Trees for Virtual Memory
    48. Relational Completeness of Data Base Sublanguages
    49. The Entity Relationship Model – Towards a Unified View of Data
    50. SEQUEL: A structured English query language
    51. The notions of consistency and predicate locks in a database system
    52. Federated database systems for managing distributed, heterogeneous, and autonomous databases
    53. Mining association rules between sets of items in large databases
    54. A Vector Space Model for Automatic Indexing
    55. Extended Boolean Information Retrieval
    56. A Statistical Interpretation of Term Specificity and Its Application in Retrieval
    57. An experimental timesharing system
    58. The Working Set Model for Program Behavior
    59. Virtual Memory, Processes, and Sharing in MULTICS
    60. The nucleus of a multiprogramming system
    61. A note on the confinement problem
    62. The UNIX Time-Sharing System
    63. Weighted voting for replicated data
    64. Experiences with Processes and Monitors in Mesa
    65. Scheduling Techniques for Concurrent Systems
    66. A Fast File System for UNIX
    67. The Design of the UNIX Operating System
    68. The Design and Implementation of a Log-Structured File System
    69. Microkernel operating system architecture and Mach
    70. An Implementation of a Log-Structured File System for UNIX
    71. Soft Updates: A Solution to the Metadata Update problem in File Systems
    72. The FORTRAN Automatic Coding System
    73. Recursive functions of symbolic expressions and their computation by machine, part I
    74. ALGOL 60
    75. The next 700 programming languages
    76. Fundamental Concepts in Programming Languages
    77. Lambda Papers
    78. Structure and Interpretation of Computer Programs
    79. Comprehending Monads
    80. Towards a Theory of Type Structure
    81. An axiomatic basis for computer programming
    82. Probabilistic representation of formal languages
    83. Two-level morphology: A general computational model of word-form recognition and production
    84. A tutorial on hidden Markov models and selected applications in speech recognition
    85. Transformation-based error-driven learning and natural language processing: A case study in part-of-speech tagging
    86. Realization of Natural-Language Interfaces Using Lazy Functional Programming
    87. Software engineering: Report of a conference sponsored by the NATO Science Committee
    88. A Description of the Model-View-Controller User Interface Paradigm in the Smalltalk-80 System
    89. Go To Statement Considered Harmful
    90. On the criteria to be used in decomposing systems into modules
    91. Hierarchical Program Structures
    92. A technique for software module specification with examples
    93. Structured Design
    94. The Emperor's Old Clothes
    95. The Mythical Man-Month: Essays on Software Engineering
    96. No Silver Bullet: Essence and Accidents of Software Engineering
    97. The Cathedral and the Bazaar
    98. Statecharts: A Visual Formalism For Complex Systems
    99. Untraceable electronic mail, return addresses, and digital pseudonyms
    100. Anonymity Loves Company: Usability and the Network Effect
    101. New Directions in Cryptography
    102. A Method For Obtaining Digital Signatures And Public-Key Cryptosystems
    103. Security, Authentication, and Public Key Systems
    104. Password security: a case history
    105. Measuring password guessability for an entire university
    106. The Protection of Information in Computer Systems
    107. Thirty Years later: Lessons from the Multics Security Evaluation
    108. A Note on the Confinement Problem
    109. Reflections on Trusting Trust
    110. An Empirical Study of the Robustness of Windows NT Applications Using Random Testing
    111. Why Johnny Can't Encrypt: A Usability Evaluation of PGP 5.0
    112. Remembrance of Data Passed
  • From CiteSeerX (Source II)
    1. Statistical Learning Theory. 1998
    2. Introduction to Algorithms. 1990
    3. Maximum likelihood from incomplete data via the EM algorithm. 1977
    4. Distinctive image features from scale-invariant keypoints. In: International Journal of Computer Vision, 2004
    5. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 1997
    6. A tutorial on hidden Markov models and selected applications in speech recognition. Proceedings of the IEEE, 1989
    7. Reinforcement Learning, an introduction. 1998
    8. Optimization by simulated annealing. Science, 1983
    9. A scalable peer-to-peer lookup service for internet applications. 2001
    10. Libsvm: a library for support vector machines. 0
    11. Prospect theory: An analysis of decision under risk. Econometrica, 1979
    12. Variational Analysis. 1997
    13. Induction of decision trees. Machine Learning, 1986
    14. Communicating Sequential Processes. 1985
    15. The Anatomy of a Large-Scale Hypertextual Web Search Engine. in Proc. of 7th International WWW Conference, 1998
    16. The large N limit of superconformal field theories and supergravity. 1998
    17. A computational approach to edge detection. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1986
    18. An Introduction to the Bootstrap. 1993
    19. Snakes: active contour models. International Journal of Computer Vision, 1988
    20. Scheduling Algorithms for Multiprogramming in a Hard Real-Time Environment. Journal of the ACM, 1973
    21. A Method for Obtaining Digital Signatures and Public-Key Cryptosystems. Communications of the ACM, 1978
    22. Graph-based algorithms for boolean function manipulation. IEEE Transactions on Computers, 1986
    23. Eigenfaces for recognition. Journal of Cognitive Neuroscience, 1991
    24. Indexing by latent semantic analysis. Journal of the Society for Information Science, 1990
    25. Authoritative sources in a hyperlinked environment. In Proceedings of the Ninth Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), 1998
    26. New Directions in Cryptography. IEEE Transactions on Information Theory, 1976
    27. A Scalable Content-Addressable Network. In Proceedings of the ACM SIGCOMM '01 Conference, 2001
    28. Fast algorithms for mining association rules. In Proceedings of the 20th International Conference on Very Large Databases (VLDB), 1994
    29. Normalized cuts and image segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2000
    30. Bagging predictors. Machine Learning, 1996
    31. Handbook of Applied Cryptography. 1996
    32. Compositional model checking. In LICS, 1
    33. A Theory for Multiresolution Signal Decomposition: The Wavelet Representation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1989
    34. Modern Information Retrieval. 1999
    35. PAUP: phylogenetic analysis using parsimony. Version 4.0b8. Sinauer Associates. 2001
    36. Mining Association Rules between Sets of Items in Large Databases. In Proceedings of the ACM SIGMOD International Conference on Management of Data, 1993
    37. Latent dirichlet allocation. Journal of Machine Learning Research, 2003
    38. A translation approach to portable ontology specifications. Knowledge Acquisition, 1993
    39. Dynamic source routing in ad hoc wireless networks. in Mobile Computing, Imielinski and Korth, Eds, 1996
    40. Conditional random fields: Probabilistic models for segmenting and labeling sequence data. In ICML-2001, 2001
    41. Maintaining knowledge about temporal intervals. Communications of ACM, 1983
    42. A tutorial on support vector machines for pattern recognition. Data Mining and Knowledge Discovery, 1998
    43. A decision-theoretic generalization of on-line learning and an application to boosting. J Comput Syst Sci, 1997
    44. Ad-hoc on-demand distance vector routing. In Proc. of the Mobile Computing Systems and Applications, 1999
    45. Time, clocks, and the ordering of events in a distributed system. Communications of the ACM, 1978
    46. Emergence of Scaling in Random Networks. Science, 1999
    47. Computational Complexity. 1994
    48. Congestion avoidance and control. ACM Computer Communication Review; Proceedings of the Sigcomm ’88 Symposium, 1988
    49. The capacity of wireless networks. IEEE Trans. on Information Theory, 2000
    50. The PageRank citation ranking: Bringing order to the Web. 1998
    51. Support vector networks. Machine Learning 20, 1995
    52. R-Trees: A dynamic index structure for spatial searching. In Proceedings of the 1984 ACM-SIGMOD Conference, 1984
    53. Random Early Detection Gateways for Congestion avoidance. IEEE/ACM Trans. Network, Vol, 1993
    54. The evolution of cooperation. 1984
    55. STATECHARTS: A Visual Formalism for Complex Systems. Science of Computer Programming, 1987
    56. The Anatomy of the Grid: Enabling Scalable Virtual Organizations. International Journal of Supercomputer Applications, 2001
    57. Rapid object detection using a boosted cascade of simple features. In Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR ’01), 2001
    58. Marching Cubes: A high resolution 3D surface construction algorithm. Computer Graphics (SIGGRAPH ’87 Proceedings), 1987
    59. Learning with kernels. 2002
    60. Building a Large Annotated Corpus of English: The Penn Treebank. Computational Linguistics, 1993
    61. Economic Growth. 1995
    62. Noncommutative Geometry. 1994
    63. A Theory of Timed Automata. Theoretical Computer Science, 1994
    64. The moderator-mediator variable distinction in social psychological research: Conceptual, strategic, and statistical considerations. Journal of Personality and Social Psychology, 1986
    65. An Iterative Image Registration Technique with an Application to Stereo Vision. Proceedings of Image Understanding Workshop, 1981
    66. Data Mining: Concepts and techniques. 2001
    67. RTP: A Transport Protocol for Real-Time Applications. Internet RFC 1889, Internet Engineering Task Force (IETF), 1996
    68. Regression shrinkage and selection via the lasso. J. R. Statist. Soc. B, 1996
    69. Some methods for classification and analysis of multivariate observations. In Proceedings of the 5th Berkeley Symposium on Mathematical Statistics and Probability, L. M. Le Cam & J. Neyman (eds.), University of California, 1967
    70. Capacity of multi-antenna Gaussian channels. European Trans. Telecommun, 1999
    71. On Random Graphs. I. Publ. Math. Debrecen, 1959
    72. MapReduce: Simplified data processing on large clusters. In Proceedings of Operating Systems Design and Implementation, 2004
    73. Directed Diffusion: A Scalable and Robust Communication Paradigm for Sensor Networks. In Proceedings of the Sixth Annual ACM/IEEE International Conference on Mobile Computing and Networking (Mobicom 2000), 2000
    74. On the self-similar nature of Ethernet traffic (extended version). IEEE/ACM Transactions on Networking, 1994
    75. How to share a secret. Communications of the ACM, 1979
    76. Particle swarm optimization. Proceedings of the 1995 IEEE International Conference on Neural Networks (Perth, Australia), IEEE Service Center, Piscataway, NJ, IV, 1995
    77. Economic action and social structure: the problem of embeddedness. American Journal of Sociology, 1985
    78. Randomized algorithms. 1995
    79. Random Graphs. 1985
    80. A Heteroskedasticity-Consistent Covariance Matrix Estimator and a Direct Test for Heteroskedasticity. Econometrica, 1980
    81. A Course in Game Theory. 1994
    82. Histograms of oriented gradients for human detection. International Conference on Computer Vision and Pattern Recognition, 2005
    83. Metaphors we live by. 1980
    84. Compressive sensing. IEEE Trans. on Information Theory, 0
    85. The semantic web. Scientific American, 2001
    86. Determining optical flow. Artificial Intelligence, 1981
    87. Text Categorization with Support Vector Machines: Learning with Many Relevant Features. In Proceedings of the Tenth European Conference on Machine Learning, 1998
    88. A combined corner and edge detector. Alvey Vision Conference, 1988
    89. Pastry: Scalable, Distributed Object Location and Routing for Large-Scale Peer-to-Peer Systems. In Proceedings of the 18th IFIP/ACM International Conference on Distributed Systems Platforms (Middleware 2001), 2001
    90. Endogenous Technological Change. Journal of Political Economy, 1990
    91. A generalized processor sharing approach to flow control - the single node case. IEEE/ACM Trans. on Networking, 1993
    92. Atomic decomposition by basis pursuit. SIAM J. Sci. Comp, 1998
    93. An Introduction to Kolmogorov Complexity and Its Applications. 1993
    94. Social capital in the creation of human capital. American Journal of Sociology, 1988
    95. Optimality Theory: Constraint interaction in generative grammar. 1993
    96. GPSR: Greedy perimeter stateless routing for wireless networks. In Proceedings of MOBICOM, 0
    97. Experiments with a new boosting algorithm. In International Conference on Machine Learning, 1996
    98. Orthonormal bases of compactly supported wavelets. Communications on Pure and Applied Mathematics, 1988
    99. On limits of wireless communications in a fading environment when using multiple antennas. Wireless Personal Communications, 1998
    100. A theory of the learnable. Communications of the Association for Computing Machinery, 1984
  • From Elsevier (Source III)
    1. Certifying algorithms
    2. Traditional and recent approaches in background modeling for foreground detection: An overview
    3. Linear Temporal Logic Symbolic Model Checking
    4. Contextual music information retrieval and recommendation: State of the art and challenges
    5. Motion planning algorithms for molecular simulations: A survey
    6. Taxonomy of attacks and defense mechanisms in P2P reputation systems-Lessons for reputation system designers
    7. The renaming problem in shared memory systems: An introduction
    8. Conjunctive and boolean grammars: The true general case of the context-free grammars
    9. Data mining of social networks represented as graphs
    10. A survey on Security Issues of Reputation Management Systems for Peer-to-Peer Networks
    11. Verification conditions for source-level imperative programs
    12. Textual data compression in computational biology: Algorithmic techniques
    13. Service quality in P2P streaming systems
    14. A survey of timed automata for the development of real-time systems
    15. A survey on relay placement with runtime and approximation guarantees
    16. Computational models for networks of tiny artifacts: A survey
    17. DAG-based attack and defense modeling: Don't miss the forest for the attack trees
    18. Growth properties of power-free languages
    19. Confronting intractability via parameters
    20. Which security policies are enforceable by runtime monitors? A survey
    21. Current status and key issues in image steganography: A survey
    22. Urban pervasive applications: Challenges, scenarios and case studies
    23. Distributed algorithm engineering for networks of tiny artifacts
    24. Streaming techniques and data aggregation in networks of tiny artefacts
    25. A survey on tree matching and XML retrieval
That's the list of influential papers that I want to read. I don't know if I can finish this project, but one thing I'm sure of is that I will learn a lot from it.

The list itself is intimidating, and I believe I will meet a lot of challenges while working on this project, like understanding a paper or even just getting hold of it. If you read this page and happen to have any paper that is on my reading list, please leave a comment saying where I can download it, or email me directly at anang[dot]dista[dot]satria[at]gmail.com. Any help toward the success of this project is really appreciated. Thank you :)

UPDATE: I created this blog to work on this project.
Most Influential and Most Cited Works in CS Project

